draft-ietf-ssh-handbook-00

Internet Draft                                    Barbara Fraser
Network Working Group                             SEI/CMU
Expires in six months                             July 1995

   Site Security Handbook for System and Network Administrators
              <draft-ietf-ssh-handbook-00.txt>


Status of this Memo

     This document is an Internet-Draft.  Internet-Drafts are working
     documents of the Internet Engineering Task Force (IETF), its
     areas, and its working groups.  Note that other groups may also
     distribute working documents as Internet-Drafts.

     Internet-Drafts are draft documents valid for a maximum of six
     months and may be updated, replaced, or obsoleted by other
     documents at any time.  It is inappropriate to use Internet-
     Drafts as reference material or to cite them other than as
     ``work in progress.''

     To learn the current status of any Internet-Draft, please check
     the ``1id-abstracts.txt'' listing contained in the Internet-
     Drafts Shadow Directories on ftp.is.co.za (Africa),
     nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
     ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).


1.   INTRODUCTION

     This document provides guidance to system and network administrators
     on how to address security issues within the Internet community.
     It builds on the foundation provided in RFC 1244 and is the
     collective work of a number of contributing authors, who include
     Philip J. Nesser, Klaus-Peter Kossakowski, Erik Guttman, Nevil
     Brownlee, Jules P. Aronson, and Edward P. Lewis.

2.   SECURITY POLICIES

   2.1  What Is a Security Policy and Why Have One?

     The security-related decisions you make, or fail to make, as system
     administrator largely determine how secure or insecure your system
     is, how much functionality your system offers, and how easy your
     system is to use. However, you cannot make good decisions about
     security without first determining what your security goals are.  And
     until you determine what your security goals are, you cannot make
     effective use of any collection of security tools, because you simply
     won't know what to check for, or what restrictions to impose.

     For instance, your goals will probably be very different from the
     goals of a hardware or software vendor who ships your system software
     set up in a default state that allows maximum access (e.g., all
     services turned on), but provides minimal security.

     Your goals will be largely determined by the following key tradeoffs:

         (1) services offered (functionality) vs. security
         (2) ease of use vs. security
         (3) cost of security vs. risk of loss

     Your goals are communicated to all users, operations staff, and managers
     through a set of security rules, called a "computer security policy".

   2.1.1  Definition of a Computer Security Policy

     A computer security policy is a formal specification of the rules by
     which people are given access to an organization's technology and
     information assets.

   2.1.2  Purposes of a Computer Security Policy

     The main purpose of a computer security policy is to inform users,
     staff and managers of their obligatory requirements for protecting
     technology and information assets through a consistent approach to
     policy matters.  Another purpose is to provide a baseline from which
     to acquire, configure and audit computer systems for compliance with
     the policy.  Without at least an implied security policy there is no
     such baseline, so using a set of security tools in its absence is
     meaningless.


   2.2  What Makes a Good Computer Security Policy?

     The characteristics of a good security policy are:

         (1) It must be implementable (e.g., through system administration
             procedures and by publishing acceptable use guidelines).
         (2) It must be enforceable (e.g., with security tools where
             appropriate, and, from a human perspective, with sanctions).
         (3) It must clearly define the areas of responsibility.

     The components of a good security policy include:

         (1) Computer Technology Purchasing Guidelines - Specifies required
             or preferred security features to supplement existing purchasing
             policy and guidelines.

         (2) Privacy Policy - Defines reasonable expectations of privacy
             regarding such issues as monitoring of electronic mail and
             logging of keystrokes.

         (3) Access Policy - Defines access rights and privileges, and protects
             assets from loss or disclosure by specifying acceptable use
             guidelines for users, operations staff, and management.  Provides
             guidelines for external connections, data communications,
             connecting devices to a network, and adding new software to a
             system.  Also provides notification guidelines (e.g., requires
             a login warning banner stating that the system is for authorized
             users only, instead of displaying a welcome banner).

         (4) Accountability Policy - Defines the responsibilities of users,
             management, and operations staff.  Specifies an audit capability,
             and provides incident handling guidelines.

         (5) Authentication Policy - Establishes trust through an effective
             password policy, and by setting guidelines for remote location
             authentication, and the use of authentication devices.

         (6) Availability - Defines availability expectations.  Addresses the
             redundancy of security functions.  Specifies recovery procedures
             and mechanisms.

         (7) Violations Reporting - Denotes which types of violation must be
             reported, and provides a central reporting point.  A non-
             threatening atmosphere and the possibility of anonymous reporting
             will result in a greater likelihood that if a violation is
             detected, it will be reported.

         (8) Supporting Information - Provides users, staff, and management with
             contact information for each type of policy violation.  Gives
             guidelines to your public relations office on how to handle outside
             queries about a security incident.  Provides cross-references to
             security procedures and related information, such as company
             policies and regulatory requirements (federal, state, and local).

     Once your computer security policy has been established it should be
     clearly communicated to users, staff, and management.  Having all
     personnel sign a statement indicating that they have read, understood,
     and agree to abide by the policy is an important part of the process.
     Finally, your policy should be reviewed on a regular basis to see if it
     is successfully supporting your security needs.

3.   SECURITY PROCEDURES

   3.1  Authentication

   3.1.1 One-Time passwords

     Given today's networked environments, it is recommended that sites
     concerned about the security and integrity of their systems and
     networks consider moving away from standard, reusable passwords. There
     have been many incidents involving Trojan network programs (e.g.,
     telnet and rlogin) and network packet sniffing programs.  These
     programs capture clear-text hostname, account name, password triplets.
     Intruders can use the captured information for subsequent access to
     those hosts and accounts.  This is possible because 1) the password is
     used over and over (hence the term "reusable"), and 2) the password
     passes across the network in clear text.

     Several authentication techniques have been developed that address
     this problem. Among these techniques are challenge-response
     technologies that provide passwords that are only used once (commonly
     called one-time passwords). This document provides a list of sources
     for products that provide this capability. The decision to use a
     product is the responsibility of each organization, and each
     organization should perform its own evaluation and selection.

   3.1.2  Kerberos

     <insert info about this technology here>

   3.1.3  Password Assurance

     While the need to eliminate the use of standard, reusable passwords
     cannot be overstated, it is recognized that some organizations may
     still rely on them while they transition to one-time password
     technology.  Given that situation, we have included the following
     sections to help with the selection and maintenance of traditional
     passwords.  If you have to use them, here is some pertinent advice.

   3.1.3.1 The importance of robust passwords

     In almost all cases of system penetration, the intruder needs to
     gain access to an interactive user account on the system.  One way
     that goal is typically accomplished is through guessing the password
     of a legitimate user.  The intruder may attempt to guess a password
     by knowing something about the legitimate user, or (more commonly)
     by trying dictionary words and other simple guesses.  This threat is
     increased by the easy availability of the password file, since an
     intruder may copy that file and run automated "cracking" tools
     against its encrypted passwords on another (perhaps faster and more
     powerful) system.  To guard against this threat, the simplest and
     most straightforward approach is to ensure that the passwords on all
     accounts are not easily guessed.  Once the system is
     guarded against this threat, the intruder would have to devote an
     inordinately large amount of resources to try to crack your passwords.
     It is more likely that the intruder would try to find another method of
     penetration, or would give up and try another system.

   3.1.3.2 Restricting access to the password file

     The first step to password assurance is to limit access to the encrypted
     form of users' passwords.  If the encrypted passwords are unavailable
     to an attacker, guessing passwords cannot be done off-line at a remote
     (and possibly more powerful) location.

     The typical method used to protect the encrypted password is to use a
     shadow password file scheme.  This requires the use of two distinct
     password files.  The first contains the standard password file
     information (username, full name, login directory, and shell), but
     holds a dummy or false value in the encrypted password field.  The
     real encrypted passwords are kept in a second password file that is
     available only to the root user or system.  In this way, standard
     system utilities (such as FTP) and users without elevated privileges
     cannot gain access to the encrypted passwords.
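     As an illustration only, the following minimal Python sketch splits
     traditional password-file lines into a world-readable part and a
     private part.  It assumes the common seven-field, colon-separated
     layout; the function name and the "x" placeholder are illustrative
     choices, and the private entries are simplified (real shadow files
     carry additional password-aging fields).  The private file must then
     be made readable only by root.

```python
def split_passwd(lines, placeholder="x"):
    """Split traditional passwd lines into (public, private) line lists.

    Each input line is assumed to have the seven colon-separated fields
    name:password:uid:gid:gecos:home:shell.
    """
    public, private = [], []
    for line in lines:
        fields = line.rstrip("\n").split(":")
        if len(fields) != 7:
            raise ValueError("malformed passwd line: %r" % line)
        name, encrypted = fields[0], fields[1]
        # The world-readable file keeps every field except the real
        # encrypted password, which is replaced by a dummy value.
        fields[1] = placeholder
        public.append(":".join(fields))
        # Simplified private entry: real shadow files add aging fields.
        private.append(name + ":" + encrypted)
    return public, private
```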

     It is important to note that use of a shadow password file does not
     remove the need for robust password selection on the part of the user,
     but is one additional protection against penetration.

   3.1.3.3 Initial password selection

     One problem often found in systems that have been penetrated is that
     a user's initial password, set by the system administrator, is easily
     guessed by an intruder.  While the setting of this password is usually
     accompanied by the request that the password be changed as soon as the
     account is used, many users lack the knowledge or motivation to change
     it. Also, by giving a user a simple password, the system administrator
     sets up an expectation on the part of the user that simple passwords are
     acceptable. Therefore, a naive user is unlikely to select a more robust
     password, since such passwords tend to be harder to remember.

     Because of this threat, the system administrator should follow some
     sensible guidelines (such as the criteria shown below) for selecting
     "good" initial passwords for the users of the system, and for selecting
     a good root password as well.  The system administrator should also
     provide users with guidance on choosing their own robust passwords.

   3.1.3.4 Password selection criteria

     Many users give little thought to what makes a "good" password, and
     would be surprised at just how easy some passwords are for intruders to
     guess.  Unfortunately, the increasing power of computing is making
     password "cracking" far easier than ever before.  It is now fairly
     trivial to launch a brute-force attack on any password of four characters
     or less.  Also, the increasing availability of disk space is allowing
     intruders access to complete dictionaries of not only English, but many
     foreign languages as well.  Some password selection criteria that can
     reasonably be included in a password policy are:

      o  The minimum password length should be 7-8 characters.
      o  The password should not be a permutation of any user account or
         personal information such as name, office, telephone number, or
         social security number.
      o  The password should not be contained in any English or foreign
         dictionary in regular use. This includes real/imaginary character
         names or terms from folklore.
      o  The password should not consist of acronyms or project-specific
         information relating to the user's job or function.
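     The criteria above can be checked mechanically.  The following is a
     minimal Python sketch, assuming a small hypothetical word list in
     place of real English and foreign dictionaries, and a minimum length
     of 8.  It only approximates "permutation": it catches each piece of
     personal information and its reversal, not every rearrangement.

```python
import re

# Hypothetical stand-in for real word lists in English and other languages.
SAMPLE_DICTIONARY = {"password", "dragon", "gandalf", "welcome"}
MIN_LENGTH = 8  # the criteria above suggest 7-8 characters


def violations(password, personal_info, dictionary=SAMPLE_DICTIONARY,
               project_terms=()):
    """Return the reasons a proposed password fails the criteria above."""
    problems = []
    lowered = password.lower()
    # Letters and digits only, so "555-1234" still matches "5551234".
    compact = re.sub(r"[^a-z0-9]", "", lowered)
    # Letters only, so "dragon1" is still caught as "dragon".
    letters = re.sub(r"[^a-z]", "", lowered)

    if len(password) < MIN_LENGTH:
        problems.append("too short")
    # Approximates "permutation": catches each item and its reversal only.
    for item in personal_info:
        item = re.sub(r"[^a-z0-9]", "", item.lower())
        if len(item) >= 3 and (item in compact or item[::-1] in compact):
            problems.append("contains personal information")
            break
    if letters in dictionary:
        problems.append("dictionary word")
    if any(term.lower() in lowered for term in project_terms):
        problems.append("project-specific term")
    return problems
```

     For example, violations("gandalf", ["jsmith"]) reports both "too
     short" and "dictionary word".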

     While these criteria may seem overly restrictive, there are many
     schemes that will generate hard to guess passwords that are still easy
     to remember.  One example is to take a well known phrase and use the
     first or second letter of each word to form your password.  Another is
     to use two short words separated by a control or punctuation character.
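     Both schemes are simple enough to sketch in a few lines of Python.
     The function names are hypothetical, and the second function uses
     only punctuation characters, since control characters are often hard
     to type on some terminals.  The result should still be checked
     against the criteria above.

```python
import secrets
import string


def from_phrase(phrase, position=0):
    """Build a password from the first (position=0) or second
    (position=1) letter of each word in a well-known phrase."""
    letters = []
    for word in phrase.split():
        if len(word) > position:  # skip words too short to contribute
            letters.append(word[position])
    return "".join(letters)


def two_words(first, second, separator=None):
    """Join two short words with a punctuation character, chosen at
    random from string.punctuation when none is supplied."""
    if separator is None:
        separator = secrets.choice(string.punctuation)
    return first + separator + second
```

     For instance, from_phrase("Four score and seven years ago our
     fathers") yields "Fsasyaof".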

   3.1.3.5 Password aging

     When and how to expire passwords is still a subject of controversy among
     the security community.  It is generally accepted that a password should
     not be maintained once an account is no longer in use, but it is hotly
     debated whether a user should be forced to change a good password that's
     in active use.  The arguments for changing passwords relate to the
     prevention of the continued use of penetrated accounts.  However, the
     opposition claims that frequent password changes lead to users writing
     down their passwords in visible areas (such as taping them to a
     terminal), or to users selecting very simple passwords that are easy to
     guess.

     While there is no definitive answer to this dilemma, a password policy
     should directly address the issue and provide guidelines for how often
     a user should change the password.  It is recommended that passwords be
     changed whenever root is penetrated, whenever there is a critical change
     in personnel (especially if it is the system administrator!), and
     whenever an account has been compromised.  In particular, if the root
     password is compromised, all passwords on the system should be changed.
     In addition, most users find an annual password change easy to
     accommodate, so you should consider requiring one.
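     A policy along these lines can be expressed as a small decision
     function.  The following Python sketch assumes a one-year maximum
     age and uses hypothetical event labels for the situations listed
     above; a site would substitute its own events and intervals.  When
     root is compromised, the same event applies to every account.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=365)  # the annual change suggested above

# Hypothetical event labels for the situations listed above that should
# force a change regardless of the password's age.
FORCING_EVENTS = {"root_compromised", "key_personnel_change",
                  "account_compromised"}


def must_change(last_changed, today, events=()):
    """Return True if policy requires this password to be changed now.

    last_changed and today are datetime.date values; events lists the
    security events recorded since the password was last changed.
    """
    if FORCING_EVENTS.intersection(events):
        return True
    return today - last_changed >= MAX_AGE
```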

   3.1.3.6 Machine generated passwords

     An alternative to allowing users the opportunity to select easily guessed
     passwords is to require machine generated passwords.  These have
     historically been a problem due to the difficulty of remembering such
     passwords.  As a result, they were often written down, and therefore more
     easily stolen.  Another alternative is to present a list of several
     machine generated passwords and allow the user to select the one that's
     easiest for him or her to remember.
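     The following Python sketch shows one possible way to offer such a
     list.  It assumes that alternating consonants and vowels makes a
     password easier to remember; that generation scheme, the character
     sets, and the function names are illustrative choices, not a
     standard.  Pronounceable passwords come from a much smaller space
     than fully random strings of the same length (with the defaults
     below, about five billion possibilities), which is the price paid
     for memorability.

```python
import secrets

# Characters that are easily confused when read or written down
# ('l', '0', '1') are omitted.
CONSONANTS = "bcdfghjkmnpqrstvwxz"
VOWELS = "aeiou"
DIGITS = "23456789"


def pronounceable(length=10):
    """Alternate consonants and vowels, then finish with two digits."""
    chars = [secrets.choice(CONSONANTS if i % 2 == 0 else VOWELS)
             for i in range(length - 2)]
    chars += [secrets.choice(DIGITS), secrets.choice(DIGITS)]
    return "".join(chars)


def candidates(count=5, length=10):
    """Offer several machine-generated passwords to choose from."""
    return [pronounceable(length) for _ in range(count)]
```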

     It is recommended that if machine generated passwords are mandated by
     your policy, then users should not be required to change passwords very
     frequently. (For example, not more than once every six months.)

   3.1.3.8 Multiple accounts

     It is now commonplace for a user to have accounts on many different
     systems.  One problem is that if the same password is used on all of
     these systems, the penetration of one system will easily spread to the
     others.  It is therefore recommended that users have a different
     password on each system.  However, if a user plans to use the same
     password on more than one system, these systems should all be in the
     same administrative domain, and should all be running the same operating
     system.

   3.1.3.9 Tools for password assurance

     There are tools to aid a security administrator in assuring robust
     passwords.  These tools fall generally into two categories:
     "password cracking" and "password filtering" programs.

     Password cracking programs take the encrypted password field and attempt
     to perform the same task an intruder would try -- guessing the password
     from dictionaries and common words.  This is a lengthy process for any
     but the simplest of schemes, since each guess has to be encrypted
     before it is compared against the encrypted entry.  However, the
     intruder may have access to better dictionaries and more powerful systems
     than the security administrator.  Nonetheless, password cracking programs
     provide a mechanism for assuring that no user has a password that is
     contained in your own dictionaries.  Examples of tools that provide
     password cracking are "crack" and "cops".  Crack is a tool that is
     completely devoted to password cracking and provides a mechanism for
     variations on dictionary words, and also provides the ability to
     distribute the processing over a network of computers to take advantage
     of parallelism.  Cops is a more general security tool that has password
     cracking as one of its functions.  It is not as complete as crack, but
     serves as a first level of protection for a system.
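     To show why cracking is slow, here is a minimal Python sketch of a
     dictionary audit.  It is not how crack or cops work internally.  A
     salted SHA-256 digest stands in for the system's password
     encryption function (an assumption; traditional UNIX systems used
     crypt(3), which this does not reproduce), and the variations tried
     are a small illustrative set.

```python
import hashlib


def one_way(salt, password):
    # Stand-in for the system's password encryption function; a salted
    # SHA-256 digest is used here only for illustration.
    return hashlib.sha256(salt + password.encode()).hexdigest()


def variants(word):
    """A few simple variations on a dictionary word."""
    yield word
    yield word.capitalize()
    yield word[::-1]
    for digit in "0123456789":
        yield word + digit


def first_match(salt, stored, dictionary):
    # Every guess must be encrypted with this entry's salt before it can
    # be compared, which is what makes large dictionaries slow to try.
    for word in dictionary:
        for guess in variants(word):
            if one_way(salt, guess) == stored:
                return guess
    return None


def audit(entries, dictionary):
    """entries maps user -> (salt bytes, stored hex digest).

    Returns {user: guessed password} for every account that was cracked.
    """
    found = {}
    for user, (salt, stored) in entries.items():
        guess = first_match(salt, stored, dictionary)
        if guess is not None:
            found[user] = guess
    return found
```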

     Password filtering programs are replacements for the system utility that
     is invoked to change a user's password.  In this scheme, the user submits
     a plain-text password to the tool, which checks the password against
     some criteria before changing the encrypted password on the system.
     This scheme has the advantage that more tests can be run against the
     plain-text password (such as a check for recurring patterns or reverse
     words) that would be very expensive to check for in a password cracking
     program.  If such a tool is installed, it should completely replace the
     password changing and setting mechanism so that no user can bypass the
     program and change the password to a less robust string.  An example of
     a tool that provides password filtering is "npasswd".
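     The filtering approach can be sketched in Python as follows.  This is
     not npasswd; the function names, the minimum length of 8, and the
     dictionary-of-words argument are assumptions.  Because the filter
     sees the plain text, it can cheaply test for recurring patterns and
     reversed words, and the replacement change routine stores nothing
     unless the filter accepts the candidate.

```python
import re


def reason_to_reject(candidate, dictionary, min_length=8):
    """Return None if the plain-text candidate is acceptable, else a reason."""
    lowered = candidate.lower()
    if len(candidate) < min_length:
        return "too short"
    # A short unit repeated to fill the whole password: "abcabcabc", "xxxxxxxx".
    if re.fullmatch(r"(.+?)\1+", lowered):
        return "repeating pattern"
    # Dictionary words, forwards or reversed, ignoring digits and punctuation.
    letters = re.sub(r"[^a-z]", "", lowered)
    if letters in dictionary or letters[::-1] in dictionary:
        return "dictionary word (possibly reversed)"
    return None


def change_password(store, user, candidate, dictionary, encrypt):
    """Replacement for the password-changing utility: only a candidate
    that passes the filter is ever encrypted and stored."""
    reason = reason_to_reject(candidate, dictionary)
    if reason is not None:
        raise ValueError("password rejected: " + reason)
    store[user] = encrypt(candidate)
```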

   3.2  Authorization

     Authorization refers to the process of granting privileges to processes
     and, ultimately, to users.  It differs from authentication, which is the
     process of identifying a user.  Once a user has been (reliably)
     identified, authorization determines the user's privileges, rights,
     property, and permissible actions.

     <Should "objects" and "entities" be defined here?>
     Explicitly listing the authorized activities of each user (and user
     process) with respect to all resources (objects) is impractical in any
     system of realistic size.  Real systems therefore use techniques that
     simplify the process of granting and checking authorizations.

     One approach, popularized in UNIX systems, is to give each object an
     owner and a group, and to define access rights separately for three
     classes of user: the owner, the members of the group, and everyone
     else.  The super user, or root, stands outside this scheme and has
     access to all portions (and objects) of the computer.  The owner of an
     object is the "user" who either created the object or was given it by
     the super user.  A group is a collection of users that share
     privileges over a collection of objects.  Groups ease authorization
     management: a user's authority can be changed by adding the user to,
     or removing the user from, a group, and a group's authority over an
     object can be changed in a single place.
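     The following simplified Python sketch shows how such a check works
     with numeric user and group IDs and the classic nine permission bits.
     The function and parameter names are hypothetical, and real kernels
     apply further rules that are omitted here.

```python
PERMISSION_BITS = {"r": 4, "w": 2, "x": 1}


def may_access(uid, gids, file_uid, file_gid, mode, want):
    """Simplified UNIX-style permission check.

    uid, gids: the requesting user's ID and set of group IDs.
    file_uid, file_gid, mode: the object's owner, group and permission bits.
    want: "r", "w" or "x".
    """
    if uid == 0:
        # The super user bypasses the checks (real systems still require
        # at least one execute bit before running a file).
        return True
    if uid == file_uid:
        bits = (mode >> 6) & 7      # owner class
    elif file_gid in gids:
        bits = (mode >> 3) & 7      # group class
    else:
        bits = mode & 7             # everyone else
    # Only the first matching class is consulted: an owner denied access
    # by the owner bits is refused even if the group bits would allow it.
    return bool(bits & PERMISSION_BITS[want])
```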

     Another approach is to attach to an object a list which explicitly contains
     the identity of all permitted users (or groups).  This is an Access Control
     List (ACL).  The advantage of an ACL is that it is easily maintained (there
     is one central list per object).
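     A minimal Python sketch of an allow-only ACL follows.  The
     "user:NAME" / "group:NAME" notation and the function name are
     hypothetical; real ACL implementations may also support deny entries
     and ordering rules, which are omitted here.

```python
def acl_allows(acl, user, groups, action):
    """Check one request against an object's access control list.

    acl: list of (principal, allowed_actions) entries, where principal
    is "user:NAME" or "group:NAME" (a hypothetical notation).
    """
    for principal, allowed in acl:
        kind, _, name = principal.partition(":")
        matches = ((kind == "user" and name == user) or
                   (kind == "group" and name in groups))
        if matches and action in allowed:
            return True
    return False  # anything not explicitly listed is refused
```

     Revoking access is a matter of removing one entry from the object's
     list, which is the maintenance advantage described above.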

     <NOTE: What other methods (short phrases are all that's needed, I
      probably know what they refer to) should be mentioned... >

     <NOTE: Should the process of deciding what authorizations are to be
      granted be described here (or is that in policies)?  Should mechanisms
      be described here? >

   3.3  Access
   3.4  Modems

   3.4.1  Modem lines must be managed.

     Although modems provide convenient access to a site for its users, they
     can also provide an effective detour around the site's firewalls.  For
     this reason it is essential to maintain proper control of modems.

     Don't allow users to install a modem line without proper authorization.
     This includes temporary installations, e.g. plugging a modem into a
     facsimile or telephone line overnight.

     Maintain a register of all your modem lines.  Conduct regular site
     checks for unauthorized modems; keep your register up to date.

   3.4.2   Dial-in users must be authenticated.

     A username and password check should be completed before a user can
     access anything on your network.  Normal password security considerations
     (such as choosing passwords which don't appear in dictionaries and
     changing them from time to time) are particularly important.

     Remember that telephone lines can be tapped, and that it is quite
     easy to intercept messages to cellular phones.  Modern high-speed
     modems use more sophisticated modulation techniques, which makes
     them somewhat more difficult to monitor, but it is prudent to assume
     that hackers know how to eavesdrop on your lines.  For this reason you
     should use one-time passwords (e.g., S/Key) or hardware authentication
     devices (e.g., SecurID) if this is at all possible.

     It is helpful to have a single dial-in point, e.g. a single large
     modem pool, so that all users are authenticated in the same way.

     Users will occasionally mis-type a password.  Set a short delay - say
     two seconds - after the first and second failed logins, and force a
     disconnect after the third.  This will slow down automated password
     attacks.  Don't tell the user whether the username, the password or
     both were incorrect.
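     The retry rules above can be sketched in Python as follows.  The
     function name and the way attempts and credential checks are passed
     in are hypothetical; the sleep function is a parameter only so the
     logic can be tested without real delays.

```python
import time


def dialin_login(attempts, check, sleep=time.sleep, delay=2.0, max_tries=3):
    """Handle the login attempts made during one dial-in call.

    attempts: iterable of (username, password) pairs typed by the caller.
    check: function(username, password) -> bool.
    Returns (authenticated user or None, messages shown to the caller).
    """
    messages = []
    for number, (user, password) in enumerate(attempts, start=1):
        if check(user, password):
            return user, messages
        # Same message whether the username, the password or both were wrong.
        messages.append("Login incorrect")
        if number >= max_tries:
            break  # no delay after the last failure: just hang up
        sleep(delay)  # slows down automated password attacks
    messages.append("Disconnecting")
    return None, messages
```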


   3.4.3  All logins (successful and unsuccessful) should be logged.

     Don't keep correct passwords in the log, but consider keeping incorrect
     passwords to aid in detecting password attacks.  However, bear in mind
     that most incorrect passwords are correct passwords with one character
     mistyped, and may suggest the real password.  If you can't keep this
     information secure, don't log it at all.

     If Calling Line Identification is available, take advantage of it
     by recording the calling number for each login attempt.  Be sensitive
     to the privacy issues raised by Calling Line Identification.  Also be
     aware that Calling Line Identification is not to be trusted; use the
     data for informational purposes only, not for authentication. <NOTE:
     it was suggested that we add " Caller ID data can be read from a
     compatible modem via a serial line" >
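     The following Python sketch formats one log line per login attempt.
     It takes the most conservative option above: it has no password
     parameter at all, so neither correct nor incorrect passwords can be
     logged.  The function name and the line layout are hypothetical.

```python
from datetime import datetime, timezone


def login_record(username, success, calling_number=None, when=None):
    """Format one log line for a dial-in login attempt.

    There is deliberately no password parameter.  calling_number holds
    Calling Line Identification data, when available; it is recorded for
    information only and is never used to authenticate.
    """
    when = when or datetime.now(timezone.utc)
    return " ".join([
        when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "login",
        "ok" if success else "FAILED",
        "user=" + username,
        "cli=" + (calling_number or "unavailable"),
    ])
```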


   3.4.5  Minimize the amount of information given in your opening banner.

     In particular, don't announce the type of host hardware or operating
     system - this encourages specialist hackers.

     Display a short banner, but don't offer an 'inviting' name (e.g.
     University of XYZ, Student Records System).  Instead, give your site
     name, a short warning that all sessions are monitored, and a
     username/password prompt.  Get your site's lawyers to check your
     banner to make sure it states your legal position correctly.

     For high-security applications, consider using a 'blind' password,
     i.e. give no response to an incoming call until the user has typed in
     (without any echoing) a password.  This effectively simulates a dead
     modem.


   3.4.6  Call-back Capability

     Some dial-in servers offer call-back facilities, i.e. the user dials
     in and is authenticated, then the system disconnects the call and calls
     back on a specified number.  You will probably have to pay the charges for
     such calls.

     This feature should be used with caution; it can easily be bypassed.
     As a minimum, make sure that the return call is never made from the
     same modem as the incoming one.  Overall, although call-back can
     improve modem security, you should not depend on it alone.


   3.4.7  Dial-out authentication

     Dial-out users should also be authenticated, particularly since your
     site will have to pay their telephone charges.

     Never allow dial-out from an unauthenticated dial-in call, and consider
     whether you will allow it from an authenticated one.  The goal here is
     to prevent callers from using your modem pool as part of a chain of
     logins.  This can be hard to detect, particularly if a hacker sets up
     a path through several hosts on your site.

     As a minimum, don't allow the same modems and phone lines to be used
     for both dial-in and dial-out.  This can be implemented easily if you run
     separate dial-in and dial-out modem pools.
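     The dial-out rules above reduce to a few checks, sketched here in
     Python.  The Session fields, function names, and line names are
     hypothetical and are not taken from any particular access server.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical description of a login session.
    user: str
    authenticated: bool
    via_dialin: bool


def may_dial_out(session, allow_from_dialin=False):
    """Apply the dial-out rules above to one request."""
    if not session.authenticated:
        return False  # never from an unauthenticated session
    if session.via_dialin and not allow_from_dialin:
        return False  # blocks chains of logins through the modem pool
    return True


def pools_are_separate(dialin_lines, dialout_lines):
    """True if no modem or phone line serves both dial-in and dial-out."""
    return not set(dialin_lines) & set(dialout_lines)
```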


   3.4.8  Make your modem programming as 'bullet-proof' as possible.

     Be sure modems can't be reprogrammed while they're in service.  As a
     minimum, make sure that three plus signs won't put your dial-in modems
     into command mode!

     Program your modems to reset to your standard configuration at the
     start of each new call.  Failing this, make them reset at the end of
     each call.  This precaution will protect you against accidental
     reprogramming of your modems.

     Check that your modems terminate calls cleanly.  When a user logs out
     from an access server, verify that the server hangs up the phone line
     properly.  It is equally important that the server force logouts from
     whatever sessions were active if the user hangs up unexpectedly.


   3.5  Cryptography

   3.6  Auditing

     This section covers the procedures for collecting data generated by network
     activity that may be useful in analyzing the security of a network and/or
     useful in responding to a security incident.  This section also covers the
     handling, preservation, and utilization of the data.

     (This will be reworked as I develop the remainder.)

   3.6.1  What to collect

     Audit data should include any attempt to achieve a different security level
     of any person, process, or other entity in the network.  The most obvious
     example in this area is a log of attempts to log in to a host computer.

     Audit data should also include data pertaining to any "public" or anonymous
     access and retrieval of data, at least to the granularity of the "remote"
     host.

     And on and on...

     <NOTE: I'd appreciate suggested logs items (and what level of detail is
      intended). >

   3.6.2  Collection Process

     The collection process should be enacted by the host or resource being
     accessed.  Depending on the importance of the data and the need to have it
     local in instances in which services are being denied, data could be kept
     local to the resource until needed or be transmitted to storage after each
     event.

     Reporting data can be done by writing to a file, writing to a line printer,
     writing over a network, or writing over a non-network port, such as a
     console port.  Each of these has advantages and drawbacks.

     File system logging is the least resource intensive of all four candidates
     (for a given audit log).  It is also the least reliable.  If a resource has
     been compromised, the file system is the first to go.  If the network in
     front of the resource has become unusable, the data is inaccessible, unless
     a direct console port is available.

     Line printer logging is useful in systems where permanent and immediate
     logs are required.  A real-time system is an example of this, where the
     exact point of a failure or attack must be recorded.  A laser printer,
     or other device which buffers data between the auditing system and the
     storage medium, may lose data if its buffers hold the needed data at a
     critical instant.

     Reporting over the network allows a remote host to store the data in a
     more permanent and possibly more reliable manner.  However, this
     consumes bandwidth (at the minimum), exposes the audit data in an easy
     package to an interloper, and risks losing the data during a network
     denial of service.
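     The file and network options can be combined.  The following is a
     minimal sketch using Python's standard logging module: it writes
     records to a local file created with owner-only permissions and, if a
     log host is named, also sends them to a remote syslog server over UDP
     port 514.  The function name and the loghost parameter are
     hypothetical; line printer and console logging are not shown.

```python
import logging
import logging.handlers
import os


def audit_logger(path, loghost=None, name="audit"):
    """Return a logger that writes audit records to a local file and,
    if loghost is given, also to a remote syslog server.

    Call once per name: each call adds new handlers to the same logger.
    """
    # Create the log file (if needed) with owner-only permissions.
    os.close(os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep audit records out of other logs
    formatter = logging.Formatter("%(asctime)s %(message)s")

    local = logging.FileHandler(path)  # cheap, but lost if the host is compromised
    local.setFormatter(formatter)
    logger.addHandler(local)

    if loghost is not None:
        # Syslog over UDP port 514: copies survive on another host, but
        # travel in clear text and may be lost if the network is down.
        remote = logging.handlers.SysLogHandler(address=(loghost, 514))
        remote.setFormatter(formatter)
        logger.addHandler(remote)
    return logger
```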

     Reporting over a console port makes delivery of the data depend only on
     the hardware, not on the network.  The limitations are that the console
     port requires physical security, and that using console ports on
     machines more than a short distance away (e.g., across a campus) may
     require a phone line in addition to the network connection.  In some
     instances, this is one more resource that may be constrained.

   3.6.3  Collection Load

     Continuous data collection can quickly accumulate a large volume of
     data, so storage for it must be planned in advance.  There are a few
     ways to limit the required storage space.  Data may be compressed using
     one of many methods.  Another approach is to archive only summaries of
     activity (possibly losing some detail in the process).  Data may also
     be archived for a fixed period of time only, after which it is
     permanently removed.
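     Compression and fixed-period retention can be combined in a small
     housekeeping job.  The following Python sketch assumes rotated logs
     are named like "auth.log.1", archives end in ".gz", and a 90-day
     retention period; all three are hypothetical choices for
     illustration.  Archives are created owner-only and then take on the
     original file's permissions, so compression never widens access.

```python
import gzip
import os
import shutil
import time


def archive_logs(directory, retain_days=90, now=None):
    """Compress rotated logs and remove archives past the retention period.

    Returns the lists of compressed and deleted file names.
    """
    now = time.time() if now is None else now
    cutoff = now - retain_days * 24 * 60 * 60
    compressed, deleted = [], []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.endswith(".gz"):
            if os.path.getmtime(path) < cutoff:
                os.remove(path)  # retention period over: remove for good
                deleted.append(name)
        elif ".log." in name:
            target = path + ".gz"
            fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
            with open(path, "rb") as src, os.fdopen(fd, "wb") as raw, \
                    gzip.GzipFile(fileobj=raw, mode="wb") as dst:
                shutil.copyfileobj(src, dst)
            # Copy permissions and timestamps, so retention counts from
            # when the log was last written, then drop the original.
            shutil.copystat(path, target)
            os.remove(path)
            compressed.append(name)
    return compressed, deleted
```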

     The issue of archiving security data differs from archiving network
     management and application data.  Network management data (statistics) can
     be reduced by altering the reporting period.  Security data does not have
     that option (for the most part).  Security data also does not have the
     permanence of application data.

   3.6.4  Handling the Data/Preservation

     Security data should be protected at least as much as any other data is
     protected because from it, much can be inferred.  Security data may give
     away enough secrets to allow a "masquerader" to impersonate an authorized
     administrator.

     Data may also become key to the investigation, apprehension, or prosecution
     of the perpetrator of an incident.  Because of this, the data needs to be
     protected and clearly documented.  For this reason it is advisable to seek
     the advice of local legal counsel or law enforcement when deciding how
     security data is to be treated.  This should happen before an incident
     occurs.

     If a data handling plan is not cleared prior to an incident, this does not
     mean that it is useless.  It does, however, mean two things: you may have
     no recourse in the aftermath of an event, and you may also be liable for
     penalties resulting from your treatment of the data.

   3.6.5  Audit Data Precautions

     In certain instances, audit data may contain personal information.
     Searching through the data, even for just a routine check of the network's
     security could present an invasion of privacy or make the auditing entity
     privy to information it should not be allowed to have.  Note that this is
     not automatically true - not all data is "sensitive" and "sensitive"
     differs by locale.

     Another danger presented by auditing data is that it may reveal a pattern
     of incidents.  If an organization knows about the incidents but permits
     them to continue and this results in damage to another organization
     ("downstream"), legal action could result.  An organization may also be
     liable for failing to make a "best effort" to analyze this data for
     incidents.

   3.7  Backups

4.   ARCHITECTURE

   4.1  Objectives
   4.2  Service Configurations
   4.3  Network Configurations
        Topology
        Infrastructure elements (DNS, mail hub, information servers, etc.)
        Network management
   4.4  Firewalls

5.   INCIDENT HANDLING

     This section of the document will supply guidance to be applied before,
     during, and after a computer security incident on a machine, network,
     site, or multi-site environment.  The operative philosophy in the
     event of a breach of computer security is to react according to a plan.
     This is true whether the breach is the result of an external intruder
     attack, unintentional damage, a student testing some new program to
     exploit a software vulnerability, or a disgruntled employee.  An
     adequate contingency plan should consider each of these types of
     events.  Without a proactive approach to protecting assets, the
     handling of an incident cannot be as efficient as it is when
     well-prepared procedures, methods, and policies are in place.

     Traditional computer security, while quite important in the overall
     site security plan, usually concentrates on protecting systems from
     attack, and perhaps monitoring systems to detect attacks.  Little
     attention is usually paid to how to actually handle the attack when
     it occurs.  The result is that when an attack is in progress, many
     decisions are made in haste, and these can hamper efforts to track
     down the source of the incident, collect evidence to be used in
     prosecution, prepare for the recovery of the system, and protect the
     valuable data contained on the system.

     One of the most important but often overlooked benefits of efficient
     incident handling is economic.  Having both technical and
     managerial personnel respond to an incident requires considerable
     resources, resources which could be utilized more profitably if an
     incident did not require their services.  If these personnel are
     trained to handle an incident efficiently, less of their time is
     required to deal with that incident.

     Because of the worldwide network, most incidents are not restricted
     to a single site.  Operating system vulnerabilities apply in some
     cases to several million systems, and many vulnerabilities are
     exploited across the network itself.  It is therefore vital that
     all involved parties be informed as soon as possible.

     Another benefit is related to public relations.  News about
     computer security incidents tends to be damaging to an
     organization's stature among current or potential clients.
     Efficient incident handling minimizes the potential for negative
     exposure.

     A final benefit of efficient incident handling is related to legal
     issues.  It is possible that in the near future organizations may
     be sued because one of their nodes was used to launch a network
     attack.  In a similar vein, people who develop patches or
     workarounds may be sued if the patches or workarounds are
     ineffective, resulting in damage to systems, or if the patches or
     workarounds themselves damage systems.  Knowing about operating
     system vulnerabilities and patterns of attacks and then taking
     appropriate measures is critical to avoiding possible legal
     problems.

     This chapter is arranged such that a list of relevant topics may be
     generated from the Table of Contents to provide a starting point for
     creating a policy for handling ongoing incidents.  The main points to
     be included in a policy for handling incidents are:

      o Preparing and planning (what are the goals and objectives in
        handling an incident).
      o Notification (who should be contacted in case of an incident).
      o Evaluation (how serious is the incident).
      o Handling (what should be done when an incident occurs).
        This especially includes:
         - Notification (who should be notified about the incident).
         - Containment (how can the damage be limited).
         - Eradication (eliminate the causes of the incident).
         - Recovery (how to reestablish service and systems).
         - Follow Up (what actions should be taken after the incident).
         - Legal/Investigative implications (what are the legal and
           prosecutorial implications of the incident).
         - Documentation Logs (what records should be kept from before,
           during, and after the incident).
      o Aftermath (overall implications of past incidents).
      o Responsibilities (for planning and handling an incident).

     Each of these points is important in an overall plan for handling
     incidents. The remainder of this chapter will detail the issues
     involved in each of the relevant topics, and provide some guidance
     as to what should be included in a site policy for handling incidents.

     Guidelines for End User involvement in dealing with compromised
     accounts and vulnerabilities are covered in the corresponding
     "End User Security Handbook" [RFC xxx].  Site administrators who act
     as a site security contact, assisting other users and administrators
     in dealing with incidents, will find the "Guidelines and
     Recommendations for Incident Processing" [RFC xxx] especially
     useful.


   5.1  Preparing and Planning for Incident Handling

     Part of handling an incident is being prepared to respond before
     the incident occurs.  This includes establishing a suitable level
     of protection, as explained in the preceding chapters.  Such
     protection not only helps avoid incidents, but also limits the
     damage if an incident does become severe.  Protection also includes
     preparing incident handling guidelines as part of a contingency plan
     for your organization or site.  Having written plans eliminates much
     of the ambiguity which occurs during an incident, and will lead to a
     more appropriate and thorough set of responses.  As explained for the
     site-specific contingency plan in section xxx, it is vitally important
     to test the proposed plan through 'dry runs' before an incident
     actually occurs.

     Once a site has recovered from an incident, site policy and
     procedures should be reviewed and, where necessary, changed to
     prevent similar incidents.  If an incident resulted from poor policy
     and the policy is not changed, then one is doomed to repeat the past.
     Even without an incident, it would be prudent to review policies and
     procedures on a regular basis.  Reviews are imperative due to
     today's changing computing environments.  To improve this process,
     a problem reporting procedure should be implemented to describe,
     in detail, the incident and the solutions to the incident.  Each
     incident should be reviewed by the site security subgroup so that
     the incident is understood and changes to site policy and
     procedures can be suggested.

     Learning to respond efficiently to an incident is important for
     numerous reasons:

      o protect the assets that normal security measures are meant to
        protect, even in a worst-case event
      o conserve resources which could be utilized more profitably
        if an incident did not require their services
      o ensure compliance with (government) regulations
      o prevent use of your systems against other systems (which
        could incur legal liability)
      o minimize the potential for negative exposure

     As in any set of pre-planned procedures, attention must be placed
     on a set of goals for handling an incident.  These goals will be
     prioritized differently depending on the site.  The set of goals will
     be closely related to the goals for security in general.  Therefore,
     the same guidelines as in section xxx for security in general might be
     applied here.  A specific set of objectives can be identified for
     dealing with incidents:

      o Figure out how it happened.
      o Find out how to avoid further exploitations.
      o Avoid escalation and further incidents.
      o Recover from the incident.
      o Find out who did it.

     Depending on the nature of the incident, there might be a conflict
     between analyzing the original source of a problem and restoring
     systems and services.  In this case, overall goals (like assuring
     the integrity of (life) critical systems) might be the reason for
     not analyzing an incident.  Of course this is an important management
     decision, but all involved parties must be aware that without an
     analysis the same incident may happen again.

     It is important to prioritize actions to be taken during an
     incident well in advance of the time an incident occurs.
     Sometimes an incident may be so complex that it is impossible to
     do everything at once to respond to it; priorities are essential.
     Although priorities will vary from institution to institution, the
     following suggested priorities may serve as a starting point for
     defining an organization's response:

      o Priority one -- protect human life and people's
        safety; human life always has precedence over all
        other considerations.

      o Priority two -- protect classified and/or sensitive
        data.  Prevent exploitation of classified and/or
        sensitive systems, networks or sites.  Inform affected
        classified and/or sensitive systems, networks or sites
        about penetrations that have already occurred.
        (Be aware of regulations imposed by your site or by
        your government.)

      o Priority three -- protect other data, including
        proprietary, scientific, managerial and other data,
        because loss of data is costly in terms of resources.
        Prevent exploitation of other systems, networks or
        sites, and inform already affected systems, networks or
        sites about successful penetrations.

      o Priority four -- prevent damage to systems (e.g., loss
        or alteration of system files, damage to disk drives,
        etc.); damage to systems can result in costly down
        time and recovery.

      o Priority five -- minimize disruption of computing
        resources; it is better in many cases to shut a system
        down or disconnect from a network than to risk damage
        to data or systems.
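
     Sites that track response tasks in a script or ticket queue can
     encode the suggested priorities directly, so that pending actions are
     always worked in the agreed order.  The Python sketch below is a
     minimal illustration only: the enum, class, and function names are
     hypothetical, and each site should substitute its own ordering.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Priority(IntEnum):
    """Suggested starting-point priorities; a lower value is handled first."""
    HUMAN_SAFETY = 1
    SENSITIVE_DATA = 2
    OTHER_DATA = 3
    SYSTEM_DAMAGE = 4
    SERVICE_DISRUPTION = 5


@dataclass
class Action:
    description: str
    priority: Priority


def triage(actions: List[Action]) -> List[Action]:
    """Return actions ordered by priority.

    sorted() is stable, so actions with equal priority keep the order
    in which they were reported.
    """
    return sorted(actions, key=lambda a: a.priority)


if __name__ == "__main__":
    pending = [
        Action("restore interactive logins", Priority.SERVICE_DISRUPTION),
        Action("disconnect host holding personnel records", Priority.SENSITIVE_DATA),
        Action("check system files for alteration", Priority.SYSTEM_DAMAGE),
    ]
    for action in triage(pending):
        print(action.priority.name, "-", action.description)
```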

     An important implication for defining priorities is that once
     human life and national security considerations have been
     addressed, it is generally more important to save data than system
     software and hardware.  Although it is undesirable to have any
     damage or loss during an incident, systems can be replaced; the
     loss or compromise of data (especially classified data), however,
     is usually not an acceptable outcome under any circumstances.

     Another important concern is the effect on others beyond the
     systems and networks where the incident occurs.  Apart from any
     requirements imposed by government regulations, it is always
     important to inform affected parties as soon as possible.  Because
     of the legal implications, this topic should be addressed in the
     plans to avoid further delays and uncertainty for the administrators.

     Any plan for responding to security incidents should be guided by
     local policies and regulations.  Government and private sites that
     deal with classified material have specific rules that they must
     follow.

     The policies your site adopts on how to react to incidents will
     shape your response.  For example, it may make little sense to create
     mechanisms to monitor and trace intruders if your site does not plan
     to take action against the intruders if they are caught.  Other
     organizations may have policies that affect your plans; for example,
     telephone companies often release information about telephone traces
     only to law enforcement agencies.

     Note also that if any legal action is planned, there
     are specific guidelines that must be followed to make sure that
     any information collected can be used as evidence.


   5.2  Notification and Points of Contact

     It is important to establish contacts with various personnel
     before a real incident occurs.  These contacts may be local,
     related to the network, or at investigative agencies.  Working
     with these contacts appropriately will help to make your incident
     handling process more efficient.

      o Point of Contact (POC) people (Technical, Administrative,
        Response Teams, Investigative, Legal, Vendors, Service
        providers), and which POCs are visible to whom.
      o Wider community (users).
      o Other sites that might be affected.
      o The public (press).

     These issues are especially important for the local person responsible
     for handling the incident, since that is the person responsible for the
     actual notification of others.  A list of contacts in each of these
     categories is an important time saver for this person during an incident.
     It can be quite difficult to find an appropriate person during an incident
     when many urgent events are ongoing.  Including relevant telephone
     numbers (also electronic mail addresses and fax numbers) in the site
     security policy is strongly recommended.
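
     A contact list kept in a structured form is easy to print, to carry,
     and to check for gaps.  The Python sketch below is one possible
     layout, not a prescribed format: the category names follow the
     point-of-contact list above, while the class and field names are
     hypothetical.  The example number uses the fictional 555-01xx range.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Categories drawn from the point-of-contact list in the text.
CATEGORIES = (
    "technical", "administrative", "response_team", "investigative",
    "legal", "vendor", "service_provider", "affected_site", "press",
)


@dataclass
class Contact:
    name: str
    category: str
    phone: str
    email: Optional[str] = None
    fax: Optional[str] = None
    after_hours: Optional[str] = None  # home or pager number


class ContactList:
    def __init__(self) -> None:
        self._by_category: Dict[str, List[Contact]] = {c: [] for c in CATEGORIES}

    def add(self, contact: Contact) -> None:
        """Add a contact, rejecting unknown categories to catch typos."""
        if contact.category not in self._by_category:
            raise ValueError(f"unknown category: {contact.category}")
        self._by_category[contact.category].append(contact)

    def lookup(self, category: str) -> List[Contact]:
        """Return a copy of the contacts in one category (empty if none)."""
        return list(self._by_category.get(category, []))

    def missing_categories(self) -> List[str]:
        """Categories with nobody listed: gaps to fill before an incident."""
        return [c for c, people in self._by_category.items() if not people]


if __name__ == "__main__":
    contacts = ContactList()
    contacts.add(Contact("Site legal counsel", "legal", "+1-555-0100",
                         email="counsel@example.org"))
    for person in contacts.lookup("legal"):
        print(person.name, person.phone, person.email)
    print("no contact yet for:", ", ".join(contacts.missing_categories()))
```

     Checking `missing_categories()` during each policy review is a cheap
     way to notice that, say, no investigative contact has been arranged.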


   5.2.1  Local Managers and Personnel

     When an incident is under way, a major issue is deciding who is in
     charge of coordinating the activity of the multitude of players.
     A major mistake is to have a number of "points of contact" (POC)
     that are not pulling their efforts together.  This only adds to
     the confusion of the event, and will probably lead to wasted or
     ineffective effort.

     The single point of contact may or may not be the person "in
     charge" of the incident.  There are two distinct roles to fill
     when deciding who shall be the point of contact and the person in
     charge of the incident.  The person in charge will make decisions
     as to the interpretation of policy applied to the event.  The
     responsibility for the handling of the event falls onto this
     person.  In contrast, the point of contact must coordinate the
     effort of all the parties involved with handling the event.

     The point of contact must be a person with the technical expertise
     to successfully coordinate the effort of the system managers and
     users involved in monitoring and reacting to the attack.  Often
     the management structure of a site is such that the administrator
     of a set of resources is not a technically competent person with
     regard to handling the details of the operations of the computers,
     but is ultimately responsible for the use of these resources.  In
     such cases the point of contact and the person in charge will
     usually be different people.

     Another important function of the POC is to maintain contact with
     law enforcement and other external agencies to assure that the
     involvement of multiple agencies is coordinated.  (In the U.S., the
     FBI, CIA, DoD, U.S. Army, or others might be involved.)

     Finally, if legal action in the form of prosecution is involved,
     the POC may be able to speak for the site in court.  The
     alternative is to have multiple witnesses that will be hard to
     coordinate in a legal sense, and will weaken any case against the
     attackers.  A single POC may also be the single person in charge
     of evidence collected, which will keep the number of people
     accounting for evidence to a minimum.  As a rule of thumb, the
     more people that touch a potential piece of evidence, the greater
     the possibility that it will be inadmissible in court.
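
     Keeping evidence with a single POC is easier when every transfer is
     written down.  The Python sketch below keeps a simple custody record
     for one item: each entry notes who handled it, what was done, when,
     and a SHA-256 digest of the contents, so that a later change is
     visible.  The class and method names are hypothetical and the
     technique is an assumption, not something this document prescribes;
     whether such a record satisfies local rules of evidence is a
     question for legal counsel.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class CustodyEntry:
    timestamp: str  # UTC, ISO 8601
    handler: str
    action: str
    sha256: str     # digest of the contents at the time of this entry


@dataclass
class EvidenceItem:
    label: str
    data: bytes  # e.g. a copied log file or disk image
    history: List[CustodyEntry] = field(default_factory=list)

    def digest(self) -> str:
        return hashlib.sha256(self.data).hexdigest()

    def record(self, handler: str, action: str) -> None:
        """Append a custody entry stamped with the current UTC time."""
        self.history.append(CustodyEntry(
            timestamp=datetime.now(timezone.utc).isoformat(),
            handler=handler,
            action=action,
            sha256=self.digest(),
        ))

    def handlers(self) -> List[str]:
        """Distinct people who have touched the item, in first-seen order."""
        seen: List[str] = []
        for entry in self.history:
            if entry.handler not in seen:
                seen.append(entry.handler)
        return seen

    def intact(self) -> bool:
        """True if every entry, and the current contents, match the first digest."""
        if not self.history:
            return True
        first = self.history[0].sha256
        return self.digest() == first and all(e.sha256 == first for e in self.history)


if __name__ == "__main__":
    item = EvidenceItem("copy of login records", b"session data")
    item.record("site POC", "collected from host")
    item.record("site POC", "sealed and stored")
    print(item.handlers(), item.intact())
```

     A short `handlers()` list is the software counterpart of the rule of
     thumb above: the fewer people who touch an item, the better.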

     It is very important to prepare a method of notification, so that
     you will know who to call and how to contact them.  For example,
     every member of the Department of Energy's CIAC Team carries a card
     with every other team member's work and home phone numbers, as well
     as pager numbers.

     <no text yet>
     + responsibilities are distributed over the whole site; this has
       strong implications for coordinating the incident process
     + in multi-site incidents this is even harder
     + the POC might be changed during an incident because of conflicting
       priorities/responsibilities


   5.2.2  Law Enforcement and Investigative Agencies

     In the event of an incident it is important to establish contact with
     investigative agencies such as the FBI and Secret Service as soon
     as possible, for several reasons.  Local law enforcement and local
     security offices or campus police organizations should also be
     informed when appropriate.  A primary reason is that once a major
     attack is in progress, there is little time to call various
     personnel in these agencies to determine exactly who the correct
     point of contact is.  Another reason is that it is important to
     cooperate with these agencies in a manner that will foster a good
     working relationship, and that will be in accordance with the
     working procedures of these agencies.  Knowing the working
     procedures in advance and the expectations of your point of
     contact is a big step in this direction.  For example, it is
     important to gather evidence that will be admissible in a court of
     law.  If you don't know in advance how to gather admissible
     evidence, your efforts to collect evidence during an incident are
     likely to be of no value to the investigative agency with which
     you deal.  A final reason for establishing contacts as soon as
     possible is that it is impossible to know the particular agency
     that will assume jurisdiction in any given incident.  Making
     contacts and finding the proper channels early will make
     responding to an incident go considerably more smoothly.

     If your organization or site has a legal counsel, you need to
     notify this office soon after you learn that an incident is in
     progress.  At a minimum, your legal counsel needs to be involved
     to protect the legal and financial interests of your site or
     organization.  There are many legal and practical issues, a few of
     which are:

         (1) Whether your site or organization is willing to risk
             negative publicity or exposure to cooperate with legal
             prosecution efforts.

         (2) Downstream liability--if you leave a compromised system
             as is so it can be monitored and another computer is damaged
             because the attack originated from your system, your site or
             organization may be liable for damages incurred.

         (3) Distribution of information--if your site or organization
             distributes information about an attack in which another
             site or organization may be involved, or about a
             vulnerability in a product in a way that may affect the
             ability to market that product, your site or organization
             may again be liable for any damages (including damage to
             reputation).

         (4) Liabilities due to monitoring--your site or organization
             may be sued if users at your site or elsewhere discover
             that your site is monitoring account activity without
             informing users.

     Unfortunately, there are no clear precedents yet on the
     liabilities or responsibilities of organizations involved in a
     security incident or who might be involved in supporting an
     investigative effort.  Investigators will often encourage
     organizations to help trace and monitor intruders -- indeed, most
     investigators cannot pursue computer intrusions without extensive
     support from the organizations involved.  However, investigators
     cannot provide protection from liability claims, and these kinds
     of efforts may drag on for months and require considerable effort.

     On the other hand, an organization's legal counsel may advise
     extreme caution and suggest that tracing activities be halted and
     an intruder shut out of the system.  This in itself may not
     provide protection from liability, and may prevent investigators
     from identifying anyone.

     The balance between supporting investigative activity and limiting
     liability is tricky; you'll need to consider the advice of your
     counsel and the damage the intruder is causing (if any) in making
     your decision about what to do during any particular incident.

     Your legal counsel should also be involved in any decision to
     contact investigative agencies when an incident occurs at your
     site.  The decision to coordinate efforts with investigative
     agencies is most properly that of your site or organization.
     Involving your legal counsel will also foster the multi-level
     coordination between your site and the particular investigative
     agency involved which in turn results in an efficient division of
     labor.  Another result is that you are likely to obtain guidance
     that will help you avoid future legal mistakes.

     Finally, your legal counsel should evaluate your site's written
     procedures for responding to incidents.  It is essential to obtain
     a "clean bill of health" from a legal perspective before you
     actually carry out these procedures.

     One of the most important considerations in dealing with
     investigative agencies is verifying that the person who calls
     asking for information is a legitimate representative from the
     agency in question.  Unfortunately, many well intentioned people
     have unknowingly leaked sensitive information about incidents,
     allowed unauthorized people into their systems, etc., because a
     caller has masqueraded as a representative of a government agency
     (e.g., the FBI or Secret Service in the US).  A similar consideration
     is using a secure means of communication.  Because many network attackers
     can easily reroute electronic mail, avoid using electronic mail to
     communicate with other agencies (as well as others dealing with the
     incident at hand).  Non-secured phone lines (e.g., the phones
     normally used in the business world) are also frequent targets for
     tapping by network intruders, so be careful!

     There is no established set of rules for responding to an incident
     when the local government becomes involved.  Normally, except by
     court order, no agency can force you to monitor, to disconnect
     from the network, to avoid telephone contact with the suspected
     attackers, etc.  As discussed before, you should consult your legal
     counsel on the matter, especially before taking an action that your
     organization has never taken.  The particular agency involved may ask you
     to leave an attacked machine on and to monitor activity on this machine,
     for example.  Complying with this request will help ensure continued
     cooperation of the agency--usually the best route towards finding the
     source of the network attacks and, ultimately, terminating these attacks.
     Additionally, you may need some information or a favor from the
     agency involved in the incident.  You are likely to get what you
     need only if you have been cooperative.  Of particular importance
     is avoiding unnecessary or unauthorized disclosure of information
     about the incident, including any information furnished by the
     agency involved.  The trust between your site and the agency
     hinges upon your ability to avoid compromising the case the agency
     will build; keeping "tight lipped" is imperative.

     Sometimes your needs and the needs of an investigative agency will
     differ.  Your site may want to get back to normal business by
     closing an attack route, but the investigative agency may want you
     to keep this route open.  Similarly, your site may want to close a
     compromised system down to avoid the possibility of negative
     publicity, but again the investigative agency may want you to
     continue monitoring.  When there is such a conflict, there may be
     a complex set of tradeoffs (e.g., interests of your site's
     management, amount of resources you can devote to the problem,
     jurisdictional boundaries, etc.).  An important guiding principle
     is related to what might be called "Internet citizenship" [22, IAB89,
     23 (xxx old references)] and its responsibilities.  Your site can shut
     a system down, and this will relieve you of the stress, resource demands,
     and danger of negative exposure.  The attacker, however, is likely
     to simply move on to another system, temporarily leaving others
     blind to the attacker's intention and actions until another path
     of attack can be detected.  Provided that there is no damage to
     your systems and others, the most responsible course of action is
     to cooperate with the participating agency by leaving your
     compromised system on.  This will allow monitoring (and,
     ultimately, the possibility of terminating the source of the
     threat to systems just like yours).  On the other hand, if there
     is damage to computers illegally accessed through your system, the
     choice is more complicated: shutting down the intruder may prevent
     further damage to systems, but might make it impossible to track
     down the intruder.  If there has been damage, the decision about
     whether it is important to leave systems up to catch the intruder
     should involve all the organizations affected.  Further
     complicating the issue of network responsibility is the
     consideration that if you do not cooperate with the agency
     involved, you will be less likely to receive help from that agency
     in the future.


   5.2.3  Computer Security Incident Handling Teams

     There now exist a number of Computer Security Incident Handling
     teams (CSIH teams) such as the CERT Coordination Center and the CIAC
     or other teams around the globe.  Teams exist for many major
     government agencies and large corporations.  If such a team is
     available, notifying it should be of primary importance during
     the early stages of an incident.  These teams are responsible for
     coordinating computer security incidents over a range of sites and
     larger entities.  Even if the incident is believed to be contained
     to a single site, it is possible that the information available
     through a response team could help in closing out the incident.

     If it is determined that the breach occurred due to a flaw in the
     systems' hardware or software, the vendor (or supplier) and a
     Computer Security Incident Handling team should be notified as soon
     as possible.  This is especially important because many other
     systems are likely to be vulnerable as well.

     In setting up a site policy for incident handling, it may be
     desirable to create a subgroup, much like those teams that already
     exist, that will be responsible for handling computer security incidents
     for the site (or organization).  If such a team is created, it is
     essential that communication lines be opened between this team and
     other teams.  Once an incident is under way, it is difficult to open a
     trusted dialogue with other teams if none has existed before.
     (See [RFC xxx] for more information about the considerations for
     creating your own incident handling team.)


   5.2.4  Affected and Involved Sites

     <no text yet>
     + special care should be taken to inform other affected sites
     + directly: but who is responsible for the contact, who is
       responsible for an incident at the other site
     + indirectly: utilizing an incident response team
     + responsibility/liability
     + privacy reasons to protect specific data


   5.2.5  Public Relations - Press Releases

     One of the most important issues to consider is when, by whom, and
     how much to release to the general public through the press.  Many
     factors bear on this decision.  First and foremost, if a public
     relations office exists for the site, it is important to use this
     office as liaison to the press.  The public relations office is
     trained in the type and wording of information released, and will
     help to assure that the image of the site is protected during and
     after the incident (if possible).  A public relations office has the
     advantage that you can communicate candidly with them, and it can
     provide a buffer between the constant press attention and the need
     of the POC to maintain control over the incident.

     If a public relations office is not available, the information
     released to the press must be carefully considered.  If the
     information is sensitive, it may be advantageous to provide only
     minimal or overview information to the press.  It is quite
     possible that any information provided to the press will be
     quickly reviewed by the perpetrator of the incident.  On the other
     hand, as discussed above, misleading the press can often backfire
     and cause more damage than releasing sensitive information.

     While it is difficult to determine in advance what level of detail
     to provide to the press, some guidelines to keep in mind are:

      o Keep the technical level of detail low.  Detailed
        information about the incident may provide enough
        information for copy-cat events or even damage the
        site's ability to prosecute once the event is over.
      o Keep speculation out of press statements.
        Speculation about who is causing the incident, or
        about their motives, is very likely to be in error
        and may cause an inflamed view of the incident.
      o Work with law enforcement professionals to assure that
        evidence is protected.  If prosecution is involved,
        assure that the evidence collected is not divulged to
        the press.
      o Try not to be forced into a press interview before you are
        prepared.  The popular press is famous for the "2am"
        interview, where the hope is to catch the interviewee off
        guard and obtain information otherwise not available.
      o Do not allow the press attention to detract from the
        handling of the event.  Always remember that the successful
        closure of an incident is of primary importance.


   5.3  Identifying an Incident

   5.3.1  Is It Real?

     This stage involves determining whether a problem really exists.  Of
     course many, if not most, signs often associated with virus
     infections, system intrusions, malicious users, etc., are simply
     anomalies such as hardware failures or suspicious system/user
     behavior.  To assist in identifying whether there really is an
     incident, it is usually helpful to obtain and use any detection
     software which may be available.  For example, widely available
     software packages can greatly assist someone who thinks there may
     be a virus in a personal computer.  Audit information is also
     extremely useful, especially in determining whether there is a
     network attack.  It is extremely important to obtain a system
     snapshot as soon as one suspects that something is wrong.  Many
     incidents cause a dynamic chain of events to occur, and an initial
     system snapshot may do more good in identifying the problem and
     any source of attack than most other actions which can be taken at
     this stage.  Finally, it is important to start a log book.
     Recording system events, telephone conversations, time stamps,
     etc., can lead to a more rapid and systematic identification of
     the problem, and is the basis for subsequent stages of incident
     handling.

     There are certain indications or "symptoms" of an incident which
     deserve special attention:

        o System crashes.
        o New user accounts (e.g., the account RUMPLESTILTSKIN
          has been created for no apparent reason), or high
          activity on an account that has had virtually no
          activity for months.
        o New files (usually with novel or strange file names,
          such as data.xx or k).
        o Accounting discrepancies (e.g., in a UNIX system you
          might notice that the accounting file called
          /usr/admin/lastlog has shrunk, something that should
          make you very suspicious that there may be an
          intruder).
        o Changes in file lengths or dates (e.g., a user should
          be suspicious if he/she observes that the .EXE files in
          an MS-DOS computer have grown by over 1800 bytes for
          no apparent reason).
        o Attempts to write to system files (e.g., a system manager
          notices that a privileged user in a VMS system is
          attempting to alter RIGHTSLIST.DAT).
        o Data modification or deletion (e.g., files start to
          disappear).
        o Denial of service (e.g., a system manager and all
          other users become locked out of a UNIX system, which
          has been changed to single user mode).
        o Unexplained, poor system performance (e.g., system
          response time becomes unusually slow).
        o Anomalies (e.g., "GOTCHA" is displayed on a display
          terminal or there are frequent unexplained "beeps").
        o Suspicious probes (e.g., there are numerous
          unsuccessful login attempts from another node).
        o Suspicious browsing (e.g., someone becomes a root user
          on a UNIX system and accesses file after file in one
          user's account, then another's).

     This list is by no means exhaustive.  None of these indications is
     absolute "proof" that an incident is occurring, nor are all of these
     indications normally observed when an incident occurs.  If you
     observe any of these indications, however, it is important to
     suspect that an incident might be occurring and to act accordingly.
     There is no formula for determining with 100 percent accuracy that
     an incident is occurring.  It is best at this point to collaborate
     with other technical and computer security personnel to make a
     decision as a group about whether an incident is occurring.
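     As an illustration of the "new user accounts" symptom, the
     following minimal Python sketch compares a copy of a UNIX password
     file saved before the incident with the current one.  It assumes
     the traditional colon-separated /etc/passwd format and a
     trustworthy baseline copy; the function names are hypothetical.
     It also lists every account with user ID 0, since an unexpected
     second superuser account is a common sign of intrusion.

```python
from __future__ import annotations


def _entries(passwd_text: str) -> list[list[str]]:
    """Split passwd-format text into field lists, skipping blanks and comments."""
    rows = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            rows.append(line.split(":"))
    return rows


def new_accounts(baseline_text: str, current_text: str) -> set[str]:
    """Login names (first field) present now but absent from the saved baseline."""
    before = {fields[0] for fields in _entries(baseline_text)}
    after = {fields[0] for fields in _entries(current_text)}
    return after - before


def uid_zero_accounts(passwd_text: str) -> set[str]:
    """Login names whose user ID (third field) is 0, i.e. superuser accounts."""
    return {f[0] for f in _entries(passwd_text) if len(f) >= 3 and f[2] == "0"}
```

     Any name reported by new_accounts() that cannot be matched to a
     legitimate request, such as the RUMPLESTILTSKIN example above,
     deserves a closer look.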


   5.3.2  Types and Scope of Incidents

     Along with the identification of the incident is the evaluation of
     the scope and impact of the problem.  It is important to correctly
     identify the boundaries of the incident in order to effectively
     deal with it.  In addition, the impact of an incident will
     determine its priority in allocating resources to deal with the
     event.  Without an indication of the scope and impact of the
     event, it is difficult to determine a correct response.

     In order to identify the scope and impact, a set of criteria
     should be defined which is appropriate to the site and to the type
     of connections available.  Some of the issues are:

      o Is this a multi-site incident?
      o Are many computers at your site affected by this
        incident?
      o Is sensitive information involved?
      o What is the entry point of the incident (network,
        phone line, local terminal, etc.)?
      o Is the press involved?
      o What is the potential damage of the incident?
      o What is the estimated time to close out the incident?
      o What resources could be required to handle the incident?


   5.3.3  Assessing the Damage and Extent

     The analysis of the damage and extent of the incident can be quite
     time-consuming, but should lead to some insight into the nature of
     the incident, and aid investigation and prosecution.

     As soon as the breach has occurred, the entire system and all its
     components should be considered suspect.  System software is the most
     probable target.  Preparation is the key to detecting all changes on
     a possibly tainted system.  This includes checksumming all tapes
     from the vendor using a checksum algorithm which (hopefully) is resistant
     to tampering [10].  (See sections xxx.)  Assuming original
     vendor distribution tapes are available, an analysis of all system
     files should commence, and any irregularities should be noted and
     referred to all parties involved in handling the incident.  It can be
     very difficult, in some cases, to decide which backup tapes are
     showing a correct system status; consider that the incident may have
     continued for months or years before discovery, and that the suspect
     may be an employee of the site, or otherwise have intimate knowledge
     or access to the systems.  In all cases, the pre-incident preparation
     will determine what recovery is possible.  If the system supports
     centralized logging (most do), go back over the logs and look for
     abnormalities.  If process accounting and connect time accounting
     is enabled, look for patterns of system usage.  To a lesser extent,
     disk usage may shed light on the incident.  Accounting can provide
     much helpful information in an analysis of an incident and subsequent
     prosecution.

     Whether you can address all aspects of a specific incident depends
     strongly on the success of this analysis, which also affects the
     efficiency of the incident handling process.  Review the lessons
     learned from the analysis and always update the policy and
     procedures to reflect changes necessitated by the incident.
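     The comparison of system files against the original vendor
     distribution can be prepared in advance by recording a digest of
     every file while the system is known to be clean.  The following
     Python sketch shows one way to do that, in the spirit of tools such
     as Tripwire; it is not Tripwire's actual format or algorithm.  The
     text does not name a checksum algorithm, so SHA-256 is an
     assumption here, and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import os


def file_digest(path: str) -> str:
    """SHA-256 of a file, read in chunks so large binaries need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_baseline(root: str) -> dict[str, str]:
    """Map each regular file under root (path relative to root) to its digest."""
    baseline = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symbolic links and special files; only hash regular files.
            if os.path.isfile(full) and not os.path.islink(full):
                baseline[os.path.relpath(full, root)] = file_digest(full)
    return baseline


def compare(baseline: dict[str, str], current: dict[str, str]):
    """Return sorted lists of (added, removed, modified) relative paths."""
    added = sorted(current.keys() - baseline.keys())
    removed = sorted(baseline.keys() - current.keys())
    modified = sorted(p for p in baseline.keys() & current.keys()
                      if baseline[p] != current[p])
    return added, removed, modified
```

     Keep the saved baseline on read-only or offline media: an intruder
     who can alter system files can just as easily alter a baseline
     stored on the same disk.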


   5.4  Handling an Incident

     A major topic still untouched here is how to actually respond to an
     event.  The response to an event will fall into the following
     general categories:

      o Containment.
      o Eradication.
      o Recovery.
      o Follow-up.

     Two other topics are vital to incident handling and relevant to all
     of the categories mentioned above.  They are therefore addressed
     before these categories:

      o Types of notification and the exchange of information
      o Protection of evidence and activity logs

     <no text yet>
     + what are the goals for this process.
     + what is the best way to handle an incident.
     + important: do not try to handle the incident alone
     + get as much help as necessary

     <no text yet>
     + part of original RFC1244 which is too brief to serve as a guideline,
       but nevertheless the topics should be covered here, too

   <NOTE: this is from rfc1244; don't know if it will be included
    5.4.1  What Will You Do?

      o Restore control.
      o Relation to policy.
      o Which level of service is needed?
      o Monitor activity.
      o Constrain or shut down system.

     END OF rfc1244 section>


   5.4.1  Types of notification, Exchange of information

     When you have confirmed that an incident is occurring, the
     appropriate personnel must be notified.  How this notification is
     achieved is very important in keeping the event under control both
     from a technical and emotional standpoint. To aid prompt acknowledgment
     and understanding of the problem, the circumstances should be described
     in as much detail as possible.  Great care should be taken over which
     groups receive detailed technical information during the notification.
     For example, it is helpful to pass this kind of information to an
     incident handling team, which can assist you by providing helpful
     hints for eradicating the vulnerabilities involved in an incident.  On
     the other hand, putting the critical knowledge into the public domain
     (e.g., netnews, mailing lists) may put a great number of systems at
     risk of intrusion.  It is wrong to assume that all administrators read
     a particular newsgroup, have access to operating system source code,
     or can even understand the techniques well enough to take adequate
     steps.

     First of all, any notification to either local or off-site
     personnel must be explicit.  This requires that any statement (be
     it an electronic mail message, phone call, or fax) provides
     information about the incident that is clear, concise, and fully
     qualified.  When you are notifying others who will help you to
     handle an event, a "smoke screen" will only divide the effort and
     create confusion.  If a division of labor is suggested, it is
     helpful to provide information to each section about what is being
     accomplished in other efforts.  This will not only reduce
     duplication of effort, but allow people working on parts of the
     problem to know where to obtain other information that would help
     them resolve a part of the incident.

     Another important consideration when communicating about the
     incident is to be factual.  Attempting to hide aspects of the
     incident by providing false or incomplete information may not only
     prevent a successful resolution to the incident, but may even
     worsen the situation.  This is especially true when the press is
     involved.  When an incident severe enough to gain press attention
     is ongoing, it is likely that any false information you provide
     will not be substantiated by other sources.  This will reflect
     badly on the site and may create enough ill-will between the site
     and the press to damage the site's public relations.

     The choice of language used when notifying people about the
     incident can have a profound effect on the way that information is
     received.  When you use emotional or inflammatory terms, you raise
     the expectations of damage and negative outcomes of the incident.
     It is important to remain calm both in written and spoken
     notifications.  Another aspect of the choice of language used is
     that not all people speak the same language.  As a result,
     misunderstandings and delays may arise, especially in a
     multi-national incident.

     <no text yet>
     + more about international aspects

     Another issue associated with the choice of language is the
     notification to non-technical or off-site personnel.  It is
     important to accurately describe the incident without undue alarm
     or confusing messages.  While it is more difficult to describe the
     incident to a non-technical audience, it is often more important.
     A non-technical description may be required for upper-level
     management, the press, or law enforcement liaisons.  The
     importance of these notifications cannot be overstated and may
     make the difference between handling the incident properly and
     escalating to some higher level of damage.

     <no text yet>
     + Template for minimum information exchange
     + Advice for reporting, e.g., timezones in GMT, ...
     + give complete information (e.g., all TCP wrapper entries belonging
       to the incident, not just: "found in the TCP wrapper logs, I
       assume ...")
     + international aspects
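     The notes above suggest giving times in GMT and quoting complete
     log entries rather than summaries.  A minimal Python sketch of both
     follows.  It assumes the reporter knows the UTC offset that was in
     effect when each local record was written (including any daylight
     saving shift) and that timestamps use the layout in the
     hypothetical STAMP constant; for reporting purposes UTC and GMT can
     be treated as the same.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

STAMP = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout of the local records


def to_utc(local_stamp: str, utc_offset_hours: float) -> str:
    """Convert a local timestamp with a known UTC offset into a UTC string."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(local_stamp, STAMP).replace(tzinfo=local_tz)
    return local.astimezone(timezone.utc).strftime(STAMP) + " UTC"


def lines_mentioning(log_text: str, needle: str) -> list[str]:
    """Every log line containing needle, so a report quotes all relevant entries."""
    return [line for line in log_text.splitlines() if needle in line]
```

     For example, to_utc("1995-07-10 22:30:00", -5) gives
     "1995-07-11 03:30:00 UTC", which a site in another time zone can
     compare directly with its own records.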


   5.4.2  Protection of evidence and activity logs

     When you respond to an incident, document all details related to the
     incident.  This will provide valuable information to yourself and
     others as you try to unravel the course of events.  Documenting all
     details will ultimately save you time.  If you don't document every
     relevant phone call, for example, you are likely to forget a good
     portion of information you obtain, requiring you to contact the
     source of information once again.  This wastes your time and that of
     others, something you can ill afford.  At the same time, recording
     details will provide evidence for prosecution efforts, provided the
     case moves in this direction.  Documenting an incident also will help
     you perform a final assessment of damage (something your management
     as well as law enforcement officers will want to know), and will
     provide the basis for a follow-up analysis in which you can engage in
     a valuable "lessons learned" exercise.  Additionally, it will help
     during later phases of the handling process, especially during
     eradication and recovery.

     During the initial stages of an incident, it is often infeasible to
     determine whether prosecution is viable, so you should document as if
     you are gathering evidence for a court case.  At a minimum, you
     should record:

      o All system events (audit records).
      o All actions you take (time tagged).
      o All phone conversations (including the person with whom
        you talked, the date and time, and the content of the
        conversation).

     The most straightforward way to maintain documentation is keeping a
     log book.  This allows you to go to a centralized, chronological
     source of information when you need it, instead of requiring you to
     page through individual sheets of paper.  Much of this information is
     potential evidence in a court of law.  Thus, when you initially
     suspect that an incident will result in prosecution or when an
     investigative agency becomes involved, you need to regularly (e.g.,
     every day) turn in photocopied, signed copies of your log book (as
     well as media you use to record system events) to a document
     custodian who can store these copied pages in a secure place (e.g., a
     safe).  When you submit information for storage, you should in return
     receive a signed, dated receipt from the document custodian.  Failure
     to observe these procedures can result in invalidation of any
     evidence you obtain in a court of law.
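     Where the log book is kept on a computer rather than on paper, each
     entry still needs a time tag and should only ever be appended.  The
     following Python sketch is a minimal, hypothetical example: it
     stores one JSON object per line with a UTC time stamp and one of
     three assumed categories matching the minimum records listed above.
     It does not replace the signed copies handed to the document
     custodian.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone

# Hypothetical labels for the three minimum record types listed above.
CATEGORIES = {"event", "action", "phone"}


class LogBook:
    """Time-tagged incident log stored as one JSON object per line."""

    def __init__(self, path: str) -> None:
        self.path = path

    def record(self, category: str, text: str, when: datetime | None = None) -> dict:
        """Append one entry; the time defaults to now and is stored in UTC."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        if when is None:
            when = datetime.now(timezone.utc)
        entry = {
            "time": when.astimezone(timezone.utc).isoformat(),
            "category": category,
            "text": text,
        }
        # Mode "a" means this program only ever appends; it does not stop
        # anyone else from editing the file, hence the signed paper copies.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
        return entry

    def entries(self) -> list[dict]:
        """Read back all entries in the order they were written."""
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
```

     Printing each day's entries() and signing the printout fits the
     daily hand-over to the custodian described above.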


   5.4.3  Containment

     The purpose of containment is to limit the extent of an attack.
     For example, it is important to limit the spread of a worm attack
     on a network as quickly as possible.  An essential part of
     containment is decision making (i.e., determining whether to shut
     a system down, to disconnect from a network, to monitor system or
     network activity, to set traps, to disable functions such as
     remote file transfer on a UNIX system, etc.).  Sometimes this
     decision is trivial; shut the system down if the system is
     life-critical, classified or sensitive, or if proprietary information
     is at risk!  In other cases, it is worthwhile to risk having some
     damage to the system if keeping the system up might enable you to
     identify an intruder.

     This stage should involve carrying out predetermined procedures.
     Your organization or site should, for example, define acceptable
     risks in dealing with an incident, and should prescribe specific
     actions and strategies accordingly.  This is especially important
     when a quick decision is necessary without the possibility of
     contacting all involved parties and discussing the decision.  In most
     cases, the person in charge will not have the authority to make a
     difficult management decision (such as sacrificing the results of a
     costly experiment).  Finally, notification of cognizant authorities
     should occur during this stage.

     <no text yet>
     + since I decided to make this a section of its own, it should
       contain more text?

     In some cases, it is prudent to remove all access or functionality
     as soon as possible, and then restore normal operation in limited
     stages.  Bear in mind that removing all access while an incident is
     in progress will obviously notify all users, including the alleged
     problem users, that the administrators are aware of a problem; this
     may have a deleterious effect on an investigation.  However, allowing
     an incident to continue may also increase the likelihood of greater damage,
     loss, aggravation, or liability (civil or criminal).  That's another
     reason why the relevant decisions belong to the site policy and should
     be determined before an incident occurs.
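     A predetermined containment procedure can be as simple as a
     decision rule written down before any incident.  The following
     Python sketch encodes only the rule of thumb given at the start of
     this section; the attribute names and the two outcomes are
     hypothetical, and a real site policy would cover many more cases
     and state who may override the default.

```python
from dataclasses import dataclass


@dataclass
class SystemProfile:
    """Hypothetical per-system attributes a site policy might record in advance."""
    name: str
    life_critical: bool = False
    classified_or_sensitive: bool = False
    proprietary_info_at_risk: bool = False


def containment_action(profile: SystemProfile) -> str:
    """Return the predetermined first containment step for one system."""
    if (profile.life_critical
            or profile.classified_or_sensitive
            or profile.proprietary_info_at_risk):
        return "shut down"
    # Otherwise the site may accept some risk of damage to keep the system
    # up and monitor it, in the hope of identifying the intruder.
    return "monitor"
```

     Recording such profiles ahead of time means the person on duty can
     act quickly without first having to reach management.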


   5.4.4  Eradication

     Once an incident has been detected, it is important to first think
     about containing the incident.  Once the incident has been
     contained, it is time to eradicate the cause.  Before doing so,
     however, great care should be taken to collect all necessary
     information about the compromised system and the cause of the
     incident, because this information is likely to be lost during
     eradication.

     Software may be available to help you in the eradication process.
     For example, eradication software is available to eliminate most
     viruses which infect small systems.  If any bogus files have been
     created, archive them for later use in case of a court case, and
     then delete them from the system.  In the case of virus infections,
     it is important to clean and reformat any disks containing infected
     files.  Finally, ensure that all backups are clean.  Many systems
     infected with viruses become periodically reinfected simply because
     people do not systematically eradicate the virus from backups.
     After eradication, a new backup should also be taken.
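     The "archive, then delete" step for bogus files can be sketched as
     follows.  This minimal Python example assumes a gzip-compressed tar
     archive and SHA-256 digests, neither of which the text prescribes;
     the function name is hypothetical.  The digests it returns can be
     copied into the incident log book so the archived copies can later
     be shown to be unchanged.

```python
from __future__ import annotations

import hashlib
import os
import tarfile


def archive_then_delete(paths: list[str], archive_path: str) -> dict[str, str]:
    """Archive suspect files, record their SHA-256 digests, then delete them.

    Returns a mapping of each path to the digest taken before deletion.
    """
    digests = {}
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            with open(path, "rb") as f:
                digests[path] = hashlib.sha256(f.read()).hexdigest()
            tar.add(path)  # tar headers keep modification time, mode and owner
    # Reached only if every file was added and the archive closed without error.
    for path in paths:
        os.remove(path)
    return digests
```

     In practice the archive itself would go to write-once media and to
     the document custodian described in section 5.4.2.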

     Removing all vulnerabilities once an incident has occurred is
     difficult.  The key to removing vulnerabilities is knowledge and
     understanding of the breach.

     It may be necessary to go back to the original distribution tapes
     and recustomize the system.  To facilitate this worst case
     scenario, a record of the original systems setup and each
     customization change should be kept current with each change to
     the system.  In the case of a network-based attack, it is important
     to install patches for any operating system vulnerability which was
     exploited.

     <no text yet>
     + patch for vulnerabilities not available
     + uncertainty of cause

     As discussed in section $$.4.2, a security log can be most valuable
     during this phase of removing vulnerabilities.  There are two
     considerations here; the first is to keep logs of the procedures
     that have been used to make the system secure again.  This should
     include command procedures (e.g., shell scripts) that can be run
     on a periodic basis to recheck the security.  Second, keep logs of
     important system events.  These can be referenced when trying to
     determine the extent of the damage of a given incident.
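     The text mentions command procedures that can be rerun periodically
     to recheck security.  It suggests shell scripts; the sketch below
     uses Python for consistency with the other examples.  The two
     checks shown (world-writable files and set-uid/set-gid files) are
     illustrative choices, not a list taken from this document, and the
     function name is hypothetical.

```python
import os
import stat


def risky_files(root):
    """Yield (path, reason) for world-writable or set-uid/set-gid files under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symbolic links
            except OSError:
                continue  # vanished or unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue
            if st.st_mode & stat.S_IWOTH:
                yield path, "world-writable"
            if st.st_mode & (stat.S_ISUID | stat.S_ISGID):
                yield path, "set-uid/set-gid"
```

     Run from a periodic job, for example under cron, with its sorted
     output saved after each run, a simple comparison with the previous
     output shows any new findings, and those saved outputs themselves
     become part of the security log.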


   5.4.5  Recovery

     Once the cause of an incident has been eradicated, the recovery
     phase defines the next stage of action.  The goal of recovery is
     to return the system to normal.  In general, bringing services back
     up in order of demand, so as to minimize user inconvenience, is the
     best practice.  Understand that the proper
     recovery procedures for the system are extremely important and should
     be specific to the site.

     <no text yet>
     + more text needed?


   5.4.6  Follow-Up

     Once you believe that a system has been restored to a "safe"
     state, it is still possible that holes and even traps could be
     lurking in the system.  One of the most important stages of
     responding to incidents is also the most often omitted---the
     follow-up stage.  In the follow-up stage, the system should
     be monitored for items that may have been missed during the
     cleanup stage.  It would be prudent to utilize some of the tools
     mentioned in section xxx (e.g., xxx) as a start.  Remember,
     these tools don't replace continual system monitoring and good
     systems administration procedures.

     The follow-up stage is also important because
     it helps those involved in handling the incident develop a set of
     "lessons learned" (see section $$.5) to improve future performance
     in such situations.  This stage also provides information which
     justifies an organization's computer security effort to management,
     and yields information which may be essential in legal proceedings.

     The most important element of the follow-up stage is performing a
     postmortem analysis.  Exactly what happened, and at what times?
     How well did the staff involved with the incident perform?  What
     kind of information did the staff need quickly, and how could they
     have gotten that information as soon as possible?  What would the
     staff do differently next time?  A follow-up report is valuable
     because it provides a reference to be used in case of other
     similar incidents.  Creating a formal chronology of events
     (including time stamps) is also important for legal reasons.
     Similarly, it is important to obtain, as quickly as possible, a
     monetary estimate of the amount of damage the incident caused in terms of
     any loss of software and files, hardware damage, and manpower
     costs to restore altered files, reconfigure affected systems, and
     so forth.  This estimate may become the basis for subsequent
     prosecution activity.
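     A formal chronology is usually assembled from several sources: the
     log book, system logs, and notes from other sites.  The following
     Python sketch merges such records into one timeline.  It assumes
     each record is a hypothetical (timestamp, source, text) tuple and
     that every timestamp is timezone-aware, so entries from machines in
     different time zones sort correctly.

```python
from datetime import timezone
from itertools import chain


def chronology(*sources):
    """Merge (timestamp, source, text) records from several sources into one timeline.

    Timestamps must be timezone-aware; sorting is stable, so records with
    identical times keep the order of the sources given.
    """
    return sorted(chain(*sources), key=lambda entry: entry[0])


def format_timeline(entries):
    """Render each entry as one line with its time stamp in UTC."""
    return [f"{ts.astimezone(timezone.utc):%Y-%m-%d %H:%M:%S} UTC  [{src}] {text}"
            for ts, src, text in entries]
```

     The formatted lines can be pasted into the follow-up report, with
     each line traceable to the record it came from.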


   5.5  Aftermath of an Incident

     In the wake of an incident, several actions should take place.  These
     actions can be summarized as follows:

         (1) An inventory should be taken of the systems' assets,
             i. e., a careful examination should determine how the
             system was affected by the incident,

         (2) The lessons learned as a result of the incident
             should be included in a revised security plan to
             prevent the incident from re-occurring,

         (3) A new risk analysis should be developed in light of the
             incident,

         (4) An investigation and prosecution of the individuals
             who caused the incident should commence, if it is
             deemed desirable.

     All four steps should provide feedback to the site security policy
     committee, leading to prompt re-evaluation and amendment of the
     current policy.

     If an incident is the result of poor policy, then unless the policy
     is changed, one is doomed to repeat the past.  Once a site has
     recovered from an incident, site policy and procedures should be
     reviewed to encompass changes to prevent similar incidents.  Even
     without an incident, it would be prudent to review policies and
     procedures on a regular basis.  Reviews are imperative due to
     today's changing computing environments.

     After an incident, it is prudent to write a report describing the
     incident, method of discovery, correction procedure, monitoring
     procedure, and a summary of lessons learned.  This will aid in the
     clear understanding of the problem.  Remember, it is difficult to
     learn from an incident if you don't understand the source.

     <no text yet>
     + improving proactive methods
     + educate users, administrators and managers


   5.6  Responsibilities

     <no text yet>
     This is an entirely new section, but it addresses some aspects
     because we sometimes encounter a somewhat strange interpretation of
     responsibility.  For example, protecting a network is a good thing,
     but protecting someone else's network is difficult if you do not
     have the authority to do so.  If you want to try a new hacker tool
     on a network, it is fair to contact a responsible person
     beforehand; otherwise, false or unnecessary alerts result every now
     and then.  If you treat a successful break-in as a good educational
     lesson, you can be sued for it like an ordinary cracker.
     + testing of known vulnerabilities
     + disclosure of information
     + testing of local sites
     + tiger teams (for remote sites)
     + announcing site security contact information
     + check advice before acting on behalf (hacker claims to be ROOT)
     + protect the communication
     + legal procedures
       was section 5.5.2 in the old RFC
       (most of this section is already included in 5.2.2)

6.   MAINTENANCE AND EVALUATION

   6.1  Risk assessments
   6.2  Notification of problems/events

Appendices

   A1  Tools and Locations

   This section provides a brief overview of publicly available security
   technology which can be downloaded from the Internet.  Many of the items
   described below will undoubtedly be surpassed or made obsolete before this
   document is published.  This section is divided into two major subsections,
   applications and tools.  The applications heading will include all end user
   programs (clients) and their supporting system infrastructure (servers).
   The tools heading will deal with the tools that a general user will never
   see or need to use, but which may be part of or used by applications, used
   to troubleshoot security problems or guard against intruders by system and
   network administrators.

   The emphasis will be on UNIX applications and tools, but other platforms,
   particularly PCs and Macintoshes, will be mentioned where information is
   available.

   Most of the tools and applications described below can be found in one of
   the following archive sites:

        1.  ftp://info.cert.org:/pub/tools
            CERT Coordination Center
        2.  coast.cs.purdue.edu:/pub/tools
            Computer Operations, Audit, and Security Tools (COAST)
        3.  ftp://ftp.cert.dfn.de/pub/tools/
            DFN-CERT

   Any references to CERT, COAST, or DFN-CERT will refer to these locations.
   These sites act as repositories for most tools; exceptions will be noted
   in the text.  *** It is important to note that many sites, including CERT and
   COAST are mirrored throughout the Internet.  Be careful to use a "well
   known" mirror site to retrieve software and to use whatever verification
   tools are available (checksums, MD5 checksums, etc.) to validate that
   software.  A clever cracker might advertise security software with designed
   flaws in order to gain access to data or machines. ***
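   Validating downloaded software can be as simple as recomputing its
   digest and comparing it with the value the distributor published.
   The following minimal Python sketch does this; the function names are
   hypothetical.  A checksum only helps if the published value reaches
   you through a channel the attacker does not control, for example a
   PGP-signed announcement.  MD5 is the default here because the text
   names it, but MD5 is no longer considered collision-resistant, so a
   stronger digest such as SHA-256 is preferable whenever the publisher
   provides one.

```python
import hashlib


def file_hexdigest(path: str, algorithm: str = "md5") -> str:
    """Hex digest of a file, read in chunks to handle large archives."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, published_hex: str, algorithm: str = "md5") -> bool:
    """True if the file's digest matches the published value (case-insensitive)."""
    return file_hexdigest(path, algorithm) == published_hex.strip().lower()
```

   A file that fails this check should not be unpacked or installed;
   fetch it again from the primary site and compare once more.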

   Applications

   The sad truth is that there are very few security conscious applications
   currently available.  The main reason is that a security infrastructure
   must first be put into place for most applications to operate
   securely.  Considerable effort is currently under way to put this
   infrastructure in place so that applications can take advantage of
   secure communications.

   UNIX-based applications

   PGP
   MD5
   S/KEY
   TROJAN.PL
   PEM
   KERBEROS
   Drawbridge
   Tripwire
   logdaemon
   TCP-Wrapper
   rpcbind/portmapper replacement
   cops
   tiger
   ISS
   SATAN
   smrsh
   swatch
   identd (not really a security tool)
   DES (non-US versions)
   lsof
   sfingerd
   passwd-replacements (npasswd / ANLpasswd / passwd+ / ...)

   A2  Mailing lists and other resources

   <To be completed>
   This is an older version of an Internet-Draft that was ultimately
   published as RFC 2196.