Internet Draft Ian Heavens
Expires May 15, 1996 Shiva Corporation
November 1995
Problems with TCP Connections Terminated by RSTs or Timers
draft-heavens-problems-rsts-01.txt
Status of this Memo
This memo is being distributed to members of the Internet community
in order to solicit their reactions to the proposals contained in it.
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''
To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts
Shadow Directories on ds.internic.net (US East Coast), nic.nordu.net
(Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
Rim).
Abstract
This memo argues that the danger of accepting segments from old TCP
connections arises for connections terminated by RST segments, timers,
or ICMP messages, as well as for those terminated by exchange of FIN
segments. To avoid this danger, RST-terminated connections require a 2-way
closing handshake, with the recipient of the RST entering TIME-WAIT
and acknowledging the RST. LAST-ACK is used as the interim state
between transmission of a RST and receiving its acknowledgement, at
which point CLOSED is entered. This solution provides protection
even when interoperating with a non-conformant implementation.
The probability of data corruption is greater for RST-terminated
connections than for FIN-terminated connections. To maintain backwards
compatibility, a TCP host or connection may be configured to revert
to [RFC-793] behaviour.
Heavens [Page 1]
Internet Draft Problems with RSTs and Timers November 1995
Table of Contents
1. Introduction
1.1 Overview
1.2 Background
1.3 RST-Terminated Connections
2. Old Segment Acceptance from RST-Terminated Connections
2.1 RST-Terminated Connections from Established State
2.2 RST-Terminated Connections during Closedown
2.3 Proof by Demonstration
2.4 Other Hazards
3. A Partial Solution: TIME-WAIT after RST transmission
3.1 User Abort
3.2 Half Duplex Close
4. RST Loss Hazards
4.1 RST Loss and Data Retransmission
4.2 RST Loss and Idle Connections
5. A Complete Solution: the 2-way Closing Handshake
5.1 Discussion
5.2 Reliable Delivery of RSTs
5.3 Interim State
5.4 Changes to RFC-793 State Machine
5.5 Interoperability with [RFC-793] Implementations
5.6 Timeout-Terminated Connections
5.7 Connections Terminated by ICMP Messages
5.8 TP4
6. Relative Probabilities of Hazards
6.1 Introduction
6.2 FIN, RST, Timer and ICMP Related Hazards
6.3 Relative Probabilities of FIN- and RST-related Hazards
7. Implications for Related TCP Standards
7.1 TIME-WAIT Assassination
7.2 High Performance Extensions
7.3 Extensions for Transactions
8. Backwards Compatibility Issues
8.1 Introduction
8.2 Nomenclature
8.3 Resources
8.4 Resource Starvation
8.5 Approaches to Configuration
8.6 API Semantics
8.7 Interoperability
8.8 Simplicity
9. Solution with Backwards Compatibility
9.1 Introduction
9.2 Solution
9.3 Configuration
9.4 API Semantics
9.5 Interoperability with RFC-793
Appendix A: TCP Connection State Diagram (Partial Solution)
Appendix B: TCP Connection State Diagram (Full Solution)
Appendix C: A Different Interpretation of RFC-1122
Appendix D: Modifications to RFC-793
Appendix E: Modifications to RFC-1122
Appendix F: Modifications to RFC-1213
Appendix G: Traffic Statistics for TCP Connections
Glossary
o FIN-Terminated Connection
A synchronised TCP connection which terminates by the 3-way
handshake, involving the exchange and reliable acknowledgement
of FIN segments.
o RST-Terminated Connection
A synchronised TCP connection which terminates by transmission
or reception of a RST.
o Timer-Terminated Connection
A synchronised TCP connection which terminates by a timeout.
o Hard Abort
A RST-terminated connection where the peer that transmits a RST
enters CLOSED state.
o Soft Abort
A RST-terminated connection where the peer that transmits a RST
enters LAST-ACK state, and the remote peer enters TIME-WAIT
state when it receives the RST.
o Hard Timeout
A timer-terminated connection where the peer that times out
immediately enters CLOSED state, possibly transmitting a RST.
o Soft Timeout
A timer-terminated connection where the peer that times out
transmits a RST, enters LAST-ACK state, and the remote peer
enters TIME-WAIT state if it receives the RST.
o API
Application Programming Interface
o MSL
Maximum Segment Lifetime
1. Introduction
1.1 Overview
Chapter 1 describes mechanisms for closing TCP connections, and the
significance of the TIME-WAIT state.
Chapter 2 identifies a series of connection terminations involving
RSTs that may lead to data corruption.
Chapter 3 shows how the use of TIME-WAIT state alone can provide some
protection against this.
Chapter 4 identifies scenarios where this solution is insufficient.
Chapter 5 describes a complete solution involving a 2-way closing
handshake.
Chapter 6 examines the relative probabilities of data corruption
hazards after FIN- and RST-terminated connections.
Chapter 7 looks at the implications for related TCP standards.
Chapter 8 discusses backwards compatibility.
Chapter 9 proposes a solution that guarantees backwards compatibil-
ity.
1.2 Background
FINs, RSTs, Timers and ICMP Messages
There are four mechanisms available in [RFC-793] to close a TCP con-
nection: FINs, RSTs, Timeouts and ICMP messages.
FINs may be used to close down a connection in an orderly fashion,
guaranteeing reliable delivery of all data segments transmitted
before the FIN in each direction. The requirement to reliably ack-
nowledge FINs in both directions leads to a number of half-closed
states: FIN-WAIT-1, FIN-WAIT-2, CLOSING, CLOSE-WAIT, LAST-ACK and
TIME-WAIT.
A RST closes a connection abruptly, immediately removing connection
state on transmission or reception. There are no interim states;
transition is to CLOSED on transmission or reception of a RST.
Timeouts also close a connection abruptly; a connection that times
out optionally transmits a RST, or it may assume that the peer has
disappeared. Timeouts also cause an immediate transition to CLOSED.
ICMP messages do not usually terminate a synchronised connection, but
it is possible. In the same way as connections terminated by RST or
timeout, there is an immediate state transition to CLOSED.
TIME-WAIT
The TIME-WAIT state has two functions in the TCP protocol. The first
is asymmetric: to ensure the reliable acknowledgement of FINs
transmitted in CLOSE-WAIT state and so the completion of the 3-way
closing handshake. The second is symmetric: to ensure that all TCP
segments, generated in either direction during the lifetime of the
connection, have drained from the network before initiation of a new
incarnation of the connection. The clock based ISN protects slow con-
nections against this threat [RFC-793]. For fast connections, this is
no longer true. In this case, TIME-WAIT prevents the acceptance of
old duplicate segments by a new incarnation utilising identical port
numbers. The relative threats are explained in the Appendix of [RFC-
1185], and in section 1.2 of [RFC-1323]. The problem is summarised
in relation to the danger of premature termination of TIME-WAIT state
by RST reception (TIME-WAIT assassination) in [RFC-1337].
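As a rough illustration of the "fast connection" case, the sketch below
compares the sequence space consumed by a connection with the advance of
the clock-based ISN over the same interval. It assumes the ISN generator
of [RFC-793], which increments once every 4 microseconds (250,000 per
second). The function name and the example figures are hypothetical, and
wraparound of the 32-bit sequence space is ignored.

```python
ISN_TICKS_PER_SECOND = 250_000  # RFC-793 ISN clock: one increment every 4 microseconds


def isn_overrun(bytes_sent, lifetime_seconds):
    """Return True if the connection consumed sequence space faster than
    the ISN clock advanced, so that the ISN of a new incarnation may fall
    inside the sequence range used by the old one."""
    isn_advance = ISN_TICKS_PER_SECOND * lifetime_seconds
    return bytes_sent > isn_advance


# Hypothetical figures: 1 MB in 10 seconds stays behind the ISN clock ...
slow = isn_overrun(1_000_000, 10)
# ... while 10 MB in the same 10 seconds overruns it.
fast = isn_overrun(10_000_000, 10)
```

In the second case the clock-based ISN alone offers no protection, and
only TIME-WAIT keeps old segments away from the new incarnation.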
No equivalent mechanism to TIME-WAIT exists for connections ter-
minated by transmission of a RST segment. Although RST transmission
is omitted from the TCP Connection State Diagram, the text of [RFC-
793] clearly states that where the transmission of a RST results in a
state change, it is to CLOSED state. Similarly, reception of a RST
causes a state change to CLOSED.
1.3 RST-Terminated Connections
There are several ways in which previously synchronised connections
are terminated by RST transmission. These include User Abort [RFC-
793] and reception of data after half-duplex close [RFC-1122]. How-
ever, not all RSTs result in connection termination. Reception of a
SYN segment addressed to a port for which there is no listening
socket results in transmission of a RST. This is associated with no
connection and is equivalent to an ICMP Port Unreachable. The origi-
nator of the SYN changes state from SYN-SENT to CLOSED on reception
of the RST, and the connection is never synchronised. Other connec-
tions in non-synchronised states respond to an unacceptable ACK,
security or precedence mismatch by transmitting a RST. In all these
cases, no connection has been synchronised nor data sent, so that
there is no danger of old data segments being accepted by subsequent
incarnations of the connection.
This memo distinguishes those synchronised connections which
terminate by transmission or reception of a RST by referring to them
as "RST-terminated connections".
2. Old Segment Acceptance from RST-Terminated Connections
Several scenarios result in the spurious acceptance of old segments
from RST-terminated connections. Two types of examples are given
here: connections aborted in Established state, and connections
aborted during the 3-way closing handshake.
2.1 RST-Terminated Connections from Established State
There are two instances of RST-terminated connections from Esta-
blished state which involve the hazard of old data acceptance by a
subsequent incarnation of the connection.
The first is a User Abort issued in Established state; the second a
half-duplex close with unread data [RFC-1122, p.88]. The sequence of
events in both cases is identical: a RST is sent by the socket from
Established state, as a result of an abort, or a close with pending
unread data.
In the worst failure mode, the socket issuing the abort is acting as
a data sink. In this case a window of data segments may be in tran-
sit when the RST is received at the data source. Any of these seg-
ments - which are not duplicates - may corrupt a subsequent incarna-
tion of the connection.
TCP A TCP B
1. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> --> ESTABL.
2. ... <SEQ=101><ACK=480><CTL=ACK> <-- ESTABL.
(User Abort)
3. ... <SEQ=101><CTL=RST> <-- CLOSED
4. ESTABL. --> <SEQ=480><ACK=101><DATA=80><CTL=ACK> ...
5. ESTABL. <-- <SEQ=101><ACK=480><CTL=ACK> ...
6. ESTABL. --> <SEQ=560><ACK=101><DATA=80><CTL=ACK> ...
7. ESTABL. --> <SEQ=640><ACK=101><DATA=80><CTL=ACK> ...
8. CLOSED <-- <SEQ=101><CTL=RST> ...
Figure 1. Connection closed by User Abort
This is shown in Figure 1. TCP A is the data source and TCP B is the
data sink. Line 1 shows a normal data segment from TCP A. An ACK
segment is transmitted by TCP B on line 2. TCP B user issues an
abort, transmits a RST, and enters CLOSED state on line 3, as speci-
fied in [RFC-793]. Normal data continues to be transmitted by TCP A
on line 4. Line 5 shows the arrival at TCP A of the ACK generated on
line 2. This may open the window and elicit further segments from
TCP A on lines 6 and 7, until the arrival of the RST at TCP A on line
8. At this point TCP A enters CLOSED state, and three data segments
from TCP A are in transit to TCP B.
The connection is reopened by the 3-way SYN handshake. Assume that
the clock based ISN chosen by TCP A for the new connection has been
overrun by the sequence number consumption in the previous incarna-
tion of the connection. The sequence numbers occupied by the last
three segments transmitted by TCP A during the previous incarnation
may overlap the window offered by TCP B in the current incarnation of
the connection.
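The overlap described above can be checked with the segment
acceptability test of [RFC-793]. The following is a minimal sketch of
that test using modulo 2**32 arithmetic. The function names are
hypothetical, and the 4096-byte receive window is an assumed value that
does not come from this memo.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def in_window(seq, rcv_nxt, rcv_wnd):
    """True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd) modulo 2**32."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


def segment_acceptable(seg_seq, seg_len, rcv_nxt, rcv_wnd):
    """Segment acceptability test from RFC-793 (four cases)."""
    if seg_len == 0:
        if rcv_wnd == 0:
            return seg_seq % MOD == rcv_nxt % MOD
        return in_window(seg_seq, rcv_nxt, rcv_wnd)
    if rcv_wnd == 0:
        return False
    # Data segment, open window: acceptable if its first or last octet
    # falls inside the receive window.
    return (in_window(seg_seq, rcv_nxt, rcv_wnd)
            or in_window(seg_seq + seg_len - 1, rcv_nxt, rcv_wnd))


# The three segments left in transit in Figure 1, tested against a new
# incarnation that expects sequence number 500 (assumed window: 4096).
old_segments = [(480, 80), (560, 80), (640, 80)]
accepted = [segment_acceptable(seq, length, 500, 4096)
            for seq, length in old_segments]
```

Under these assumptions every old segment passes the test, although
nothing in the segment identifies it as belonging to the previous
incarnation.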
TCP A TCP B
1. ESTABL. --> <SEQ=400><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
2. ESTABL. <-- <SEQ=101><ACK=500><CTL=ACK> <-- ESTABL.
3. (old segment)...<SEQ=560><ACK=101><DATA=80><CTL=ACK> --> ESTABL.
4. ESTABL. <-- <SEQ=101><ACK=500><CTL=ACK> <-- ESTABL.
5. ESTABL. --> <SEQ=500><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
6. ... <SEQ=101><ACK=640><CTL=ACK> <-- ESTABL.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
7a. ESTABL. --> <SEQ=600><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
8a. ESTABL. <-- <SEQ=101><ACK=640><CTL=ACK> ...
9a. ESTABL. --> <SEQ=700><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
10a ESTABL. <-- <SEQ=101><ACK=800><CTL=ACK> <-- ESTABL.
Figure 2: Accepting One Old Segment
Figure 2 shows the spurious acceptance of part of a segment from the
previous incarnation of the connection. Line 1 shows a normal data
segment from TCP A after the SYN handshake has been completed. Line
2 shows the ACK of this segment, and line 3 shows the arrival of an
old segment from the previous connection. It falls within TCP B's
current window and is queued in the TCP reassembly queue, as its
sequence number exceeds the next expected sequence number. Since
there is a missing segment, the next ACK in line 4 acknowledges the
previous bona fide segment, and TCP A does not detect acknowledgement
of unsent data. The next segment from the current connection arrives
at TCP B in line 5. At this point, part or all of the old segment is
delivered to the user of TCP B, depending upon the implementation of
the reassembly algorithm. This behaviour is described in [RFC-1337].
TCP B transmits the acknowledgement of the two previous segments in
line 6. TCP A transmits another segment on line 7a before the arrival
of the acknowledgement in line 8a, and assumes that it is a partial
acknowledgement of this segment. Segment transmission and ack-
nowledgement continue as usual on lines 9a and 10a. Neither TCP A
nor TCP B is aware of the spurious acceptance of old data by TCP B.
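The dependence on the reassembly algorithm can be illustrated with a
toy byte-level reassembly queue. The sketch replays lines 3 and 5 of
Figure 2: the old segment (SEQ=560, 80 octets) is queued first, then the
genuine segment (SEQ=500, 100 octets) arrives and overlaps it. The
function name and the prefer_first flag are hypothetical; window checks
and sequence wraparound are omitted for brevity.

```python
def reassemble(rcv_nxt, arrivals, prefer_first=True):
    """Toy reassembly queue.

    arrivals is a list of (seq, payload) in arrival order. Where two
    segments overlap, prefer_first keeps the octets that arrived first;
    otherwise later arrivals overwrite earlier ones.
    Returns (delivered_bytes, new_rcv_nxt).
    """
    buf = {}  # sequence number -> octet value
    for seq, payload in arrivals:
        for offset, octet in enumerate(payload):
            s = seq + offset
            if s < rcv_nxt:
                continue  # already delivered to the user
            if prefer_first and s in buf:
                continue  # keep the octet that is already queued
            buf[s] = octet
    delivered = bytearray()
    while rcv_nxt in buf:  # deliver the contiguous run starting at rcv_nxt
        delivered.append(buf.pop(rcv_nxt))
        rcv_nxt += 1
    return bytes(delivered), rcv_nxt


arrivals = [(560, b"O" * 80),    # old segment, previous incarnation
            (500, b"N" * 100)]   # genuine segment, current incarnation

data_a, nxt_a = reassemble(500, arrivals, prefer_first=True)
data_b, nxt_b = reassemble(500, arrivals, prefer_first=False)
```

With either policy the receiver advances to 640, matching the ACK on
line 6 of Figure 2. The first policy delivers all 80 old octets to the
user; the second delivers the 40 old octets that lie beyond the genuine
segment.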
To underscore the possibility of the erroneous acceptance of several
old segments, Figure 3 shows the acceptance of two such segments.
The exchange is identical to Figure 2 until 7a, when a second old
segment from TCP A arrives at TCP B. Since TCP B has queued the
first old segment from TCP A, it delivers the entire second old seg-
ment to the user. TCP B transmits the acknowledgement on line 7b.
Line 8a and subsequent lines show the arrival of the acknowledgements
of spurious segments and the transmission of further segments by TCP
A. The acknowledgements are accepted as valid, since TCP A has
already transmitted past the sequence number acknowledged in the last
ACK from TCP B.
TCP A TCP B
1. ESTABL. --> <SEQ=400><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
2. ESTABL. <-- <SEQ=101><ACK=500><CTL=ACK> <-- ESTABL.
3. (old segment)...<SEQ=560><ACK=101><DATA=80><CTL=ACK> --> ESTABL.
4. ESTABL. <-- <SEQ=101><ACK=500><CTL=ACK> <-- ESTABL.
5. ESTABL. --> <SEQ=500><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
6. ... <SEQ=101><ACK=640><CTL=ACK> <-- ESTABL.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
7a. (old segment)...<SEQ=640><ACK=101><DATA=80><CTL=ACK> --> ESTABL.
7b. ... <SEQ=101><ACK=720><CTL=ACK> <-- ESTABL.
7c. ESTABL. --> <SEQ=600><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
7d. ... <SEQ=101><ACK=720><CTL=ACK> <-- ESTABL.
8a. ESTABL. <-- <SEQ=101><ACK=640><CTL=ACK> ...
9a. ESTABL. --> <SEQ=700><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
9b. ESTABL. <-- <SEQ=101><ACK=720><CTL=ACK> ...
9c. ESTABL. <-- <SEQ=101><ACK=720><CTL=ACK> ...
10a ESTABL. <-- <SEQ=101><ACK=800><CTL=ACK> <-- ESTABL.
Figure 3: Accepting Two Old Segments
These examples may be generalised to illustrate the arrival and
acceptance of a window of old segments at TCP B.
It is also possible for old segments to persist in the case where a
user abort is issued on the socket acting as a data source. This
happens when the ensuing RST arrives before one or more of the data
segments previously transmitted. This is shown in Figure 4.
TCP A TCP B
1. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> --> ESTABL.
2. ESTABL. <-- <SEQ=101><ACK=480><CTL=ACK> <-- ESTABL.
3. ESTABL. --> <SEQ=480><ACK=101><DATA=80><CTL=ACK> ...
4. ESTABL. --> <SEQ=560><ACK=101><DATA=80><CTL=ACK> ...
5. ESTABL. --> <SEQ=640><ACK=101><DATA=80><CTL=ACK> ...
(User Abort)
6. CLOSED --> <SEQ=101><CTL=RST>
7. ... <SEQ=101><CTL=RST> --> CLOSED
8. <SEQ=480><ACK=101><DATA=80><CTL=ACK> -->
9. <SEQ=560><ACK=101><DATA=80><CTL=ACK> -->
10. <SEQ=640><ACK=101><DATA=80><CTL=ACK> -->
Figure 4. User Abort and RST Reordering
The acceptance of old segments in transit on lines 8, 9 and 10 occurs
in an identical fashion to the previous example, as shown in Figures
2 and 3.
2.2 RST-Terminated Connections during Closedown
RST-terminated connections also occur from states other than Esta-
blished, during the 3-way closing handshake. Two examples are User
Abort [RFC-793] and Half Duplex Close [RFC-1122].
User Abort during Closedown
A user abort issued in FIN-WAIT-1, FIN-WAIT-2, CLOSING or CLOSE-WAIT
states results in the transmission of a RST, and the socket enters
CLOSED state [RFC-793]. The consequences of user abort in FIN-WAIT-1,
FIN-WAIT-2 and CLOSE-WAIT are similar to the previous section; an
entire window may be in transit when the RST is transmitted, if there
is data in transfer in the opposite direction to that followed by
the FIN. In CLOSING state, the FIN, and all data segments, have been
received by the peer before it transmits the RST, and no non-
duplicate data segments are in the network. In this case the danger
reduces to that of old duplicate segments, as in a conventionally
closed TCP connection.
Data received after Half Duplex Close
A host may implement a half-duplex TCP close, where an application
that has called CLOSE cannot continue to read data from the connec-
tion [RFC-1122]. Subsequent arrival of data elicits a RST. RFC-1122
does not explicitly state whether the connection enters CLOSED state.
In this section the assumption is made that it does. Appendix C
shows the results if this assumption is invalid. The danger of
acceptance of old segments still exists in the latter case.
It is straightforward to demonstrate this scenario. Berkeley UNIX
implementations of FTP [RFC-959] abort transfers in this fashion when
the receiver cannot write out the file to disk, because the disk is
full or because the file is too large. Figure 5 shows this scenario.
TCP A is an 80386 running Interactive UNIX with SpiderTCP, and TCP B
is a Sparcstation running SunOS 4.1.3. An FTP client is started from
TCP A and the 'get' command used to download a file from TCP B. TCP
A aborts the connection because the file limit is reached. The FTP
control connection is closed first and then the data connection.
Further data arrives from TCP B. Since this arrives in FIN-WAIT-2,
and BSD TCP/IP implements half duplex close, it elicits a RST from
TCP A [RFC-1122], and TIME-WAIT state is bypassed. Note that Figure
5 shows only the FTP data connection, not the control connection.
TCP A TCP B
1. ESTABL. <-- <SEQ=220><ACK=100><DATA=80><CTL=ACK> <-- ESTABL.
2. ESTABL. --> <SEQ=100><ACK=300><CTL=ACK> --> ESTABL.
(File Too Large: Close)
3. FIN-WAIT-1 --> <SEQ=100><ACK=300><CTL=FIN,ACK> --> CLOSE-WAIT
4. FIN-WAIT-2 <-- <SEQ=300><ACK=101><CTL=ACK> <-- CLOSE-WAIT
5. FIN-WAIT-2 <-- <SEQ=300><ACK=101><DATA=80> <-- CLOSE-WAIT
6. CLOSED --> <SEQ=101><CTL=RST> --> CLOSED
Figure 5. Data Received after Half Duplex Close
If the ACK in line 4 is delayed or lost, TCP A is still in FIN-WAIT-1
in line 5, when the data arrives. A RST is transmitted and there is
a state transition to CLOSED, as above. For both these scenarios,
the danger of acceptance by a subsequent incarnation of the connec-
tion occurs in identical fashion to Figure 2.
2.3 Proof by Demonstration
The hazards described in this memo could be shown with the testbed
used to demonstrate the hazards of TIME-WAIT assassination in [RFC-
1337]. This might involve a client application acting as a data
source, and a server which, on receipt of the first data segment,
transmits a RST and closes the connection. Repetition of this over
a long period should cause the server to accept an old segment from a
previous incarnation as described in Figure 2 above. No duplication
of segments is required within the testbed, unlike demonstration of
TIME-WAIT Assassination.
2.4 Other Hazards
Two other hazards exist as a result of RST-terminated connections: a
de-synchronised connection as a result of an old ACK that is accept-
able but acknowledges something not yet sent, and connection failure,
also as a result of receiving an old ACK. The ACKs, like data, need
not be duplicate segments. [RFC-1337] shows how these two hazards,
referred to as H2 and H3, occur; this memo concentrates on examples
of the hazard, referred to as H1 in [RFC-1337], of erroneous accep-
tance of old segments containing data.
3. A Partial Solution: TIME-WAIT after RST transmission
One solution to the dangers presented in the previous section
involves the extension of the TIME-WAIT state to RST-terminated con-
nections. This turns out to offer only partial protection against
data corruption.
A connection in any of SYN-RECVD, ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2,
CLOSING and CLOSE-WAIT states enters TIME-WAIT state on transmission
of a RST, rather than CLOSED. Reception of a RST causes a transition
to CLOSED as in [RFC-793]. Minor modifications to the semantics of
TIME-WAIT are required: if entered after RST transmission,
reception of all further valid non-RST segments elicits a RST, rather
than an ACK, and the TIME-WAIT timer is restarted. Received RSTs are
ignored in TIME-WAIT, as proposed by fix F1 in [RFC-1337].
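The rules of the partial solution can be summarised in a short sketch.
The function names and the string encoding of states and replies are
hypothetical; only the transitions stated in this chapter are modelled,
and any other starting state is rejected rather than guessed at.

```python
# States from which RST transmission leads to TIME-WAIT in the partial solution.
RST_TO_TIME_WAIT = {"SYN-RECVD", "ESTABLISHED", "FIN-WAIT-1",
                    "FIN-WAIT-2", "CLOSING", "CLOSE-WAIT"}


def state_after_sending_rst(state):
    """State entered after transmitting a RST (partial solution)."""
    if state in RST_TO_TIME_WAIT:
        return "TIME-WAIT"
    raise ValueError("state not covered by the partial solution: " + state)


def state_after_receiving_rst(state):
    """Reception of a RST is unchanged from RFC-793: go to CLOSED."""
    if state in RST_TO_TIME_WAIT:
        return "CLOSED"
    raise ValueError("state not covered by this sketch: " + state)


def time_wait_after_rst(is_rst):
    """Handling of a valid segment in TIME-WAIT entered by RST transmission.

    Returns (reply, restart_timer).
    """
    if is_rst:
        return (None, False)   # received RSTs are ignored (fix F1 of RFC-1337)
    return ("RST", True)       # reply with a RST and restart the TIME-WAIT timer


aborting_side = state_after_sending_rst("ESTABLISHED")
late_data = time_wait_after_rst(is_rst=False)
```

This corresponds to lines 9 to 14 of Figure 6, where each late data
segment arriving in TIME-WAIT elicits a further RST.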
Appendix A shows extensions to the Connection State Diagram in [RFC-
793] to show state changes on RST transmission.
3.1 User Abort
This solution is shown in Figure 6 for the case of User Abort in
ESTABLISHED state. The hazards outlined in Figures 2 and 3 are less
likely to occur, though not impossible, as the next chapter indi-
cates.
TCP A TCP B
1. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> --> ESTABL.
2. ... <SEQ=101><ACK=480><CTL=ACK> <-- ESTABL.
(User Abort)
3. ... <SEQ=101><CTL=RST> <-- TIME-WAIT
4. ESTABL. --> <SEQ=480><ACK=101><DATA=80><CTL=ACK> ...
5. ESTABL. <-- <SEQ=101><ACK=480><CTL=ACK> ...
6. ESTABL. --> <SEQ=560><ACK=101><DATA=80><CTL=ACK> ...
7. ESTABL. --> <SEQ=640><ACK=101><DATA=80><CTL=ACK> ...
8. CLOSED <-- <SEQ=101><CTL=RST> ...
9. ... <SEQ=480><ACK=101><DATA=80><CTL=ACK> --> TIME-WAIT
10. CLOSED <-- <SEQ=101><CTL=RST> <-- TIME-WAIT
11. ... <SEQ=560><ACK=101><DATA=80><CTL=ACK> --> TIME-WAIT
12. CLOSED <-- <SEQ=101><CTL=RST> <-- TIME-WAIT
13. ... <SEQ=640><ACK=101><DATA=80><CTL=ACK> --> TIME-WAIT
14. CLOSED <-- <SEQ=101><CTL=RST> <-- TIME-WAIT
15. (2 MSL)
CLOSED
Figure 6. Connection Closed by User Abort
3.2 Half Duplex Close
Figures 7 and 8 show modifications to include TIME-WAIT for data
received after half duplex close.
TCP A TCP B
1. ESTABLISHED ESTABLISHED
(Close)
2. FIN-WAIT-1 --> <SEQ=100><ACK=300><CTL=FIN,ACK> --> CLOSE-WAIT
3. FIN-WAIT-2 <-- <SEQ=300><ACK=101><CTL=ACK> <-- CLOSE-WAIT
4. FIN-WAIT-2 <-- <SEQ=300><ACK=101><DATA=30> <-- CLOSE-WAIT
5. TIME-WAIT --> <SEQ=101><CTL=RST> --> CLOSED
(2 MSL)
6. CLOSED
Figure 7. Data Received after Half Duplex Close
In Figure 7, TCP B does not transmit a FIN and the state transition
is from CLOSE-WAIT to CLOSED on RST reception.
TCP A TCP B
1. ESTABLISHED ESTABLISHED
(Close)
2. FIN-WAIT-1 --> <SEQ=100><ACK=300><CTL=FIN,ACK> --> CLOSE-WAIT
3. FIN-WAIT-2 <-- <SEQ=300><ACK=101><CTL=ACK> <-- CLOSE-WAIT
4. FIN-WAIT-2 <-- <SEQ=300><ACK=101><DATA=30> <-- CLOSE-WAIT
5. TIME-WAIT --> <SEQ=101><CTL=RST> ...
(Close)
6. ... <SEQ=301><ACK=101><CTL=FIN,ACK> <-- LAST-ACK
7. <SEQ=101><CTL=RST> ---> CLOSED
8. TIME-WAIT <-- <SEQ=301><ACK=101><CTL=FIN,ACK>
9. TIME-WAIT --> <SEQ=101><CTL=RST> ---> CLOSED
(2 MSL)
10. CLOSED
Figure 8. Data Received after Half Duplex Close
In Figure 8, TCP B issues a CLOSE call and transmits a FIN before the
arrival of the RST transmitted by TCP A on line 5, so that the RST
arrives in LAST-ACK state.
4. RST Loss Hazards
The solution outlined above offers partial protection against data
corruption hazards arising from RST-terminated connections. However,
delay or loss of a RST gives rise to a potential hazard.
For TIME-WAIT state to provide full protection, it must commence
after both ends of a connection have stopped transmitting data. This
is guaranteed for the peer that enters TIME-WAIT, since it has
transmitted a RST and no data can follow this. The transition to
TIME-WAIT must also take place after the other peer has ceased data
transmission. The 3-way closing handshake enforces this for conven-
tionally closed connections; TIME-WAIT state is always entered after
the CLOSE-WAIT to LAST-ACK transition at the last peer to transmit
data.
The lack of an equivalent mechanism for RST-terminated connections
leads to situations where the effective TIME-WAIT state is truncated
or vanishes completely.
4.1 RST Loss and Data Retransmission
Figure 9 shows a scenario where TCP A is retransmitting data seg-
ments, lost because of network congestion. Owing to exponential
backoff, as described in [RFC-1122], the interval between successive
retransmissions is now the 60 second limit common to many TCP imple-
mentations. TCP B gives up and aborts the connection, entering
TIME-WAIT state as mandated by the partial solution in chapter 3.
The ensuing RST is lost, as the network is still congested. TCP A
continues to retransmit. At some point network congestion eases, and
a retransmitted data segment reaches TCP B. A new incarnation of the
connection may be in existence, and the data segment may be errone-
ously accepted.
TCP A TCP B
1. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> ESTABL.
(lost)
(User Abort)
2. ... <SEQ=101><CTL=RST> <-- TIME-WAIT
(lost)
(RTX after 60 seconds)
3. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> TIME-WAIT
(lost)
(RTX after 60 seconds)
4. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> TIME-WAIT
(lost)
(RTX after 60 seconds)
5. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> TIME-WAIT
(lost)
(RTX after 60 seconds)
6. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> TIME-WAIT
(lost)
(2 MSL)
7. CLOSED
(RTX after 60 seconds)
8. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> ...
Figure 9. RST Loss and Data Retransmission
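The timing in Figure 9 can be sketched numerically. The example assumes
an MSL of 2 minutes, the value given in [RFC-793], and a hypothetical
abort 10 seconds after the first transmission; the function name and
the retransmission schedule are likewise illustrative only.

```python
def first_exposed_retransmission(abort_time, time_wait, rtx_times):
    """Return the time of the first retransmission sent after the
    aborting peer has left TIME-WAIT, or None if there is none.

    All times are in seconds, measured from the first transmission.
    """
    expiry = abort_time + time_wait
    for t in sorted(rtx_times):
        if t > expiry:
            return t
    return None


MSL = 120                                 # seconds, as defined in RFC-793
rtx_times = [60 * n for n in range(1, 8)]  # 60, 120, ..., 420

exposed = first_exposed_retransmission(abort_time=10,
                                       time_wait=2 * MSL,
                                       rtx_times=rtx_times)
```

With these figures TIME-WAIT expires at 250 seconds, so the four
retransmissions up to 240 seconds fall inside TIME-WAIT and the fifth,
at 300 seconds, may reach a new incarnation of the connection.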
4.2 RST Loss and Idle Connections
It is not necessary for data transmission to be in progress for the
above hazard to occur. Consider the case where the user aborts an
idle connection, as shown in Figure 10. TCP B issues the abort, and
enters TIME-WAIT. The RST is lost, so that TCP A remains in ESTA-
BLISHED state. No activity occurs until TCP A tries to transmit
data, an interval that is unbounded, and so may exceed twice the MSL.
The data segment may be erroneously accepted at TCP B by a subsequent
incarnation of the connection.
TCP A TCP B
1. ESTABL. ESTABL.
(User Abort)
2. ... <SEQ=101><CTL=RST> <-- TIME-WAIT
(lost)
(2 MSL)
3. CLOSED
(Interval > 2MSL)
4. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> ...
Figure 10. RST Loss and Idle Connections
5. A Complete Solution: the 2-Way Closing Handshake
5.1 Discussion
The potential for RST loss means that a mechanism must be found to
deliver the RST reliably to the peer. This requires a 2-way closing
handshake for RST-terminated connections, with the RST being ack-
nowledged. Loss of the RST or ACK triggers retransmission of the
RST. A 3-way handshake is not required, since data received at the
peer generating the RST need not be reliably acknowledged.
TIME-WAIT state must be entered after both sides have stopped
transmitting data, i.e. after the RST has been reliably delivered. In
addition, the RST must be reliably acknowledged. There is the poten-
tial for the acknowledgement to be lost; TIME-WAIT must also fulfill
the function of ensuring that retransmitted RSTs are acknowledged.
This is analogous to the requirement that closing FINs are reliably
acknowledged in FIN-terminated connections.
This means that RST reception must cause a transition to TIME-WAIT,
in contrast to the partial solution in the previous chapter. Cross-
ing RSTs can be handled as follows: both ends of the connection
change state to TIME-WAIT on RST reception.
5.2 Reliable Delivery of RSTs
Reliable delivery of RSTs requires:
o acknowledgement of the RST.
o that a RST consume sequence number space, in a similar fashion to
SYN and FIN segments, so that it may be acknowledged.
o an interim state, between RST transmission and receipt of the ack-
nowledgement.
o alteration of the semantics of RST reception so that RSTs from
current connections are acknowledged, while RSTs sent from CLOSED
state continue to be ignored.
o a timeout, since in the case where the peer has disappeared, the
interim state between RST transmission and acknowledgement may
endure forever. In addition, non-conformant TCP implementations
will not acknowledge a RST, and so there must always be a timeout
to terminate the interim state.
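The second requirement can be illustrated by the way segment length is
computed. The sketch assumes that a RST occupies exactly one sequence
number, by analogy with SYN and FIN; this section does not state the
exact value, so that figure, the function names and the flag
rst_consumes_seq are all assumptions made for illustration.

```python
MOD = 2 ** 32


def seg_len(data_len, syn=False, fin=False, rst=False, rst_consumes_seq=True):
    """Sequence space occupied by a segment.

    In RFC-793 only data, SYN and FIN count. With rst_consumes_seq set,
    a RST is assumed to count as one sequence number as well.
    """
    length = data_len + int(syn) + int(fin)
    if rst and rst_consumes_seq:
        length += 1
    return length


def expected_ack(seg_seq, length):
    """Acknowledgement number that covers the whole segment."""
    return (seg_seq + length) % MOD


# A RST with SEQ=101, as in the figures of this memo.
ack_proposed = expected_ack(101, seg_len(0, rst=True))
ack_rfc793 = expected_ack(101, seg_len(0, rst=True, rst_consumes_seq=False))
```

Under the assumption above the RST would be acknowledged with ACK=102,
whereas with [RFC-793] semantics it occupies no sequence space and
there is nothing for the peer to acknowledge.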
5.3 Interim State
The identity of the interim state between transmission and ack-
nowledgement of a RST deserves consideration. If a new protocol were
being designed, a separate state might be chosen. A new state for
the TCP protocol has serious implications for backwards compatibil-
ity; unless all statistics gathering applications are modified, it is
not possible to report it. Since a new state will probably always be
invisible outside TCP, a current state is preferable, if an appropri-
ate state exists.
The LAST-ACK state has similarities to the interim state; though used
once a FIN has been received and acknowledged, it corresponds to the
last state between transmission of a FIN and its acknowledgement.
Extension of LAST-ACK to RSTs as well as FINs fits neatly with the
current specification of LAST-ACK; see Appendix D.
All further reference to LAST-ACK in this chapter is qualified by the
fact that LAST-ACK is entered by RST transmission. There is no
change to the behaviour of LAST-ACK if entered by FIN transmission,
except that it follows the behaviour of other synchronised states on
RST reception, i.e. transition to TIME-WAIT state.
5.4 Changes to RFC-793 State Machine
The solution involves additional state transitions on RST transmis-
sion, to LAST-ACK, and on RST reception, to TIME-WAIT. LAST-ACK
changes state to CLOSED when the RST is acknowledged, in a similar
fashion to [RFC-793].
See Appendix B for the TCP Connection State Diagrams for RST
transmission and reception. Appendices D and E describe modifica-
tions to [RFC-793] and [RFC-1122] respectively.
LAST-ACK state after RST transmission
On RST transmission by synchronised TCP connections, there is a state
transition from the current state to LAST-ACK. The RST is
retransmitted from LAST-ACK state, like FIN segments in LAST-ACK
state, until the ACK of the RST is received, when CLOSED state is
entered.
RSTs with non-zero sequence numbers are acknowledged in all states
except LISTEN and CLOSED. FIN segments elicit a RST in LAST-ACK;
segments other than FIN, RST and ACKs of previously transmitted RSTs
are ignored in LAST-ACK. If no acknowledgement is received after the
retransmission timeout, LAST-ACK enters CLOSED state.
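
The sender-side rules above can be sketched as a small transition function. This is an illustrative reading of the text, not part of the memo: all type and function names are hypothetical, zero-sequence RSTs are assumed to be ignored in LAST-ACK (as RSTs sent from CLOSED are elsewhere), and a received RST is assumed to move LAST-ACK to TIME-WAIT as described in section 5.3.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the transitions come from the memo. */
enum tcp_state { CLOSED, LISTEN, SYN_SENT, SYN_RCVD, ESTABLISHED,
                 FIN_WAIT_1, FIN_WAIT_2, CLOSE_WAIT, CLOSING,
                 LAST_ACK, TIME_WAIT };

enum la_action { LA_IGNORE, LA_SEND_RST, LA_SEND_ACK };

struct segment { bool fin, rst, acks_our_rst; uint32_t seq; };
struct la_result { enum tcp_state next; enum la_action action; };

/* Synchronised states per RFC-793: ESTABLISHED and every later state. */
static bool is_synchronised(enum tcp_state s)
{
    return s >= ESTABLISHED;
}

/* State after transmitting a RST. Unsynchronised states are left
 * unchanged here (an assumption): the text covers synchronised ones. */
static enum tcp_state after_rst_sent(enum tcp_state s)
{
    return is_synchronised(s) ? LAST_ACK : s;
}

/* Segment arrival in LAST-ACK when entered by RST transmission. */
static struct la_result last_ack_input(struct segment sg)
{
    struct la_result r = { LAST_ACK, LA_IGNORE };
    if (sg.rst) {
        /* Assumed: a zero-sequence RST came from CLOSED and is ignored. */
        if (sg.seq != 0) { r.next = TIME_WAIT; r.action = LA_SEND_ACK; }
    } else if (sg.fin) {
        r.action = LA_SEND_RST;   /* FIN elicits a further RST */
    } else if (sg.acks_our_rst) {
        r.next = CLOSED;          /* handshake complete */
    }
    return r;                     /* anything else is ignored */
}

/* Retransmission timeout expired without an acknowledgement. */
static enum tcp_state last_ack_timeout(void)
{
    return CLOSED;
}
```

For example, after_rst_sent(ESTABLISHED) yields LAST_ACK, and an ACK of the RST arriving there yields CLOSED.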
TIME-WAIT state after RST reception
Reception of a valid RST by a synchronised TCP connection, resulting
in a state transition, triggers an ACK transmission and transition
to TIME-WAIT state.
If the peer generating the RST is in CLOSED state, the RST is not
acknowledged, since the acknowledgement would elicit a further RST.
This can be detected by only acknowledging RSTs containing a non-zero
sequence number [RFC-793, p.36]. RSTs received in SYN-SENT state as
a result of a SYN sent to a non-existent port are thus not ack-
nowledged. RSTs received in TIME-WAIT are acknowledged; other seg-
ments are ignored.
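
The receiver-side rule can be sketched as follows. The function names and the outcome structure are hypothetical, and the sketch assumes the caller has already validated the RST (for example, checked that it lies within the window).

```c
#include <stdbool.h>
#include <stdint.h>

struct rst_outcome { bool send_ack; bool enter_time_wait; };

/* A RST is acknowledged only if it carries a non-zero sequence number
 * and the receiver is in neither LISTEN nor CLOSED. */
static bool rst_needs_ack(bool listen_or_closed, uint32_t rst_seq)
{
    return !listen_or_closed && rst_seq != 0;
}

/* Outcome of a valid RST for a connection outside LISTEN and CLOSED.
 * Synchronised connections enter TIME-WAIT; unsynchronised ones
 * (e.g. SYN-SENT) are assumed to keep their RFC-793 behaviour. */
static struct rst_outcome on_valid_rst(bool synchronised, uint32_t rst_seq)
{
    struct rst_outcome o;
    o.send_ack = rst_needs_ack(false, rst_seq);
    o.enter_time_wait = synchronised;
    return o;
}
```

A RST with SEQ=101 received in ESTABLISHED thus produces an ACK and a move to TIME-WAIT, while a zero-sequence RST received in SYN-SENT produces neither.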
The 2-way closing handshake is shown in Figure 11. The connections
need not be in ESTABLISHED state; consult Appendix B for other states
from which the closing handshake may be initiated.
    ESTABL.                                   ESTABL.
            ----------> snd RST -------------->
    LAST-ACK
            <----------- snd ACK --------------
    CLOSED                                    TIME-WAIT
                                              (2MSL)
                                              CLOSED
Figure 11. 2-way Closing Handshake
Figure 12 shows the modification to Figure 6 (User Abort) as a result
of the addition of the 2-way closing handshake. Note that the ACK
of the RST and the last data segment arrive out of order, to show the
effects of data segments arriving in CLOSED state; they elicit a RST
which is received in TIME-WAIT by TCP A, and ignored, as it has a
zero sequence number.
TCP A TCP B
1. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> --> ESTABL.
2. ... <SEQ=101><ACK=480><CTL=ACK> <-- ESTABL.
(User Abort)
3. ... <SEQ=101><CTL=RST> <-- LAST-ACK
4. ESTABL. --> <SEQ=480><ACK=101><DATA=80><CTL=ACK> ...
5. ESTABL. <-- <SEQ=101><ACK=480><CTL=ACK> ...
6. ESTABL. --> <SEQ=560><ACK=101><DATA=80><CTL=ACK> ...
7. ESTABL. --> <SEQ=640><ACK=101><DATA=80><CTL=ACK> ...
8. ESTABL. <-- <SEQ=101><CTL=RST> ...
9. TIME-WAIT --> <SEQ=640><ACK=102><CTL=ACK> ...
10. ... <SEQ=480><ACK=101><DATA=80><CTL=ACK> --> LAST-ACK
11. ... <SEQ=560><ACK=101><DATA=80><CTL=ACK> --> LAST-ACK
12. ... <SEQ=640><ACK=102><CTL=ACK> --> CLOSED
13. ... <SEQ=640><ACK=101><DATA=80><CTL=ACK> --> CLOSED
14. TIME-WAIT <-- <SEQ=0><CTL=RST> CLOSED
(2 MSL)
15. CLOSED
Figure 12. Connection Closed by User Abort
Figure 13 shows the modifications to Figure 9 (RST Loss and Data
Retransmission). TIME-WAIT state is entered after TCP B has sent one
or more RSTs and entered LAST-ACK state. Thus the duration of TIME-
WAIT exceeds the lifetime of any segments transmitted by TCP A and
TCP B.
TCP A TCP B
1. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> ESTABL.
(User Abort)
2. ... <SEQ=101><CTL=RST> <-- LAST-ACK
(lost)
(RTX Interval)
3. ESTABL. <-- <SEQ=101><CTL=RST> <-- LAST-ACK
4. TIME-WAIT --> <SEQ=400><ACK=102><CTL=ACK> --> CLOSED
(2 MSL)
5. CLOSED
Figure 13. RST Loss and Data Retransmission
FINs and RSTs
The RST is considered to consume the same sequence number space as a
FIN. A segment containing both a FIN and a RST is treated as a RST
[RFC-793, p70].
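
A minimal sketch of this rule, assuming hypothetical helper names: the acknowledgement number for a RST is its sequence number plus one, modulo 2^32, as in Figure 12 (SEQ=101 is acknowledged with ACK=102). The flag bit values are those of the TCP header.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_FIN 0x01u  /* FIN bit in the TCP header flags */
#define FLAG_RST 0x04u  /* RST bit in the TCP header flags */

/* The RST occupies one unit of sequence space, like a FIN. */
static uint32_t ack_for_rst(uint32_t rst_seq)
{
    return (uint32_t)(rst_seq + 1u);  /* wraps modulo 2^32 */
}

/* A segment carrying both FIN and RST is handled as a RST. */
static bool treat_as_rst(uint8_t flags)
{
    return (flags & FLAG_RST) != 0;
}
```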
Crossing RSTs
The case where both sockets transmit a RST is shown in Figure 14.
The RSTs both arrive in LAST-ACK state and cause transitions to
TIME-WAIT state. Both ends spend 2MSL in TIME-WAIT state.
Note that to cope with the case where one of the RSTs is lost, RSTs
must be acknowledged in TIME-WAIT, otherwise one end continues to
transmit RSTs in LAST-ACK state, until it times out.
TCP A TCP B
1. ESTABL. ESTABL.
(User Abort)
2. ... <SEQ=101><CTL=RST> <-- LAST-ACK
(User Abort)
3. LAST-ACK --> <SEQ=400><CTL=RST> ...
4. TIME-WAIT <-- <SEQ=101><CTL=RST>
5. TIME-WAIT --> <SEQ=400><ACK=102><CTL=ACK> ...
6. ... <SEQ=400><CTL=RST> --> TIME-WAIT
7. ... <SEQ=101><ACK=401><CTL=ACK> <-- TIME-WAIT
8. ... <SEQ=400><ACK=102><CTL=ACK> --> TIME-WAIT
9. TIME-WAIT <-- <SEQ=101><ACK=401><CTL=ACK> ...
(2 MSL) (2MSL)
10. CLOSED CLOSED
Figure 14. Crossing RSTs
Crossing RSTs and FINs
If LAST-ACK is entered by RST transmission, it is advisable to send a
further RST if a FIN is received, as the first RST may have been
lost; this ensures that the peer receives the RST and changes state
to TIME-WAIT (from FIN-WAIT-1 or CLOSE-WAIT) as quickly as possible.
5.5 Interoperability with RFC-793 Implementations
Implementations of [RFC-793] do not wait for acknowledgement of a
RST, nor do they acknowledge a RST. An implementation of this memo
enters LAST-ACK on transmission of a RST. This is not acknowledged
if the peer conforms to [RFC-793]. The LAST-ACK state times out. A
RST from a [RFC-793] implementation triggers a state change to TIME-
WAIT, and an acknowledgement.
Thus, the solution presented in this memo offers protection even when
interoperating with non-conformant implementations, by a LAST-ACK or
TIME-WAIT state. There is still the smaller risk of loss of RSTs,
but the major risk, of data corruption because of immediate
transition to CLOSED, is avoided. Chapter 9 considers interoperabil-
ity issues in greater detail.
5.6 Timeout-Terminated Connections
There are similar hazards to those outlined above for timer ter-
minated connections. These are User Timeouts, Retransmission
Timeouts [RFC-793], and the commonly implemented Keepalive Timeouts.
Some TCP implementations also time out the FIN-WAIT-2 state. They all
terminate a connection by entering CLOSED state. Some implementa-
tions transmit a RST, but there is no guarantee of its arrival. Net-
work congestion causes the timeouts by the loss of retransmitted or
keepalive segments. Congestion may ease within the Maximum Segment
Lifetime and the peer may transmit segments, causing potential data
corruption.
Some protection is gained if connections wishing to time out first
transmit a RST, enter LAST-ACK and await an ACK. Should congestion
ease, the RST will be received and acknowledged, and the peer enters
TIME-WAIT and times out after twice the MSL.
In the majority of cases congestion will not ease, the RST will not
be acknowledged, and the LAST-ACK state will time out. The effect is
to increase the lifetime of the connection after the decision to
time out by the duration of the LAST-ACK state.
Extension of the solution to timer-terminated connections implements
the principle that all connections should include a TIME-WAIT state.
However, data integrity also relies on a closing handshake, and can-
not be ensured in a bounded time if the connection peer disappears.
At some point, it has to be assumed that the peer is dead.
5.7 Connections Terminated by ICMP Messages
Certain ICMP messages terminate a TCP connection. These include the
Destination Unreachable codes Port Unreachable and Protocol
Unreachable, and ICMP messages prohibiting access for administrative
reasons.
Usually, these are sent as a result of reception of a SYN segment,
and are received in SYN-SENT state. Since the connection is unsyn-
chronised, there is no danger of segments remaining in the network to
corrupt a future connection. Messages indicating unreachability
should not occur for a synchronised connection. It is possible
to construct (rather implausible) scenarios where this occurs; for
instance, when the path used by a synchronised connection changes to
include a router which prohibits access for administrative reasons,
terminating the connection.
These scenarios carry very low probabilities, but for completeness
and safety, synchronised TCP connections should enter TIME-WAIT state
on reception of ICMP messages. Appendix E describes alterations to
[RFC-1122] to implement this.
5.8 TP4
The OSI equivalent to TCP, TP4 [TP-Spec], has no mechanism for ord-
erly release of a connection. Connections are closed by sending a
Disconnect Request TPDU, causing a state transition to CLOSING state
[TP-Spec, section 6.7]. Subsequent reception of a Disconnect Confirm
TPDU, or another Disconnect Request TPDU, causes a state transition
to REFWAIT state [TP-Spec, Annex A, Table 11], during which the con-
nection is 'frozen' [TP-Spec, section 6.18] until this times out.
The TP4 CLOSING state is analogous to the use of LAST-ACK state after
RST transmission, and REFWAIT is analogous to TIME-WAIT state. The
RST is analogous to a Disconnect Request PDU, and the ACK to a
Disconnect Confirm PDU. The other TCP states during closedown (FIN-
WAIT-1, FIN-WAIT-2, CLOSING, CLOSE-WAIT, LAST-ACK after FIN transmis-
sion) have no analogies in TP4.
TP4 connections that time out by retransmission close by sending a
Disconnect Request TPDU and enter CLOSING. This is analogous to the
suggested behaviour for timeouts in this memo.
The TP4 CLOSING state times out and enters REFWAIT if the Disconnect
Request TPDU is not acknowledged. In a similar fashion, this memo
times out LAST-ACK if the RST is unacknowledged.
6. Relative Probabilities of Hazards
6.1 Introduction
This section contains a less than rigorous analysis of the relative
probabilities of the various data corruption hazards. Note that
these probabilities are zero for TCP connections operating below
286 Mbit/s; the initial sequence number selection protects against
data corruption hazards, regardless of the mechanism for closing the
connection.
6.2 FIN, RST, Timer and ICMP Related Hazards
It is useful to compare the relative probabilities of hazards arising
from FIN-, RST-, Timer- and ICMP-terminated TCP connections.
The probability of each hazard is proportional to the amount of data
received after transition to CLOSED. Complete protection requires
that this be guaranteed to be zero. Data received after connection
closure does not cause data corruption, unless it falls within the
current window of a new incarnation of the connection.
It is assumed that the connection peer displaying the hazard is act-
ing as a data sink, maximising the data received and the probability
of failure. If the proportion of TCP connections acting as data
sinks or data sources is the same regardless of how the connection
terminates, the relative probabilities remain the same.
To simplify the arithmetic, higher order effects are ignored; for
instance, those arising from the loss of more than one TCP segment in
the period considered.
The three hazards considered are data corruption arising from the
following:
o Hazard 1: A FIN-terminated TCP connection with TIME-WAIT state
omitted.
o Hazard 2: A TCP connection aborted from Established state, with
neither TIME-WAIT nor LAST-ACK states.
o Hazard 3: A TCP connection aborted from Established state, with
TIME-WAIT but without LAST-ACK state.
Other hazards, such as connections aborted during closedown, by
timeouts, or ICMP messages, are ignored. These are much less likely
than Hazard 2. The duration of closedown is typically much shorter
than that of Established state. Timeouts require multiple loss of
segments in the network and represent higher order effects, with
correspondingly lower probabilities. ICMP termination of synchron-
ised connections is very rare.
Nomenclature
P1 - Relative probability of Hazard 1
P2 - Relative probability of Hazard 2
PL - Probability of loss of a TCP segment in the network
PR - Probability that a TCP connection terminates by RST
PT - Probability that a TCP connection terminates by timeout
PI - Probability that a TCP connection terminates by ICMP message
MSS - Maximum Segment Size
W - Maximum offered TCP window
o Hazard 1
Duplicate segments received after FIN-terminated connections usu-
ally arise because of the loss of an ACK, triggering an unneces-
sary retransmission. Slow start [Congestion] implies that only
one segment will be retransmitted without acknowledgement. The
relative probability of H1 is the segment size multiplied by the
probability of segment loss and the probability of termination by
FIN handshake:
P1 = MSS * PL * (1 - PR - PT - PI) = MSS * PL
ignoring higher order effects.
o Hazard 2
For a data sink, transmission of a RST in Established state and
transition to CLOSED state is followed by reception of up to a
window of data, all of which may be received during a subsequent
incarnation of the connection.
The relative probability of H2 is the window size multiplied by
the probability of termination by RST:
P2 = W * PR
o Hazard 3
In this case, a RST is lost. Any data received in TIME-WAIT
causes the TIME-WAIT timer to restart, so the hazard only occurs
if the gap between reception of segments exceeds the duration of
TIME-WAIT state. This occurs if several retransmitted segments
are lost, which is a higher order effect with low probability, or
if an application spontaneously transmits data after this time,
which is also unlikely. This hazard can be ignored.
6.3 Relative Probabilities of FIN- and RST-related Hazards
The ratio of probabilities of hazard H2 and H1 is
P2/P1 = W/MSS * PR/PL
Example Calculation
If Path MTU Discovery [RFC-1191] is supported, the segment size is
the Maximum Segment Size indicated by the lowest physical packet size
on the connection path, unless negotiated to be lower during connec-
tion establishment. Implementation of [RFC-1191] is not yet
widespread, so the default figure is assumed [RFC-1122, 3.3].
TCP segment size = 576 - size of TCP and IP headers = 536
Assume a window size of 32K. Appendix G summarises statistics about
TCP connections, derived from a variety of connections. Taking the
average percentage values of PR=1.1 and PL=1.2 derived from Appendix
G:
P2/P1 = W/MSS * PR/PL = 32768/536 * 1.1/1.2 = 56.
For TCP connections on the same physical network, or where Path MTU
Discovery is supported, the default segment size is larger and rela-
tive probability smaller.
The lowest ratio consistent with the data in Appendix G can be calcu-
lated from the highest value of PL (2.9) and the lowest value of PR
(0.8):
P2/P1 = 17.
It can be concluded that erroneous acceptance of data from expired
connections is significantly more likely to occur as a result of
RST-terminated connections than the equivalent hazard after FIN-
terminated connections.
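
The two ratios in the example calculation can be reproduced with a few lines of arithmetic. The function names are hypothetical; the inputs are the figures quoted above (a 32K window, the 576-octet default datagram less 20 octets each of IP and TCP header, and the percentages taken from Appendix G).

```c
/* Default segment size: 576-octet datagram minus IP and TCP headers. */
static unsigned default_mss(void)
{
    return 576u - 20u - 20u;
}

/* P2/P1 = W/MSS * PR/PL. PR and PL may be given as fractions or as
 * percentages, provided both use the same units. */
static double hazard_ratio(double window, double mss, double pr, double pl)
{
    return (window / mss) * (pr / pl);
}
```

With W = 32768 and MSS = 536, the average values PR = 1.1 and PL = 1.2 give a ratio of about 56, and the extreme values PR = 0.8 and PL = 2.9 give about 17.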
7. Implications for Related TCP standards
Extensions to the TCP standard [RFC-793] have been made in three
areas which overlap the issues raised in this memo; to cope with
TIME-WAIT Assassination [RFC-1337], to better handle high performance
networks [RFC-1323], and to permit more efficient use of TCP for
transactions [RFC-1379 and RFC-1644].
Alternatives to TIME-WAIT have been explored, but none has been
adopted. If TIME-WAIT continues to be required for FIN-terminated
connections to avoid acceptance of segments from expired connections,
it will also be required for RST-terminated connections.
7.1 TIME-WAIT Assassination
[RFC-1337] describes several hazards caused by premature termination
(or 'assassination') of TIME-WAIT state in FIN-terminated connec-
tions, caused by RST reception. This memo discusses these hazards in
relation to RST terminated connections, and explores issues adum-
brated in [RFC-1337].
7.2 High Performance Extensions
[RFC-1072], [RFC-1106], [RFC-1185], and [RFC-1323] present TCP exten-
sions to improve performance over large bandwidth*delay product paths
and to provide reliable operation over very high speed paths.
Although truncation or replacement of TIME-WAIT is discussed, it is
not specified.
Section 4.3 and Appendix B of [RFC-1323] show that PAWS does not
allow relaxation of the MSL requirement, but this is possible if times-
tamps exist across connections, as a per-host timestamp cache, and
tick at least once in a period equal to the combined duration of
TIME-WAIT and the round trip time. A different timeout for reliable
acknowledgement of closing FINs in FIN-terminated connections is dis-
cussed in Appendix B of [RFC-1323].
7.3 Extensions for Transactions
Extensions to TCP to provide efficient support for Transaction Pro-
cessing [RFC-1379 and RFC-1644] shorten but do not eliminate TIME-
WAIT [RFC-1379, p. 34 and RFC-1644, p. 12]. The effects of RST
transmission during transaction processing and any extensions to the
T/TCP state machine are left for further study.
8. Backwards Compatibility Issues
8.1 Introduction
There is a very large global installed base of TCP implementations
and applications utilising the protocol. Therefore, the extensions
described in this memo cannot be lightly undertaken. General issues
of backwards compatibility are summarised in [RFC-1263], section 2.2.
It may be necessary to provide options to permit maintenance of
current services and semantics. In particular, transmission or
reception of a RST immediately removes all connection state and use
of local resources; with the extensions proposed in this memo, this
is no longer true.
Interoperability between conformant and non-conformant implementa-
tions must be shown. There is the risk of overcomplicating the solu-
tion to ensure backwards compatibility.
The following issues affect backwards compatibility: resources and
resource starvation, API semantics, interoperability, and simplicity.
8.2 Nomenclature
Hard and Soft Aborts
To distinguish [RFC-793] behaviour on RST transmission and RST or
ICMP message reception from that referred to in this memo, [RFC-793]
behaviour is characterised as a Hard Abort. The connection state is
removed and the connection enters CLOSED state when a RST is sent or
received. A Soft Abort implements the behaviour described in this
memo; when a RST is transmitted, the connection enters LAST-ACK
state. When a RST or ICMP message is received, the connection enters
TIME-WAIT state. Appendix D includes modifications to the ABORT call
in [RFC-793] to allow both Hard and Soft Aborts.
Hard and Soft Timeouts
In a similar fashion, a Hard Timeout occurs when a connection enters
CLOSED state on timeout, with or without transmitting a RST. A Soft
Timeout occurs when a connection transmits a RST and enters LAST-ACK
state on timeout.
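
The four terms can be summarised as a single decision for a synchronised connection. The enumerations and the function name below are hypothetical and serve only to illustrate the definitions.

```c
#include <stdbool.h>

enum end_state { END_CLOSED, END_LAST_ACK, END_TIME_WAIT };
enum end_event { EV_SEND_RST, EV_RECV_RST, EV_RECV_ICMP, EV_TIMEOUT };

/* State entered by a synchronised connection when it terminates. */
static enum end_state terminate(enum end_event ev, bool soft)
{
    if (!soft)
        return END_CLOSED;        /* Hard Abort or Hard Timeout */
    switch (ev) {
    case EV_SEND_RST:             /* Soft Abort, local side */
    case EV_TIMEOUT:              /* Soft Timeout: RST sent first */
        return END_LAST_ACK;
    case EV_RECV_RST:             /* Soft Abort, remote side */
    case EV_RECV_ICMP:
        return END_TIME_WAIT;
    }
    return END_CLOSED;            /* not reached for valid events */
}
```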
8.3 Resources
TCP connections consume a variety of resources, such as memory in the
form of per-connection state, and address space defined by IP
addresses and TCP port numbers. The number of TCP connections may be
limited by the size of static tables. Addresses are consumed at ini-
tiation of a TCP connection and released at connection termination.
Memory usage varies over the lifetime of a TCP connection and depends
on the state of the connection and the amount of buffered user data.
Addresses
There is competition for address resources because of the requirement
to run simultaneous instantiations of a particular application
between two communicating hosts.
The quadruple defined by (local IP address, local TCP port, foreign
IP address, foreign TCP port) must be unique at any point in time.
These comprise a 96 bit address space that identifies all TCP con-
nections currently in existence. For two communicating hosts and a
particular client-server application, the two 32 bit IP addresses are
fixed, as is the 16 bit port defining the service. The client port
space contains 65535 non-zero port numbers, but 1023 of these are
reserved for service ports, leaving 64512. New client processes
requiring a service from an identical host must be allocated an
unused port. Thus, there can be no more than 64512 simultaneous
instantiations of an application between two hosts.
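
As an illustration, the quadruple and the port arithmetic might be written as below. The structure and function names are hypothetical; the constants are those given in the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* The 96-bit connection identifier: 32 + 16 + 32 + 16 bits. */
struct quadruple {
    uint32_t local_addr;
    uint16_t local_port;
    uint32_t foreign_addr;
    uint16_t foreign_port;
};

/* Two connections clash only if all four fields are equal. */
static bool same_connection(struct quadruple a, struct quadruple b)
{
    return a.local_addr == b.local_addr &&
           a.local_port == b.local_port &&
           a.foreign_addr == b.foreign_addr &&
           a.foreign_port == b.foreign_port;
}

/* Non-zero ports less those reserved for services. */
static unsigned client_ports(void)
{
    return 65535u - 1023u;
}
```

With the addresses and the service port fixed, only the client port distinguishes connections, which is why client_ports() bounds the number of simultaneous instantiations.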
Memory
Unlike address space, which is a hard limit, memory usage and maximum
number of TCP connections depend on the TCP implementation and avail-
able resource on the host. The protocol requires significant state
to be maintained for most of the lifetime of a TCP connection,
although memory usage in TIME-WAIT and LAST-ACK states may be minim-
ised by keeping only necessary state; little more than addresses and
ports for demultiplexing, and sequence numbers for acknowledgement.
However, memory requirements or other limits on the numbers of TCP
connections may reduce the maximum number of simultaneous connections
below the limit imposed by 64512 unique client port numbers.
Connection Termination
Resources are released at connection termination, which occurs as a
result of the following events. Note that there is a termination
event at each TCP peer; the later event defines the point at which
global resource, i.e. port numbers, can be reused.
o TIME-WAIT timeout for FIN-terminated connections (after ACK of FIN
at peer)
o SYN reception in TIME-WAIT satisfying the conditions of [RFC-1122]
o RST or ICMP message reception for Hard Aborts (after transmission
by peer)
o Hard Timeout
o TIME-WAIT timeout for Soft Aborts or Soft Timeouts (after ACK
reception by peer)
o LAST-ACK timeout for Soft Aborts or Soft Timeouts, against [RFC-
793] hosts, or if peer has disappeared.
Behaviour for the extensions in this memo is identical to [RFC-793]
except for the last two cases: soft aborts and soft timeouts prolong
connection termination and thus connection duration. However,
timer-terminated connections are never short lived, so increase in
connection duration is unlikely to result in problems. Connections
terminated by Soft Aborts are thus the only ones vulnerable to
resource starvation.
8.4 Resource Starvation
TIME-WAIT has a marked effect on resource usage, as it significantly
prolongs connection duration. This is also true for LAST-ACK if the
RST is not acknowledged. In this case, any conclusions derived in
this section for TIME-WAIT state apply equally to LAST-ACK.
Note that since TIME-WAIT ties up the (address, port) quadruple, the
foreign peer cannot re-establish the connection even though it ter-
minates before TIME-WAIT. Normal operation for client-server appli-
cations involves the client application being allocated the next
unused TCP port every time it establishes a new connection. As long
as it does not cycle through all 64512 available TCP client ports
before TIME-WAIT expires, the (address, port) quadruple will be
available for reuse when needed.
If the average lifetime of a TCP connection executing a specific
application is L seconds, TCP ports will be exhausted if new connec-
tions are opened at a rate exceeding 64512/L. FIN-terminated connec-
tions last for at least 2MSL or 240 seconds, the duration of TIME-
WAIT, so that the rate at which new connections are opened must be
less than 64512/240 or 268 per second (see [RFC-1379], page 5).
RST-terminated connections following [RFC-793] have no TIME-WAIT
state, but those following this memo do, with a possibly unacceptable
increase in resource usage.
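
The rate limit is simple integer arithmetic. The function name below is hypothetical; the 64512 ports and the 240 second lifetime are the figures used above.

```c
/* Highest whole number of connections per second that can be opened
 * between two hosts without exhausting client ports.
 * Precondition: lifetime_s > 0. */
static unsigned max_open_rate(unsigned ports, unsigned lifetime_s)
{
    return ports / lifetime_s;
}
```

For instance, max_open_rate(64512, 240) gives the 268 connections per second quoted for connections that spend 2MSL in TIME-WAIT.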
The types of TCP connection which lead to resource starvation are
short lived and generated at a high rate, such as remote procedure
calls and other transaction processing applications. [RFC-1263],
section 3.2 summarises the problem, which is "caused by short port
numbers, long MSLs, and the misuse of TCP as a request-reply proto-
col". Processing power, I/O bandwidth and network bandwidth have
increased to the point that a TCP connection comprising a simple
request-reply may take several orders of magnitude less than a
second, excluding the TIME-WAIT state.
The issue of backwards compatibility arises with short-lived applica-
tions which always transmit a RST to close a connection, rather than
as a result of an unusual condition, such as a full disk (see Figure
5). They exploit the loophole identified by this memo to avoid
TIME-WAIT and so truncate the lifetime of a TCP connection, thereby
reducing resource consumption and permitting much higher rates of TCP
connection establishment than 268 per second. Note that the unsui-
tability of TCP for such applications has resulted in extensions for
transaction processing [RFC-1379].
TCP port resource usage cannot be decreased, so that the port address
space must be increased, if short lived RST-terminated applications
are to continue working and include the TIME-WAIT state. This
requires either modification to the TCP header or, equivalently, the
use of TCP options to extend the port address space. This problem is the
same for FIN-terminated connections and is outside the scope of this
memo; TCP port exhaustion will only be handled here by reverting to
[RFC-793] behaviour.
8.5 Approaches to Configuration
Configuration is only necessary if it is impossible to find an
elegant solution that maintains backwards compatibility. Configura-
tion of behaviour between conformance and nonconformance is a simple
mechanism to maintain backwards compatibility; ignore the problem and
revert to previous behaviour.
Global Configuration versus Per-Connection Configuration
Configuration may be global, affecting all TCP connections from that
host, or per-connection. Per-connection behaviour may be configured
statically through the API, or negotiated with the connection peer.
If both global and per-connection configuration exist, the global
option dictates behaviour for all connections that do not utilise
per-connection configuration; where there is per-connection confi-
guration, it overrides the value of the global configuration.
Per-connection configuration allows more flexibility in that indivi-
dual applications that cause problems may be configured to maintain
backwards compatibility, others conforming to this memo. A mechanism
must be added to the API to allow per-connection configuration, nor-
mally in the form of an option; it also relies on access to the
source code of the application at both client and server peers.
Static Configuration versus Negotiated Configuration
Behaviour can be autoconfigured via negotiation with the peer TCP
connection. Typically this uses TCP options, although there is the
potential for carrying data within RST segments. Use of TCP options
complicates the implementation and may cause interoperability prob-
lems with current implementations, unless negotiation occurs at con-
nection initiation; this approach does not permit different
behaviour according to the type of abort, since this is unknown at
connection initiation. In addition, this is an implicitly pessimis-
tic approach in that conformance to this memo only occurs between two
hosts that support the extensions. Its advantage is that no manual
configuration is necessary.
Static configuration is simpler, but requires that both peers are
independently configured to ensure backwards compatibility, since
resource starvation occurs if either the initiator or receiver of the
RST conforms to this memo.
Optimistic versus Pessimistic Configuration Defaults
Static configuration defaults may be optimistic, where they must be
changed to maintain backwards compatibility, or pessimistic, main-
taining backwards compatibility unless configured otherwise. The
choice depends on the relative frequencies of the problems caused by
the deficiencies fixed by this memo, and those caused by lack of
backwards compatibility.
Granularity of Configuration
Not all RST-terminated connections need be implemented as Hard Aborts
to preserve backwards compatibility. There are several types of RST-
and Timeout-terminated connections: User Abort, MIB II Abort, Half
Duplex Close, Precedence Mismatch, Retransmission, User and Keepalive
Timeout. It is possible to configure conformance for each of these,
but at the expense of simplicity.
A single configuration variable simplifies implementation and
administration, at the expense of the flexibility provided by a finer
granularity of configuration.
8.6 API Semantics
It is important to maintain the semantics of the API used to open,
use and close TCP connections. If the API supports closure through
RST transmission, Soft Aborts should appear to be identical to Hard
Aborts from the point of view of the application.
The same issues apply for handling new SYNs in TIME-WAIT as for FIN-
terminated connections [RFC-1122]. If the implementation follows
[RFC-1122] in this respect for FIN-terminated connections, it should
do the same for RST-terminated and Timer-terminated connections.
This corresponds to the requirement that the result of an open call
be successful for remote peers in TIME-WAIT states that conform to
[RFC-1122], regardless of how TIME-WAIT state was entered.
8.7 Interoperability
Interoperability between conformant and non-conformant implementa-
tions must also be shown, since conformant implementations will be in
the minority, until they diffuse throughout the Internet, and other
internets containing TCP/IP hosts, and there will always be non-
conformant implementations in existence.
8.8 Simplicity
The requirement of backwards compatibility is notorious for overcom-
plicating an otherwise elegant solution [RFC-1263]. Excessive confi-
gurability also makes implementation and interoperability testing
more difficult; the combinations that require interoperability test-
ing scale as the square of the number of configurable options.
9. Solution with Backwards Compatibility
9.1 Introduction
In this section a solution is sought which satisfies the requirements
of the previous chapter. The solution should permit maintenance of
current resource usage and API semantics, interoperate with current
TCP implementations, and be as simple as possible. Configuration
mechanisms and their defaults must also be considered.
9.2 Solution
Applications that generate large numbers of short lived connections
which terminate by RST are rare; the author is unaware of any exam-
ples. An optimistic approach is adopted whereby the default is to
conform to this memo. Negotiation with the TCP peer, an implicitly
pessimistic approach, is thus rejected, and also on the grounds of its
complexity. In addition,
a coarse grained approach is taken to configuration, since rarely
will the default behaviour need to be changed; all non-FIN terminated
connections follow the same behaviour, dictated by a single confi-
guration flag. An exception may be made in that connections aborted
by administrative action via SNMP may always be a Hard Abort.
Ideally, the option of both Hard and Soft Aborts would be provided;
this is outside the scope of this memo, but plausible extensions to
[RFC-1213] are described in Appendix F.
To allow flexibility, both a global flag and a per-connection flag
are provided; most APIs have a mechanism to configure optional
behaviour for each connection.
9.3 Configuration
A Soft_Abort flag controls behaviour. There is a global flag per
host, and a per-connection flag which can be used to override the
value of the global flag.
Soft_Abort
If true, all RST-, ICMP- and Timer-terminated connections follow
the behaviour of this memo in transmitting RSTs, receiving RSTs
or ICMP messages, and acknowledging RSTs. If false, all RST-,
ICMP- and Timer-terminated connections follow [RFC-793]. Default
is Soft_Abort=True.
9.4 API Semantics
API semantics are implementation dependent, but the socket API is
very popular. This section ensures backwards compatibility for
applications and implementations utilising the socket API. This is
done by maintaining current semantics and implementing a new socket
option to change behaviour.
Close
By default, socket close is asynchronous and returns immediately. If
the SO_LINGER option is set with a nonzero linger field, the socket
attempts to deliver any remaining data for up to the time given by
the linger field (in clock ticks or seconds, depending on the
implementation). The close call does not return until the data is
delivered and the closing handshake completed, or the timeout has
expired.
If the SO_LINGER option is set with a zero linger field, a RST is
sent and the close call returns immediately. A close call executing a
Soft Abort must return immediately to maintain backwards
compatibility; behaviour analogous to the default close will achieve
this. The connection state is not discarded until the handshake is
complete or a timeout occurs.
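As an illustration of the existing semantics described above, the
following C sketch shows how an application requests an abortive
close through the socket API by setting SO_LINGER with a zero linger
field. SO_LINGER and struct linger are the standard socket
interface; the helper name request_abortive_close is hypothetical.
Whether the resulting RST is handled as a Soft or a Hard Abort would
depend on the Soft_Abort configuration proposed in this memo.

```c
#include <sys/socket.h>

/* Hypothetical helper: arrange for a later close() on fd to abort
 * the connection (send a RST) rather than perform the FIN handshake.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int request_abortive_close(int fd)
{
    struct linger lg;

    lg.l_onoff = 1;    /* enable SO_LINGER ...           */
    lg.l_linger = 0;   /* ... with a zero linger field   */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}
```

An application would call request_abortive_close(fd) and then
close(fd); the close call returns immediately in either abort mode.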
New SYNs in TIME-WAIT
New SYNs may be accepted in TIME-WAIT resulting from a RST-terminated
connection if the conditions of [RFC-1122] apply, as described in
Appendix E. New SYNs may not be accepted in LAST-ACK: the arrival of
a SYN implies that segments can be exchanged successfully, so the
reliable acknowledgement of the closing RST can still succeed.
Socket Option
A new socket option that can be set by the setsockopt call would
allow behaviour to be altered from the global configuration value.
The socket API could be extended to do this by defining a new option,
SO_ABORT, with the following fields.
   #define SOFT_ABORT 1

   typedef struct so_abort {
       unsigned long Abort_Flags;  /* bit 0: SOFT_ABORT; rest reserved */
   } so_abort;

If the lowest bit in Abort_Flags is set, the TCP connection conforms
to this memo; if it is clear, it conforms to [RFC-793]. The other 31
bits are reserved and must be zero. If the option is not used, the
global Soft_Abort flag dictates behaviour.
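The interaction between the global flag and the per-connection
option might be implemented along the following lines. This is a
minimal sketch, not part of the proposal: the function names
so_abort_valid and effective_soft_abort are hypothetical, and an
absent option is represented here by a NULL pointer.

```c
#include <stdbool.h>
#include <stddef.h>

#define SOFT_ABORT 1UL   /* lowest bit of Abort_Flags */

struct so_abort {
    unsigned long Abort_Flags;
};

/* Hypothetical check that the reserved bits are zero. */
bool so_abort_valid(const struct so_abort *opt)
{
    return (opt->Abort_Flags & ~SOFT_ABORT) == 0;
}

/* Hypothetical resolution of the behaviour for one connection: a
 * per-connection option, when present, overrides the global flag. */
bool effective_soft_abort(bool global_soft_abort,
                          const struct so_abort *opt)
{
    if (opt == NULL)
        return global_soft_abort;   /* option not used */
    return (opt->Abort_Flags & SOFT_ABORT) != 0;
}
```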
9.5 Interoperability with RFC-793
There are two cases where interoperability needs to be shown. The
first is shown in Figure 15; here an implementation of this memo
transmits a RST to an implementation of [RFC-793]. TCP A is the
implementation of [RFC-793] and TCP B is an implementation of this
memo.
    TCP A                                                  TCP B
 1. ESTABL.   --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> --> ESTABL.
 2. ...           <SEQ=101><ACK=480><CTL=ACK>          <-- ESTABL.
                                                           (User Abort)
 3. ...           <SEQ=101><CTL=RST>                   <-- LAST-ACK
 4. ESTABL.   --> <SEQ=480><ACK=101><DATA=80><CTL=ACK> --> LAST-ACK
 5. ESTABL.   <-- <SEQ=101><ACK=480><CTL=ACK>          ...
 6. ESTABL.   --> <SEQ=560><ACK=101><DATA=80><CTL=ACK> --> LAST-ACK
 7. CLOSED    <-- <SEQ=101><CTL=RST>                   ...
                                                           (RTX Timeout)
 8.                                                        CLOSED
Figure 15. Interoperability between this memo and RFC-793
Figure 16 shows an implementation of [RFC-793] (TCP B) which
transmits a RST to an implementation of this memo (TCP A).
    TCP A                                                  TCP B
 1. ESTABL.   --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> --> ESTABL.
 2. ...           <SEQ=101><ACK=480><CTL=ACK>          <-- ESTABL.
                                                           (User Abort)
 3. ...           <SEQ=101><CTL=RST>                   <-- CLOSED
 4. ESTABL.   --> <SEQ=480><ACK=101><DATA=80><CTL=ACK> ...
 5. ESTABL.   <-- <SEQ=101><ACK=480><CTL=ACK>          ...
 6. ESTABL.   --> <SEQ=560><ACK=101><DATA=80><CTL=ACK> ...
 7. ESTABL.   <-- <SEQ=101><CTL=RST>                   ...
 8. TIME-WAIT --> <SEQ=640><ACK=102><CTL=ACK>          ...
 9. ...           <SEQ=480><ACK=101><DATA=80><CTL=ACK> --> CLOSED
10. TIME-WAIT <-- <SEQ=101><CTL=RST>                   <-- CLOSED
11. ...           <SEQ=560><ACK=101><DATA=80><CTL=ACK> --> CLOSED
12. TIME-WAIT <-- <SEQ=101><CTL=RST>                   <-- CLOSED
13. ...           <SEQ=640><ACK=102><CTL=ACK>          --> CLOSED
14. TIME-WAIT <-- <SEQ=101><CTL=RST>                   <-- CLOSED
    (2MSL)
15. CLOSED
Figure 16. Interoperability between RFC-793 and this memo
In both cases, assuming RSTs are not lost, one end of the connection
stays in TIME-WAIT or LAST-ACK for 2MSL or the retransmission timeout
respectively.
Security Considerations
Security issues are not discussed in this memo.
References
[Congestion]
V. Jacobson, "Congestion Avoidance and Control," ACM SIGCOMM-88,
August 1988.
[RFC-792]
J. Postel, "Internet Control Message Protocol", RFC-792,
USC/Information Sciences Institute, September 1981.
[RFC-793]
J. Postel, "Transmission Control Protocol", RFC-793,
USC/Information Sciences Institute, September 1981.
[RFC-959]
J. Postel, J. Reynolds, "File Transfer Protocol", RFC-959, ISI,
October 1985.
[RFC-1122]
R. Braden, "Requirements for Internet Hosts -- Communication
Layers", RFC-1122, October 1989.
[RFC-1157]
J. Case, M. Fedor, M. Schoffstall, J. Davin, "A Simple Network
Management Protocol (SNMP)", RFC-1157, May 1990.
[RFC-1185]
V. Jacobson, R. Braden, L. Zhang, "TCP Extension for High-Speed
Paths", RFC-1185, Lawrence Berkeley Labs, USC/Information
Sciences Institute, and Xerox Palo Alto Research Center, October
1990.
[RFC-1191]
J. Mogul, S. Deering, "Path MTU Discovery", RFC-1191, November
1990.
[RFC-1213]
K. McCloghrie, M. Rose, "Management Information Base for Network
Management of TCP/IP-based internets: MIB-II", RFC-1213, March
1991.
[RFC-1263]
L. Peterson, S. O'Malley, "TCP Extensions Considered Harmful",
RFC-1263, October 1991.
[RFC-1323]
V. Jacobson, R. Braden, D. Borman, "TCP Extensions for High
Performance", RFC-1323, Lawrence Berkeley Labs, USC/Information
Sciences Institute, and Cray Research, May 1992.
[RFC-1337]
R. Braden, "TIME-WAIT Assassination Hazards in TCP", RFC-1337,
ISI, May 1992.
[RFC-1379]
R. Braden, "Extending TCP for Transactions -- Concepts", RFC-1379,
November 1992.
[RFC-1644]
R. Braden, "T/TCP -- TCP Extensions for Transactions Functional
Specification", RFC-1644, July 1994.
[TCP/IP-Illustrated]
Gary Wright & Richard Stevens, "TCP/IP Illustrated, Volume 2",
Addison-Wesley 1995.
[TP-Spec]
Information processing systems - Open Systems Interconnection -
Connection oriented transport protocol specification ISO/IEC 8073.
Acknowledgements
Thanks to Alan Cox and Jon Crowcroft for their comments on this memo, to
George Wilkie for interpreting the TP4 specification into English, to
Nick Felisiak for first confirming my suspicions, and to Bob Braden for
[RFC-1337], which stimulated this memo.
Author's Address:
Ian Heavens
Shiva Europe
Stanwell Street
Edinburgh EH6 5NG
Scotland, UK
Phone: (UK) 31 555 5166
Fax: (UK) 31 555 0664
Email: ianh@shiva.europe.com
ian@tardis.ed.ac.uk
10. Appendix A: TCP Connection State Diagram (Partial Solution)
+-----------+
| SYN-RCVD +
+-----------+
+---------+ +---------+ snd RST /
|FINWAIT-1| | ESTAB | /-------------------
+---------+ +---------+ |
| | snd | +-----------+
| snd RST | RST | | CLOSE-WAIT|
\---------------------------\ | /----/ +-----------+
| | | |
V V V |
+----------+ snd RST +---------+ snd RST |
|FIN-WAIT-2|---------------->|TIME-WAIT|<-----------------------/
+----------+ +---------+
^ |
+----------+ snd RST | |
| CLOSING |------------------/ |
+----------+ |
Timeout=2MSL | +---------+
| | LAST-ACK|
V +---------+
+---------+ snd RST |
| CLOSED |<-----------------------/
+---------+
^
|
snd RST |
|
+---------+
| SYN-SENT|
+---------+
TCP Connection State Diagram: RST Transmission
11. Appendix B: TCP Connection State Diagram (Full Solution)
+-----------+
| SYN-RCVD +
+-----------+
+---------+ +---------+ snd RST /
|FINWAIT-1| | ESTAB | /--------------------
+---------+ +---------+ |
| | snd | +-----------+
| snd RST | RST | | CLOSE-WAIT|
\---------------------------\ | /----/ +-----------+
| | | |
V V V |
+----------+ snd RST +---------+ snd RST |
|FIN-WAIT-2|---------------->| LAST-ACK|<-----------------------/
+----------+ +---------+
^ | receive
+----------+ snd RST | | ACK of RST
| CLOSING |------------------/ |
+----------+ |
V
+---------+
| CLOSED |
+---------+
^
|
snd RST |
|
+---------+
| SYN-SENT|
+---------+
TCP Connection State Diagram: RST Transmission
+-----------+
| SYN-RCVD +
+-----------+
+---------+ +---------+ rcv RST, snd ACK /
|FINWAIT-1| | ESTAB | /--------------------
+---------+ +---------+ |
| rcv RST | |
| snd ACK | | +-----------+
| rcv RST, snd ACK | | | CLOSE-WAIT|
\--------------------------\ | /----/ +-----------+
| | | |
V V V |
+----------+ rcv RST,snd ACK +---------+ rcv RST, snd ACK |
|FIN-WAIT-2|---------------->|TIME-WAIT|<-----------------------/
+----------+ +---------+
^ | ^
+----------+ rcv RST, snd ACK | | | rcv RST, snd ACK +---------+
| CLOSING |------------------/ | \--------------------| LAST-ACK|
+----------+ | +---------+
Timeout=2MSL |
|
V
+---------+
| CLOSED |
+---------+
^
|
rcv RST |
|
+---------+
| SYN-SENT|
+---------+
TCP Connection State Diagram: RST Reception
12. Appendix C: A Different Interpretation of RFC-1122
Problems arise if [RFC-1122] is interpreted to mean that data
arriving after a half duplex close is answered with a RST but causes
no state change. If data arrives at TCP A in FIN-WAIT-2, the
connection hangs in that state, as Figure 17 shows.
    TCP A                                               TCP B
 1. ESTABLISHED                                         ESTABLISHED
    (Close)
 2. FIN-WAIT-1  --> <SEQ=100><ACK=300><CTL=FIN,ACK> --> CLOSE-WAIT
 3. FIN-WAIT-2  <-- <SEQ=300><ACK=101><CTL=ACK>     <-- CLOSE-WAIT
 4. FIN-WAIT-2  <-- <SEQ=300><ACK=101><DATA=30>     <-- CLOSE-WAIT
                    (user data after half duplex close)
 5. FIN-WAIT-2  --> <SEQ=301><ACK=131><CTL=RST>     --> CLOSED
Figure 17. Data Received in FIN-WAIT-2 after Half Duplex Close
If the ACK of the FIN is lost or delayed, and data arrives in FIN-
WAIT-1, the connection terminates without entering TIME-WAIT state.
This is shown in Figure 18.
    TCP A                                               TCP B
 1. ESTABLISHED                                         ESTABLISHED
    (Close)
 2. FIN-WAIT-1  --> <SEQ=100><ACK=300><CTL=FIN,ACK> --> CLOSE-WAIT
 3. (lost)      ... <SEQ=300><ACK=101><CTL=ACK>     <-- CLOSE-WAIT
 4. FIN-WAIT-1  <-- <SEQ=300><ACK=100><DATA=30>     <-- CLOSE-WAIT
                    (user data after half duplex close)
 5. FIN-WAIT-1  --> <SEQ=101><CTL=RST>              --> CLOSED
 6. FIN-WAIT-1  --> <SEQ=100><ACK=300><CTL=FIN,ACK> --> CLOSED
 7. CLOSED      <-- <SEQ=300><CTL=RST>              <-- CLOSED
Figure 18. Data Received in FIN-WAIT-1 after Half Duplex Close
13. Appendix D: Modifications to RFC-793
The following modifications to [RFC-793] implement the solution in
this document, without regard to backwards compatibility.
o TCP State Meanings: Section 3.2, page 21
LAST-ACK should be changed to:
LAST-ACK - represents waiting for an acknowledgement of the connec-
tion termination request (FIN or RST) previously sent to the remote
TCP.
o TCP Connection State Diagram: Section 3.2, page 23
Note that this needs to be supplemented by the diagrams in Appendix
B. State transitions on RST transmission are as follows: In all
states except LISTEN, CLOSED and SYN-SENT, enter LAST-ACK. RSTs can-
not be transmitted in LISTEN; if transmitted in CLOSED, remain in
CLOSED; in SYN-SENT, go to CLOSED state.
State transitions on RST reception are as follows: Send an ACK and
enter TIME-WAIT in all states except LISTEN, CLOSED and SYN-SENT; in
LISTEN and CLOSED ignore the RST, in SYN-SENT enter CLOSED state. In
TIME-WAIT, acknowledge the RST.
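The two transition rules summarised above can be sketched in C as
follows. This is an illustrative sketch only: the enum and function
names are hypothetical, and it models just the summary paragraphs
above, not the full SEGMENT ARRIVES processing (for example, the
return to LISTEN from SYN-RECEIVED after a passive OPEN is not
modelled).

```c
enum tcp_state {
    ST_CLOSED, ST_LISTEN, ST_SYN_SENT, ST_SYN_RCVD, ST_ESTABLISHED,
    ST_FIN_WAIT_1, ST_FIN_WAIT_2, ST_CLOSE_WAIT, ST_CLOSING,
    ST_LAST_ACK, ST_TIME_WAIT
};

/* State entered after a RST has been transmitted. */
enum tcp_state state_after_snd_rst(enum tcp_state s)
{
    switch (s) {
    case ST_LISTEN:            /* a RST cannot be transmitted here */
        return ST_LISTEN;
    case ST_CLOSED:            /* remain in CLOSED                 */
    case ST_SYN_SENT:          /* go to CLOSED                     */
        return ST_CLOSED;
    default:                   /* all other states                 */
        return ST_LAST_ACK;
    }
}

/* State entered after a RST has been received; *send_ack is set to
 * 1 when the RST must be acknowledged, and to 0 otherwise. */
enum tcp_state state_after_rcv_rst(enum tcp_state s, int *send_ack)
{
    switch (s) {
    case ST_LISTEN:
    case ST_CLOSED:            /* the RST is ignored               */
        *send_ack = 0;
        return s;
    case ST_SYN_SENT:
        *send_ack = 0;
        return ST_CLOSED;
    default:                   /* includes TIME-WAIT itself        */
        *send_ack = 1;
        return ST_TIME_WAIT;
    }
}
```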
o Sequence Numbers: Section 3.3, page 26
The last paragraph is changed to the following:
We have taken advantage of the numbering scheme to protect certain
control information as well. This is achieved by implicitly includ-
ing some control flags in the sequence space so they can be
retransmitted and acknowledged without confusion (i.e., one and only
one copy of the control will be acted upon). Control information is
not physically carried in the segment data space. Consequently, we
must adopt rules for implicitly assigning sequence numbers to con-
trol. The SYN, FIN and RST are the only controls requiring this pro-
tection, and these controls are used only at connection opening and
closing. For sequence number purposes, the SYN is considered to
occur before the first actual data octet of the segment in which it
occurs, while the FIN or RST is considered to occur after the last
actual data octet in a segment in which it occurs. The FIN and RST
are considered to occupy the same sequence number space; when both
are present in a segment, they both occupy the sequence number after
the last data octet. The segment length (SEG.LEN) includes both data
and sequence space occupying controls. When a SYN is present then
SEG.SEQ is the sequence number of the SYN.
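The sequence space rule in the modified paragraph can be illustrated
with a short C sketch. The function names seg_len and
closing_ctl_seq are hypothetical; the flag values are the usual TCP
header bit positions. The sketch assumes that FIN and RST together
consume a single sequence number, as the paragraph above states.

```c
#include <stdint.h>

#define TH_FIN 0x01u
#define TH_SYN 0x02u
#define TH_RST 0x04u

/* SEG.LEN: data octets plus sequence-space-occupying controls.
 * FIN and RST share one sequence number when both are present. */
uint32_t seg_len(uint32_t data_len, unsigned flags)
{
    uint32_t len = data_len;

    if (flags & TH_SYN)
        len += 1;
    if (flags & (TH_FIN | TH_RST))
        len += 1;
    return len;
}

/* Sequence number occupied by the FIN or RST of a segment: the one
 * after the last data octet (arithmetic is modulo 2^32). */
uint32_t closing_ctl_seq(uint32_t seg_seq, uint32_t data_len,
                         unsigned flags)
{
    uint32_t syn = (flags & TH_SYN) ? 1u : 0u;

    return seg_seq + syn + data_len;
}
```

For the data-less RST <SEQ=101> of Figure 16, the RST occupies
sequence number 101 and SEG.LEN is 1, which is why it is
acknowledged with <ACK=102>.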
o ABORT Call: Section 3.8, page 50
This should be changed to the following:
Abort
Format: ABORT (local connection name, type)
If the type is a Soft Abort, this command causes all pending
SENDs and RECEIVES to be aborted, LAST-ACK state to be entered,
and a special RESET message to be sent to the TCP on the other
side of the connection.
If the type is a Hard Abort, this command causes all pending
SENDs and RECEIVES to be aborted, the TCB to be removed, CLOSED
state to be entered, and a special RESET message to be sent to
the TCP on the other side of the connection.
In both cases, depending on the implementation, users may
receive abort indications for each outstanding SEND or RECEIVE,
or may simply receive an ABORT-acknowledgement.
o ABORT Call: Section 3.9, page 62
Relevant sections should be replaced by the following:
ABORT Call
....
SYN-RECEIVED STATE
ESTABLISHED STATE
FIN-WAIT-1 STATE
FIN-WAIT-2 STATE
CLOSE-WAIT STATE
Send a reset segment:
<SEQ=SND.NXT><CTL=RST>
All queued SENDs and RECEIVEs should be given "connection reset"
notification; all segments queued for transmission (except for the
RST formed above) or retransmission should be flushed.
If the abort is a Soft Abort, enter LAST-ACK state, and return.
If the abort is a Hard Abort, delete the TCB, enter CLOSED state,
and return.
CLOSING STATE
If the abort is a Soft Abort, send a RST, enter LAST-ACK state,
and return.
If the abort is a Hard Abort, delete the TCB, enter CLOSED state,
and return.
LAST-ACK STATE
If the abort is a Soft Abort, send a RST.
If the abort is a Hard Abort, delete the TCB, enter CLOSED state,
and return.
TIME-WAIT STATE
If the abort is a Soft Abort, send a RST.
If the abort is a Hard Abort, delete the TCB, enter CLOSED state,
and return.
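The per-state ABORT processing listed above can be summarised in a
small C sketch. The type and function names are hypothetical, and
only the states listed explicitly in the text above are modelled;
the states elided there are left to [RFC-793]. Note that, following
the text literally, a Hard Abort in CLOSING, LAST-ACK or TIME-WAIT
deletes the TCB without a RST being listed.

```c
enum tcp_state {
    ST_SYN_RCVD, ST_ESTABLISHED, ST_FIN_WAIT_1, ST_FIN_WAIT_2,
    ST_CLOSE_WAIT, ST_CLOSING, ST_LAST_ACK, ST_TIME_WAIT, ST_CLOSED
};

enum abort_type { ABORT_SOFT, ABORT_HARD };

struct abort_result {
    int send_rst;            /* 1 if a RST segment is sent    */
    int delete_tcb;          /* 1 if the TCB is deleted       */
    enum tcp_state next;     /* state after the call returns  */
};

struct abort_result abort_call(enum tcp_state s, enum abort_type type)
{
    struct abort_result r = { 0, 0, s };
    int late = (s == ST_CLOSING || s == ST_LAST_ACK ||
                s == ST_TIME_WAIT);

    if (s == ST_CLOSED)
        return r;                    /* not covered by this sketch */

    if (type == ABORT_HARD) {
        r.send_rst = !late;          /* RST listed only for the    */
        r.delete_tcb = 1;            /* first group of states      */
        r.next = ST_CLOSED;
        return r;
    }

    /* Soft Abort: a RST is always sent. LAST-ACK and TIME-WAIT
     * keep their state; every other state enters LAST-ACK. */
    r.send_rst = 1;
    if (s != ST_LAST_ACK && s != ST_TIME_WAIT)
        r.next = ST_LAST_ACK;
    return r;
}
```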
o SEGMENT ARRIVES: Section 3.9, pages 70-71:
second check the RST bit,
SYN-RECEIVED STATE
If the RST bit is set
If this connection was initiated with a passive OPEN (i.e.,
came from the LISTEN state), then return this connection to
LISTEN state and return. The user need not be informed. If
this connection was initiated with an active OPEN (i.e., came
from SYN-SENT state) then the connection was refused, signal
the user "connection refused". In either case, all segments
on the retransmission queue should be removed. And in the
active OPEN case, enter the CLOSED state and delete the TCB,
and return.
ESTABLISHED
FIN-WAIT-1
FIN-WAIT-2
CLOSE-WAIT
If the RST bit is set then, any outstanding RECEIVEs and SEND
should receive "reset" responses. All segment queues should be
flushed. Users should also receive an unsolicited general
"connection reset" signal. If the RST bit is set and the
sequence field in the header is non-zero then transmit an ACK,
enter TIME-WAIT state, and return.
CLOSING STATE
LAST-ACK STATE
If the RST bit is set and the sequence field in the header is
non-zero then transmit an ACK, enter the TIME-WAIT state, and
return.
TIME-WAIT
If the RST bit is set and the sequence field in the header is
non-zero then transmit an ACK and return.
third check security and precedence
SYN-RECEIVED
If the security/compartment and precedence in the segment do not
exactly match the security/compartment and precedence in the TCB
then send a reset, and return.
ESTABLISHED STATE
If the security/compartment and precedence in the segment do not
exactly match the security/compartment and precedence in the TCB
then send a reset, any outstanding RECEIVEs and SEND should
receive "reset" responses. All segment queues should be
flushed. Users should also receive an unsolicited general
"connection reset" signal. Enter the LAST-ACK state.
Note this check is placed following the sequence check to prevent
a segment from an old connection between these ports with a
different security or precedence from causing an abort of the
current connection.
fourth, check the SYN bit,
SYN-RECEIVED
ESTABLISHED STATE
FIN-WAIT STATE-1
FIN-WAIT STATE-2
CLOSE-WAIT STATE
CLOSING STATE
LAST-ACK STATE
TIME-WAIT STATE
If the SYN is in the window it is an error, send a reset, any
outstanding RECEIVEs and SEND should receive "reset" responses,
all segment queues should be flushed, the user should also
receive an unsolicited general "connection reset" signal, enter
the LAST-ACK state, and return.
If the SYN is not in the window this step would not be reached
and an ack would have been sent in the first step (sequence
number check).
o SEGMENT ARRIVES: Section 3.9, page 73
LAST-ACK STATE
If entered by FIN handshake, the only thing that can arrive
in this state is an acknowledgement of our FIN. If our FIN
is now acknowledged, delete the TCB, enter the CLOSED state,
and return.
TIME-WAIT STATE
If entered by FIN handshake, the only thing that can arrive
in this state is a retransmission of the remote FIN.
Acknowledge it and restart the 2 MSL timeout.
If entered by RST reception, acknowledge RSTs and restart the
2MSL timeout. Ignore other segments.
o SEGMENT ARRIVES: Section 3.9, pages 75-76
LAST-ACK STATE
If entered by RST transmission, send a RST, restart the
retransmission timeout and return. Otherwise, do nothing.
TIME-WAIT STATE
Remain in the TIME-WAIT state. Restart the 2 MSL time-wait
timeout.
o USER TIMEOUT: Section 3.9, page 77
The behaviour of USER TIMEOUT changes to the following, and the
RETRANSMISSION TIMEOUT section includes the following note:
USER TIMEOUT
For any state, if the user timeout expires, or if another timer that
terminates the connection (such as a keepalive timer) expires, flush
all queues, signal the user "error: connection aborted due to user
timeout" in general and for any outstanding calls, send a RST, enter
the LAST-ACK state and return.
The above behaviour is also followed when the maximum number of
retransmissions has been reached, except in LAST-ACK state, where
the connection instead moves to CLOSED.
RETRANSMISSION TIMEOUT
Note that SYNs, FINs and RSTs are retransmitted, like data segments.
14. Appendix E: Modifications to RFC-1122
Pages 87-89 of [RFC-1122], section 4.2.2.13, are modified as follows:
4.2.2.13 Closing a Connection: RFC-793 Section 3.5
A TCP connection may terminate in three ways: (1) the normal
TCP close sequence using a FIN handshake, (2) a "Soft
Abort", in which one or more RST segments are sent and the
connection enters LAST-ACK, and (3) a "Hard Abort" in which
one or more RST segments are sent, the TCB is removed,
and the connection enters CLOSED state. If a TCP connection
is closed by the remote site, the local application MUST be
informed whether it closed normally or was aborted.
The normal TCP close sequence delivers buffered data
reliably in both directions. Since the two directions of a
TCP connection are closed independently, it is possible for
a connection to be "half closed," i.e., closed in only one
direction, and a host is permitted to continue sending data
in the open direction on a half-closed connection.
A host MAY implement a "half-duplex" TCP close sequence, so
that an application that has called CLOSE cannot continue to
read data from the connection. If such a host issues a
CLOSE call while received data is still pending in TCP, or
if new data is received after CLOSE is called, its TCP
SHOULD send a RST to show that data was lost, and enter
LAST-ACK.
When a connection is closed actively, by transmitting a
FIN or RST as a result of a Soft Abort, it MUST linger in
TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
However, it MAY accept a new SYN from the remote TCP to
reopen the connection directly from TIME-WAIT state, if it:
(1) assigns its initial sequence number for the new
connection to be larger than the largest sequence
number it used on the previous connection incarnation,
and
(2) returns to TIME-WAIT state if the SYN turns out to be
an old duplicate.
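Condition (1) above might be satisfied as in the following C sketch.
It is an assumption of this sketch, not a statement of the modified
text, that "larger" is evaluated in modulo 2^32 sequence space; the
function names seq_gt and isn_for_reopen are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a is later than b in modulo 2^32 sequence space. The
 * conversion to int32_t relies on two's complement behaviour. */
bool seq_gt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Choose the initial sequence number for a connection reopened
 * from TIME-WAIT. `candidate` is the value the normal generator
 * proposes; `prev_max_seq` is the largest sequence number used on
 * the previous incarnation of the connection. */
uint32_t isn_for_reopen(uint32_t candidate, uint32_t prev_max_seq)
{
    if (seq_gt(candidate, prev_max_seq))
        return candidate;
    return prev_max_seq + 1;
}
```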
Pages 103-104 of [RFC-1122], section 4.2.3.9, are modified as fol-
lows:
4.2.3.9 ICMP Messages
TCP MUST act on an ICMP error message passed up from the IP
layer, directing it to the connection that created the
error. The necessary demultiplexing information can be
found in the IP header contained within the ICMP message.
o Source Quench
TCP MUST react to a Source Quench by slowing
transmission on the connection. The RECOMMENDED
procedure is for a Source Quench to trigger a "slow
start," as if a retransmission timeout had occurred.
o Destination Unreachable -- codes 0, 1, 5
Since these Unreachable messages indicate soft error
conditions, TCP MUST NOT abort the connection, and it
SHOULD make the information available to the
application.
DISCUSSION:
TCP could report the soft error condition directly
to the application layer with an upcall to the
ERROR_REPORT routine, or it could merely note the
message and report it to the application only when
and if the TCP connection times out.
o Destination Unreachable -- codes 2-4
These are hard error conditions, so TCP SHOULD abort
the connection. If the connection is in a synchronised
state, it should enter TIME-WAIT.
o Time Exceeded -- codes 0, 1
This should be handled the same way as Destination
Unreachable codes 0, 1, 5 (see above).
o Parameter Problem
This should be handled the same way as Destination
Unreachable codes 0, 1, 5 (see above).
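The classification of ICMP messages listed above can be sketched in
C as follows. The enum and function names are hypothetical; the ICMP
type numbers are those assigned in [RFC-792]. Treating any type or
code not listed above as ignored is an assumption of this sketch.

```c
enum icmp_action {
    ICMP_ACT_IGNORE,        /* not covered by the list above       */
    ICMP_ACT_SLOW_START,    /* Source Quench                       */
    ICMP_ACT_SOFT_ERROR,    /* report to the application, no abort */
    ICMP_ACT_HARD_ABORT     /* abort; TIME-WAIT if synchronised    */
};

enum icmp_action tcp_icmp_action(int type, int code)
{
    switch (type) {
    case 3:                             /* Destination Unreachable */
        if (code == 0 || code == 1 || code == 5)
            return ICMP_ACT_SOFT_ERROR;
        if (code >= 2 && code <= 4)
            return ICMP_ACT_HARD_ABORT;
        return ICMP_ACT_IGNORE;
    case 4:                             /* Source Quench           */
        return ICMP_ACT_SLOW_START;
    case 11:                            /* Time Exceeded           */
        if (code == 0 || code == 1)
            return ICMP_ACT_SOFT_ERROR;
        return ICMP_ACT_IGNORE;
    case 12:                            /* Parameter Problem       */
        return ICMP_ACT_SOFT_ERROR;
    default:
        return ICMP_ACT_IGNORE;
    }
}
```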
15. Appendix F: Modifications to RFC-1213
The following modifications to [RFC-1213], MIB-II, [Page 50], would
allow complete flexibility in aborting connections.
tcpConnState OBJECT-TYPE
SYNTAX INTEGER {
closed(1),
listen(2),
synSent(3),
synReceived(4),
established(5),
finWait1(6),
finWait2(7),
closeWait(8),
lastAck(9),
closing(10),
timeWait(11),
hardAbort(12),
softAbort(13)
}
ACCESS read-write
STATUS mandatory
DESCRIPTION
"The state of this TCP connection.
The only values which may be set by a management
station are hardAbort(12) and softAbort(13).
Accordingly, it is appropriate for an agent to
return a `badValue' response if a management
station attempts to set this object to any other
value.
If a management station sets this object to the
value softAbort(13), then a RST is transmitted
and the connection enters LAST-ACK state.
If a management station sets this object to the
value hardAbort(12), then this has the effect of
deleting the TCB (as defined in RFC 793) of the
corresponding connection on the managed node,
resulting in immediate termination of the
connection. As an implementation-specific option,
a RST segment may be sent from the managed node to
the other TCP endpoint (note however that this RST
segment is not sent reliably)."
::= { tcpConnEntry 1 }
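An agent's validation of a set request on this object might look
like the following C sketch. The names are hypothetical and are not
taken from any particular SNMP agent implementation; only the two
settable values come from the definition above.

```c
/* Settable values proposed above for tcpConnState. */
enum { TCP_CONN_HARD_ABORT = 12, TCP_CONN_SOFT_ABORT = 13 };

/* Hypothetical result codes for the validation step. */
enum set_status { SET_NO_ERROR, SET_BAD_VALUE };

/* Accept only hardAbort(12) and softAbort(13); any other value
 * yields a `badValue' response. */
enum set_status tcp_conn_state_set_check(long value)
{
    if (value == TCP_CONN_HARD_ABORT || value == TCP_CONN_SOFT_ABORT)
        return SET_NO_ERROR;
    return SET_BAD_VALUE;
}
```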
16. Appendix G: Traffic Statistics for TCP Connections
Statistics were measured using the netstat program on six machines:
[1] A home workstation (VMS) used for telecommuting via a 56Kb Frame
Relay link to the Internet.
[2] A DNS and mail gateway (VMS) at the University of Tucson,
Arizona.
[3] A personal workstation (SunOS 4.1.3) on Spider Systems' corporate
LAN.
[4] The BSD development system (BSD4.4-Lite) at the Computer Science
department, Berkeley, California (taken from [TCP/IP-Illustrated],
p.799).
[5] A file server (SunOS 4.1.3) on Spider Systems' corporate LAN.
[6] An application gateway (SunOS 4.1.3) between Spider Systems' cor-
porate LAN and the Internet.
The columns show statistics collected by the BSD netstat utility or
its VMS equivalent, with the exception of machine uptime. The
derivation of the statistics from the BSD TCP/IP "tcpstat" structure
is shown in parentheses.
o machine (M)
o time in days that the machine has been up (U)
o number of TCP connections established (tcpstat.tcps_connects).
o number of TCP connections aborted by RST transmission, expressed as
a sum of the total aborted excluding those aborted by reception of
data after half duplex close, and those aborted after half duplex
close ((tcpstat.tcps_drops - tcpstat.tcps_rcvafterclose) +
tcpstat.tcps_rcvafterclose).
o number of TCP connections timed out expressed as a sum of the
number timed out by retransmissions and keepalives
(tcpstat.tcps_timeoutdrop + tcpstat.tcps_keeptimeo).
o total number of TCP data segments transmitted, excluding
retransmissions (tcpstat.tcps_sndpack -
tcpstat.tcps_sndrexmitpack).
o total number of TCP data segments retransmitted
(tcpstat.tcps_sndrexmitpack).
M    U   Establ.      Dropped   Timed Out   TXed Segs   RTXed Segs
1    2       408          4+1       263+1      135168          250
2    5     46632      456+102    7338+551      317523         4756
3    ?    138682   13349+3686     79+2345    22761633       104440
4   30    126820      44+1017     86+3219     8920528       257295
5   20     13557      198+205       43+28     1559505         1675
6   14     48226    3943+1396      11+190    11505576        67401
Percentage values for aborted and timed out connections, and for seg-
ment loss, are as follows.
Machine   Dropped (%)   Timed Out (%)   Retransmissions (%)
      1           1.2            64.7                  0.18
      2           1.2            16.9                  1.50
      3          12.3            1.75                  0.46
      4           0.8            2.60                  2.88
      5           3.0            0.52                  0.11
      6          11.1            0.42                  0.59
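The percentages above can be reproduced from the raw counts with the
following C sketch. The structure and function names are
hypothetical. The sketch assumes that the retransmission percentage
is the ratio of retransmitted segments to segments transmitted
excluding retransmissions, which is the ratio that matches the
tabulated values.

```c
struct tcp_sample {
    unsigned long established;  /* connections established        */
    unsigned long dropped;      /* sum of both Dropped terms      */
    unsigned long timed_out;    /* sum of both Timed Out terms    */
    unsigned long tx_segs;      /* data segments, excluding RTX   */
    unsigned long rtx_segs;     /* data segments retransmitted    */
};

/* part as a percentage of whole; 0 if whole is 0. */
double pct(unsigned long part, unsigned long whole)
{
    if (whole == 0)
        return 0.0;
    return 100.0 * (double)part / (double)whole;
}

double dropped_pct(const struct tcp_sample *s)
{
    return pct(s->dropped, s->established);
}

double timed_out_pct(const struct tcp_sample *s)
{
    return pct(s->timed_out, s->established);
}

double rtx_pct(const struct tcp_sample *s)
{
    return pct(s->rtx_segs, s->tx_segs);
}
```

For machine 4 (126820 connections, 44+1017 dropped, 86+3219 timed
out, 8920528 segments sent and 257295 retransmitted) this gives
approximately 0.8, 2.6 and 2.9 percent, in agreement with the table.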
Machines 3 and 5 are internal to a LAN and mostly handle NFS traffic,
so may be expected to have different patterns of connection
establishment and segment loss. Dropped connections form such a high
proportion for machine 6 that some pathological system or application
problem can be suspected. These machines are excluded from
calculations.
Aborted connections yield more consistent percentages than timeouts
and segment loss rates; this may be because the latter are more sus-
ceptible to the characteristics of nearby networks, whereas aborts
are a function of application or system behaviour. For instance, an
excessive proportion of machine 1's TCP connections expire because of
retransmission timeouts; this may be due to an unreliable link.
For machines 1, 2 and 4, the average percentage drop rate is 1.1%.
The average retransmission rate is 1.2%. The lowest percentage
drop rate is 0.8%, and the highest retransmission rate is 2.9%.