TCP Maintenance and Minor M. Jethanandani
Extensions Cisco Systems
Internet-Draft M. Bashyam
Intended status: Informational Ocarina Systems, Inc
Expires: April 19, 2008 October 17, 2007
TCP Robustness in Persist Condition
draft-mahesh-persist-timeout-02
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 19, 2008.
Copyright Notice
Copyright (C) The IETF Trust (2007).
Abstract
This document describes how a connection can remain indefinitely in
the persist condition, and the Denial of Service (DoS) implications
for the system if there is no mechanism to recover from this anomaly.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
Jethanandani & Bashyam Expires April 19, 2008 [Page 1]
Internet-Draft TCP Robustness in Persist Condition October 2007
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
Table of Contents
1. Introduction
2. Denial of Service Experimentation
3. Solution
4. Role of Application
5. IANA Considerations
6. Security Considerations
7. Acknowledgements
8. References
8.1. Normative References
8.2. Informative References
Appendix A. An Appendix
Authors' Addresses
Intellectual Property and Copyright Statements
1. Introduction
RFC 1122 [RFC1122] Section 4.2.2.17, page 92 says that: A TCP MAY
keep its offered receive window closed indefinitely. As long as the
receiving TCP continues to send acknowledgments in response to the
probe segments, the sending TCP MUST allow the connection to stay
open.
The RFC goes on to say that it is important to remember that ACK
(acknowledgement) segments that contain no data are not reliably
transmitted by TCP. Therefore zero window probing SHOULD be
supported to prevent a connection from hanging forever if an ACK
segment that re-opens the window is lost.
While the RFC is clear why the sender needs to continue to probe the
receiver, it is not clear why this process needs to be indefinite,
particularly if the receiver continually responds with an ACK and a
window of zero. This draft documents a negative consequence of this
indefinite attempt by the sender to probe for the receiver's offered
window.
One negative consequence of this indefinite attempt is that it makes
the sender vulnerable to a connection and send buffer exhaustion
attack by one or more malicious receivers. This leads to a Denial of
Service (DoS) in which legitimate connections can no longer be
established and well-behaved, already established connections stop
making progress in terms of data transmission.
Having the sender accumulate buffers and connection table entries
when the receiver has deliberately and maliciously closed the window
can ultimately lead to resource exhaustion on the sender. This
particular dependence on the receiver to open its zero window can be
easily exploited by a malicious receiver to launch a DoS attack
against the sender.
The condition where the sender has at least one buffer in the send
queue but cannot transmit it because the receiver advertises a zero
window is referred to as the persist condition. In this condition
the sender is waiting indefinitely for the receiver to open up its
window.
Resources that are compromised due to this sender behavior include
connections and send buffers, since both of these are finite pools in
any server.
The problem is applicable to TCP and to TCP-derived transport
protocols such as SCTP.
We have done some experimentation to demonstrate this problem and
looked at how many servers on the Internet are susceptible to it.
The rest of the draft details the experiment, suggests how the
problem needs to be addressed, explains why we believe it is the
right solution, and describes what role the application can play in
solving this problem.
For TCP to persist indefinitely makes the end point vulnerable to a
DoS attack. We therefore clarify the purpose of zero window probing
as described in RFC 1122 and suggest that a TCP end point SHOULD NOT
keep a connection in persist condition for an indefinite amount of
time.
In most implementations, TCP runs in kernel mode as part of the
operating system. In this mode the operating system may share the
same address space as TCP. For the purposes of discussion, this
draft considers the TCP protocol implementation to be a separate
module responsible for all resources, such as buffers and connection
control blocks, that it borrows from the operating system. The operating
system can enforce the maximum number of buffers it is willing to
give to TCP but beyond that it lets TCP decide how to manage them.
2. Denial of Service Experimentation
The effect of a receiver that stops reading data is that the sender
continues to send data until the receiver's advertised window goes to
zero, at which time the connection enters persist condition. Since
the sender has more buffers with data for the client, it will
continue to probe the receiver. If the sender is servicing several
such clients, the effect compounds to the extent that the sender
runs out of buffers and/or connection resources. The sender at this
point cannot service new legitimate connections, and even the
existing connections start seeing degraded service. Further, each
connection reserves a connection control block, and these come from a
finite pool. Several connections in persist condition can exhaust
the connection control block pool.
To demonstrate the problem we wrote a user level program that puts
TCP connections on the HTTP server in persist condition. The client
can run on any machine and does not require a change in the kernel or
the operating system.
The client opens a TCP connection to the HTTP server with an
advertised MSS of 1460. It then sends a GET request for a large
page. The page size is large enough to ensure that the connection's
send buffer always has more data than the receiver's maximum
advertised window. Once the window has been opened, the client
application stops reading data, resulting in TCP closing the window
and advertising a zero window towards the sender. For each request
of a multi-megabyte response, the connection can result in the sender
holding on to all the requested data, minus the receiver's advertised
window, in its send queue. If the receiver never closes the
connection, the server will continue to hold that data indefinitely
in its send queue.
The same program was then run from each client with it opening one
thousand connections towards the HTTP server. This was run from
several different machines with the result that now the server was
holding onto several thousand connections, each with more than one
megabyte worth of data on the send queue.
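The scale of the problem can be estimated with simple arithmetic. The
following is a minimal sketch, not a measurement from our experiment:
the function name and the example figures (a 2 MiB response, a 64 KiB
advertised window, 3000 connections) are assumptions chosen for
illustration. It computes an upper bound, since in practice the amount
queued in TCP is also limited by the send socket buffer size.

```python
MIB = 1024 * 1024
KIB = 1024


def send_queue_bytes_held(response_bytes, advertised_window_bytes, connections):
    """Upper bound on bytes pinned on the sender by stalled connections."""
    # Each stalled connection can hold the requested data minus the part
    # that the receiver's advertised window allowed the sender to deliver.
    per_connection = max(response_bytes - advertised_window_bytes, 0)
    return per_connection * connections


if __name__ == "__main__":
    # Illustrative figures only (assumptions, not experimental results).
    held = send_queue_bytes_held(2 * MIB, 64 * KIB, 3000)
    print(f"{held / MIB:.1f} MiB pinned across 3000 connections")
    # prints "5812.5 MiB pinned across 3000 connections"
```

Under these assumed figures a few thousand stalled connections pin
several gigabytes of sender memory, or as much of it as the socket
buffer limits allow.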
After verifying this behavior in the laboratory against both an Apache
and an IIS server, we then proceeded to test HTTP servers on the
Internet. To verify this behavior we needed to open only a few
connections towards the servers. We chose three well known sites,
identified here as Site A, Site B and Site C, for our test. We then
ran a network analyzer on the client machine to monitor the behavior
of the connections. These were our observations.
Connections to Site A went into ESTABLISHED state and, after receiving
the receiver's advertised window worth of data, went into persist
condition. Each connection persisted in this mode for approximately
11 minutes and was then reset (RST) by the server.
Connections to Site B went into and stayed in ESTABLISHED state. They
stayed in that state as long as the client kept the connection open.
The server in this case was Apache version 2.0. The size of the file
requested was 12.12M. The client received 200K worth of data and the
rest of the data was either queued on the send queue or in the
application.
Connections to Site C went into and stayed in ESTABLISHED state. They
too stayed in that state as long as the client kept the connection
open, which was as long as five days. The server in this case was an
IIS server version 6.0. The size of the requested page was 1.09M (a
pdf file). The client had received 200K worth of data and the rest
of the data was either queued on the send queue or in the application.
As can be seen from the experimentation, the behavior of TCP varied
greatly between different sites. Site A appears to implement a User
Time Out (UTO) or application timeout on its connections. That
allowed it to clear the connections. However, once the fixed timeout
was known, it was easy to modify the client program to open another
set of connections after the timeout. We discuss the role of the
application and the use of UTO in a later section. It was
difficult to establish how much data was sitting on the send queue of
each one of these public servers, as that depends on the send socket
buffer size and how much data was written by the application.
Please note that it is not required for the client to issue a request
for a large page or for the server to open its window completely to
reproduce the DoS scenario. A page size larger than the advertised
window size is enough. We decided to do it with a larger response
because it enabled us to reproduce the problem with fewer
connections and client machines.
Persist condition clearly has a more significant impact on servers
that deal with a large number of connections (e.g. 200-300K
connections) than on end workstations that might deal with a few
connections at a time. This is because the server has a finite
number of buffers for a larger pool of connections. With dynamic
allocation of buffers, each connection is given resources as it needs
them. A high water mark set on each connection prevents the number
of enqueued buffers from exceeding that mark until the number of
buffers falls below a low water mark. However, that in itself does
not solve the problem, as the high water mark is more than the
advertised window size.
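The water mark behavior described above can be sketched as follows.
This is an illustrative model only: the class name, method names and
the mark values are assumptions, and real implementations count bytes
or buffers in their own ways. The point it shows is that a connection
whose peer never acknowledges data stays pinned at the high water
mark.

```python
class SendQueue:
    """Per-connection send queue with high and low water marks."""

    def __init__(self, high_water, low_water):
        self.high_water = high_water
        self.low_water = low_water
        self.queued = 0          # buffers currently enqueued
        self.blocked = False     # True once the high water mark is reached

    def enqueue(self, nbuffers):
        """Application write: returns how many buffers were accepted."""
        if self.blocked:
            return 0
        accepted = min(nbuffers, self.high_water - self.queued)
        self.queued += accepted
        if self.queued >= self.high_water:
            self.blocked = True
        return accepted

    def acked(self, nbuffers):
        """Peer acknowledged data: free buffers, unblock below the low mark."""
        self.queued = max(self.queued - nbuffers, 0)
        if self.queued < self.low_water:
            self.blocked = False
```

With a receiver that advertises a zero window, acked() is never
called, so every such connection keeps high_water buffers allocated
for as long as it stays in persist condition.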
3. Solution
The current behavior of the connection in persist condition SHALL
continue to exist as the default behavior. The solution proposed
will control the amount of time a TCP sender will spend in persist
condition waiting for the receiver to open its window. Outlined are
some of the ways that this can be achieved. Default values are
suggested values and the implementor is free to choose their own.
If the administrator of the system decides to use the proposed
solution, they will need to enable it explicitly. Optionally, the
administrator can configure minimum and maximum threshold values
for connection and buffer resources for the total pool. Default
values of 60% and 80% of the total pool for minimum and maximum
respectively are assumed.
While implementing the solution it is important to make sure that
legitimate and well behaved receivers are not penalized for offering
a zero or reduced window. Hence the solution needs to be robust. It
is also important that the solution be adaptive. While resources are
plentiful, connections are allowed to spend more time in persist
condition. However, as resources become scarce the connections are
aborted sooner.
A fixed timeout value is not an effective solution. Malicious clients
can discover the timeout value and can (re)launch an attack after the
fixed timeout period.
If the solution is enabled, the global persist-condition-expiry-time
value will be set to infinity (or a very large value). Thereafter it
will be adapted based on the availability of system resources. Once
finite, the persist-condition-expiry-time is bounded above by the
default value of 60 seconds and below by a minimum value of five
seconds (or the minimum persist timeout). The administrator has the
option to change the default value. To prevent wild fluctuations in
this timeout value, the time will be recomputed only when resource
utilization changes by at least 1%. If the resource utilization of
the total pool is less than the minimum threshold, the
persist-condition-expiry-time value is set to infinity (a very large
value). If the resource utilization increases to being between
minimum and maximum, then persist-condition-expiry-time is first set
to the default value and thereafter decreased additively by two
seconds. If resource utilization exceeds the maximum, the
persist-condition-expiry-time is decreased multiplicatively by a
factor of two. If the resource utilization starts to decrease, then
persist-condition-expiry-time is increased additively by four
seconds. If the utilization falls below the minimum, the time is set
to infinity.
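The adaptation rules above can be sketched as a small controller.
This is one possible reading of the text, not a reference
implementation: the class and parameter names are hypothetical,
utilization is taken as an integer percentage so that the 1% rule is
exact, and a jump from below the minimum straight to above the
maximum is assumed to start from the default value.

```python
import math


class PersistExpiryController:
    """Adapts the global persist-condition-expiry-time to utilization."""

    def __init__(self, min_pct=60, max_pct=80,
                 default_expiry=60.0, min_expiry=5.0):
        self.min_pct = min_pct
        self.max_pct = max_pct
        self.default_expiry = default_expiry
        self.min_expiry = min_expiry
        self.expiry = math.inf   # starts at infinity when enabled
        self.last_pct = 0        # utilization at the last recomputation

    def update(self, util_pct):
        """Feed current utilization (integer percent); return expiry time."""
        delta = util_pct - self.last_pct
        if abs(delta) < 1:       # recompute only on a change of at least 1%
            return self.expiry
        self.last_pct = util_pct

        if util_pct < self.min_pct:
            self.expiry = math.inf           # resources are plentiful
        elif math.isinf(self.expiry):
            self.expiry = self.default_expiry  # first entry above minimum
        elif delta < 0:
            self.expiry += 4.0               # utilization is decreasing
        elif util_pct > self.max_pct:
            self.expiry /= 2.0               # above maximum: halve
        else:
            self.expiry -= 2.0               # between minimum and maximum

        if not math.isinf(self.expiry):
            # Keep finite values within [min_expiry, default_expiry].
            self.expiry = min(max(self.expiry, self.min_expiry),
                              self.default_expiry)
        return self.expiry
```

Feeding this sketch the utilization sequence 50, 65, 70, 85 yields
infinity, 60, 58 and 29 seconds respectively, so the timeout a client
observes depends on sender load rather than on a fixed value.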
The solution focuses on how to keep track of connections in persist
condition. The configured option of persist-condition-expiry-time
determines how long the connection will be allowed to stay in persist
condition. When the connection enters persist condition, i.e. the
receiver advertises a window of zero, the current time (now) is saved
in the connection entry. This entry is called
persist-condition-entry-time. In addition, the sequence number on
the connection is stored as persist-condition-sequence-number.
Thereafter, every time the persist timer expires or an ACK is
received that continues to advertise a zero window, a check is done
to make sure that the difference between the current time and
persist-condition-entry-time is not more than
persist-condition-expiry-time. If it is, the connection is aborted
and the connection resources are reclaimed.
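The per-connection bookkeeping can be sketched as below. The names
ConnectionEntry and check_persist are hypothetical, and two details
are assumptions rather than statements of the draft: which sequence
number is recorded, and that a non-zero window clears the saved
state. The sketch only stores persist-condition-sequence-number, as
the text does not specify how it is subsequently used.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionEntry:
    sequence_number: int                         # assumed: current send sequence
    persist_entry_time: Optional[float] = None   # persist-condition-entry-time
    persist_sequence_number: Optional[int] = None
    aborted: bool = False


def check_persist(conn, advertised_window, now, expiry_time):
    """Run on persist timer expiry or on receipt of an ACK.

    Returns True if the connection was aborted by this check.
    """
    if advertised_window > 0:
        # Assumption: the window opened, so the connection leaves persist.
        conn.persist_entry_time = None
        conn.persist_sequence_number = None
        return False
    if conn.persist_entry_time is None:
        # Entering persist condition: record time and sequence number.
        conn.persist_entry_time = now
        conn.persist_sequence_number = conn.sequence_number
        return False
    if now - conn.persist_entry_time > expiry_time:
        conn.aborted = True      # caller reclaims buffers and control block
        return True
    return False
```

Passing math.inf as expiry_time reproduces the default behavior, in
which the connection is never aborted by this check.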
The receiver's silly window avoidance mechanism will make sure that
the receiver cannot read a small amount of data and fool the sender
into taking it out of persist condition.
For the solution to be robust, it is also important to determine
which connection among the set of connections in persist condition is
selected to be terminated. To implement this effectively, we
maintain two priority queues of connections in persist condition, one
based on the amount of data in the send queue and another based on
the persist-condition-entry-time, i.e. when the connection entered
persist condition.
Whenever a buffer resource is required and the resource utilization
is more than the maximum, the connection with the highest amount of
data in the send queue is dropped, and its buffers recycled.
Whenever a connection resource is required and the connection
utilization is higher than the maximum, the connection with the
oldest persist-condition-entry-time is selected and dropped. This
achieves fairness by penalizing the connections that are consuming
the most resources.
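The two priority queues can be sketched with binary heaps. The class
and method names are hypothetical, and lazy deletion of stale heap
entries is an implementation choice made here for brevity, not
something the draft prescribes. The caller is assumed to invoke the
drop methods only when utilization exceeds the maximum threshold.

```python
import heapq


class PersistQueues:
    """Tracks connections in persist condition for victim selection."""

    def __init__(self):
        self._by_data = []   # max-heap on queued bytes (stored negated)
        self._by_age = []    # min-heap on persist-condition-entry-time
        self._live = {}      # conn_id -> (queued_bytes, entry_time)

    def add(self, conn_id, queued_bytes, entry_time):
        self._live[conn_id] = (queued_bytes, entry_time)
        heapq.heappush(self._by_data, (-queued_bytes, conn_id))
        heapq.heappush(self._by_age, (entry_time, conn_id))

    def remove(self, conn_id):
        # Connection left persist condition; heap entries expire lazily.
        self._live.pop(conn_id, None)

    def _pop_victim(self, heap, field, sign):
        while heap:
            key, conn_id = heapq.heappop(heap)
            current = self._live.get(conn_id)
            # Skip entries for connections that were removed or re-added.
            if current is not None and current[field] == sign * key:
                del self._live[conn_id]
                return conn_id
        return None

    def drop_for_buffers(self):
        """Victim with the most data in its send queue, or None."""
        return self._pop_victim(self._by_data, 0, -1)

    def drop_for_connections(self):
        """Victim with the oldest persist-condition-entry-time, or None."""
        return self._pop_victim(self._by_age, 1, 1)
```

For example, with connections a (1 MB queued, entered at t=10), b
(3 MB, t=20) and c (2 MB, t=5), drop_for_buffers() selects b and a
following drop_for_connections() selects c.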
4. Role of Application
Applications are agnostic to why TCP connections are not making
progress in terms of data transmission. TCP connections may not be
able to transmit data for a variety of reasons. Today TCP does not
explicitly provide an indication of the progress of the connection.
It is up to the application to infer a lack of progress by examining
the send queue backlog, or to implement a UTO as defined in RFC 793
[RFC0793]. Many commonly used applications do not implement the
UTO scheme, e.g. World Wide Web (WWW). Even if an application did
implement a UTO scheme, all applications running on the system need
to have implemented the UTO for the solution to be effective. A
single application that has not implemented the UTO can cause the
entire system to be impacted negatively.
There are cases where the system is application agnostic. A classic
case of this is a TCP proxy. In that particular case, there is no
end application that can be informed of the state of the connection
for the application to take action.
Resources like TCP buffers are system wide resources and are not tied
to any particular application. TCP needs to be able to monitor
resource usage system wide when connections are in persist condition.
The application does not have knowledge of the connection's sender
state needed to implement a robust and adaptive solution such as the
one outlined here.
Applications can assist TCP's role in solving this problem. They can
register for an event notification when the TCP connection enters or
exits persist condition. They can use the notification mechanism to
implement their own scheme of deciding which persist connections to
clear. They can also suggest timeout or retry values to TCP.
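A notification mechanism of this kind might look like the following
sketch. It is purely illustrative: no such interface is defined by
this draft or by existing socket APIs, and the names PersistEvent,
PersistNotifier, register and notify are hypothetical.

```python
from enum import Enum


class PersistEvent(Enum):
    ENTER = "enter"   # connection entered persist condition
    EXIT = "exit"     # connection left persist condition


class PersistNotifier:
    """Hypothetical registry through which TCP informs applications."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        """Register callback(conn_id, event) for persist notifications."""
        self._callbacks.append(callback)

    def notify(self, conn_id, event):
        """Called by the TCP side whenever a connection changes state."""
        for callback in self._callbacks:
            callback(conn_id, event)
```

An application could register a callback that adds a connection to
its own set of stalled connections on ENTER and removes it on EXIT,
and then apply its own policy for deciding which of them to clear.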
5. IANA Considerations
This document makes no request of IANA.
6. Security Considerations
This document discusses one security consideration, namely the
possible DoS attack discussed in Section 2.
7. Acknowledgements
Thanks to Anantha Ramaiah who spent countless hours reviewing,
commenting and proposing changes to the draft. Ted Faber helped us
in clarifying the objective of this RFC. Thanks also to Fred Baker
and Elliot Lear for providing their feedback on the draft.
Our thanks to Nanda Bhajana who helped arrange the test setup to be
able to reproduce the DoS scenario.
8. References
8.1. Normative References
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, September 1981.
[RFC1122] Braden, R., "Requirements for Internet Hosts -
Communication Layers", STD 3, RFC 1122, October 1989.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
8.2. Informative References
Appendix A. An Appendix
Authors' Addresses
Mahesh Jethanandani
Cisco Systems
170 West Tasman Drive
San Jose, California 95134
USA
Phone: +1-408-527-8230
Fax: +1-408-527-0147
Email: mahesh@cisco.com
URI: www.cisco.com
Murali Bashyam
Ocarina Systems, Inc
Fremont, CA
USA
Phone:
Fax:
Email: mbashyam@ocarinatech.com
URI:
Full Copyright Statement
Copyright (C) The IETF Trust (2007).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Acknowledgment
Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA).