INTERNET-DRAFT Amy Hughes, Joe Touch, John Heidemann
March 30, 1998
Expires: Sept. 30, 1998
Issues in TCP Slow-Start Restart After Idle
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''
Please check the I-D abstract listing contained in each Internet
Draft directory to learn the current status of this or any other
The distribution of this document is unlimited.
This draft discusses variations in the TCP 'slow-start restart' (SSR)
algorithm, and the unintended failure of some variations to properly
restart in some environments. SSR is intended to avoid line-rate
bursts after idle periods, where TCP accumulates permission to send
in the form of ACKs, but does not consume that permission
immediately. SSR's original "restart after send is idle" is commonly
implemented as "restart after receive is idle". The latter
unintentionally fails to restart for bidirectional connections where
the sender's burst is triggered by a reverse-path data packet, such
as in persistent HTTP. Both the former and latter are shown to permit
bursts in other circumstances. Three solutions are discussed, and
their implementations evaluated.
This document is a product of the LSAM project at ISI. Comments are
solicited and should be addressed to the authors.
Slow-Start Restart (SSR) describes one TCP behavior to respond to
long sending pauses in an open connection. When a sender becomes
idle, the normal ack-clocking mechanism which regulates traffic is no
longer present and the sender may introduce a burst of packets into
the network as large as the current congestion window (CWND). Such a
burst may be too large for the intermediate routers to handle and may
be too large for the receiver to handle at one time as well.
A send timer was first proposed [JK90] to detect idle sending
periods; the recommended response is to close the congestion window
and perform a new slow-start. However, a footnote to this first
proposed solution noted that send/receive symmetry on the channel
meant that a receive timer could be used instead to achieve the same
results. As this second solution takes advantage of a timer that is
already required (to detect packet loss) it was implemented by
Jacobson and Karels. This solution has been repeated in
implementations which derive from their work.
Bursty connections, such as the persistent connections required in
HTTP/1.1 [FGMFB97], have been found to interact with SSR in
unintended ways. In fact, it was discovered that SSR never occurs with
HTTP/1.1 [Poo97]. This is because a new request will reset the
receive timer (as suggested in the footnote in [JK90]) and the
sending pause will not be detected [Tou97].
Further, both timer solutions depend on the retransmit timeout (RTO)
and cannot detect send pauses that are shorter than this duration.
In such cases, the sender may transmit a burst as large as the full
There are several ways of determining whether a connection is at risk
of sending a burst of packets into the channel. We will discuss each
method below, from the least radical to the most radical.
The use of a receive timer is the most common burst detection method.
It is attractive because it is simple and makes use of an existing
timer. However, a receive timer does not properly detect bursts in
HTTP/1.1 because the timer is cancelled when the request packet is
received. Further, when the connection is idle for less than a full
RTO, a burst cannot be detected. Such a burst can happen when the
connection is "nearly idle" or when acks are lost or reordered.
A send timer is the reciprocal solution to using a receive timer.
While it requires a new timestamp field to be maintained, it clearly
detects send pauses and corrects the problem presented by HTTP/1.1.
However, as with the receive timer, it cannot detect bursts that
could happen before a full RTO.
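The difference between the two timers can be illustrated with a
minimal sketch. The structure and function names below are
hypothetical and are not taken from any TCP implementation; times are
assumed to be in milliseconds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-connection timestamps, in milliseconds. */
struct conn_times {
    long last_rcv;  /* when the last segment (data or ack) arrived */
    long last_snd;  /* when the last data segment was sent */
    long rto;       /* current retransmit timeout */
};

/* Receive timer: idle if nothing has arrived for at least one RTO. */
static bool idle_by_receive_timer(const struct conn_times *c, long now)
{
    assert(now >= c->last_rcv);
    return now - c->last_rcv >= c->rto;
}

/* Send timer: idle if nothing has been sent for at least one RTO. */
static bool idle_by_send_timer(const struct conn_times *c, long now)
{
    assert(now >= c->last_snd);
    return now - c->last_snd >= c->rto;
}
```

In the HTTP/1.1 case, the arriving request updates last_rcv just
before the response is sent, so the receive-timer test reports "not
idle" while the send-timer test still reports the pause. Neither test
fires when the pause is shorter than the RTO.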
An alternative method, Packet Counting, examines the unused portion of
the congestion window to determine if the capacity to burst exists.
This method is simple, uses existing information to make its decision,
and solves both the HTTP/1.1 problem and the RTO problem. In
addition, it addresses the problem that needs to be solved (bursts)
instead of a specific circumstance where the problem could happen
(send pauses). However, where timer detection avoids defining a
burst (it defines idle periods instead), here a burst must be defined
before it can be detected. One possible definition is the situation
where the available portion of the sending window is some proportion
of the entire congestion window, say 50%. Another definition places
a numerical limit on the available portion of the congestion window,
say 4 or CWND-1 packets.
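The two candidate definitions can be sketched as follows. This is an
illustration only: the function names are hypothetical, all quantities
are counted in packets, and the exact comparisons ("at least" the
given percentage, "more than" the given count) are assumptions, since
the text leaves them open.

```c
#include <assert.h>
#include <stdbool.h>

/* Unused portion of the congestion window, in packets. */
static unsigned unused_window(unsigned cwnd, unsigned outstanding)
{
    return cwnd > outstanding ? cwnd - outstanding : 0;
}

/* Proportional definition: a burst is possible when the unused part
 * is at least `percent` percent of the whole congestion window. */
static bool burst_by_fraction(unsigned cwnd, unsigned outstanding,
                              unsigned percent)
{
    assert(percent <= 100);
    return unused_window(cwnd, outstanding) * 100 >= cwnd * percent;
}

/* Absolute definition: a burst is possible when more than `limit`
 * packets of the congestion window are unused. */
static bool burst_by_count(unsigned cwnd, unsigned outstanding,
                           unsigned limit)
{
    return unused_window(cwnd, outstanding) > limit;
}
```

For example, with a congestion window of 16 packets and 4 packets
outstanding, both definitions (50% and a limit of 4) report a burst
potential; with 14 packets outstanding, neither does.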
Once a burst is detected, there are several different ways to take
action. The different possibilities are listed below, again from
least to most radical.
One response, Full Restart, is the original slow-start restart: reduce
the congestion window to one packet and re-enter slow-start. This was
the solution proposed by J&K. This is a very conservative response and
it defeats most of the speedup that HTTP/1.1 provides [HOT97].
Current proposals [FAP97] have suggested increasing the initial
window from 1 packet to 4 packets. Further, depending on the method
of burst detection, Full Restart can be far more punitive than it
should be. Coupled with a timer, full restart is most likely to
respond to a completely empty congestion window. Coupled with Packet
Counting, the response could close the window too far, even smaller
than the amount of outstanding data.
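The over-closing problem can be made concrete with a small sketch.
The helper names are hypothetical and quantities are in packets; the
code only illustrates the arithmetic, not any particular
implementation.

```c
#include <assert.h>

/* Full Restart: the window is closed to the initial window,
 * regardless of how much data is currently in flight. */
static unsigned full_restart(unsigned initial_window)
{
    assert(initial_window >= 1);
    return initial_window;
}

/* Number of in-flight packets that no longer fit in the new window
 * (0 when the new window still covers everything outstanding). */
static unsigned excess_in_flight(unsigned new_cwnd, unsigned outstanding)
{
    return outstanding > new_cwnd ? outstanding - new_cwnd : 0;
}
```

When a timer triggers the restart, the connection is usually empty and
the excess is 0. When Packet Counting triggers it on an active
connection with, say, 6 packets outstanding, the new window of 1 is 5
packets smaller than the data already in flight.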
Window Limiting is a modified version of Full Restart that solves the
problem created by using Packet Counting to detect bursts. With this
type of response, the congestion window is reduced to the amount of
outstanding data plus the slow-start initial window (1, 2, or 4). It
works exactly like Full Restart in the idle case, but is successful
at controlling bursts in an active connection. Further, in an active
connection, it effectively implements a leaky bucket of the initial
window size for the accumulation of send opportunity based on the
receipt of acks. This solution is fairly conservative, especially as
it defaults to Full Restart, but more importantly, sending
opportunity is simply lost if not used, and is not available for
paced output. Also, it forces negative congestion feedback on the
Burst Size Limitation:
When a burst is detected, its effects are limited: the sender may not
send more than a preset number of packets into the network. This
response is less conservative than the first two in that it does not
affect the size of the congestion window. It is also simple to
implement: count the packets as they are sent and stop when the limit
is reached. Whether to wait for an ack or some
other signal to resume sending is an implementation detail. Lastly,
this burst response can be performed after each ack or with each
send. The behavior is slightly different in each case.
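A minimal sketch of the counting step is shown below. The function
name is hypothetical, quantities are in packets, and the limit is
applied once per call; when the sender may call it again is left open,
as in the text.

```c
#include <assert.h>

/* Packets that may be sent right now: the unused part of the
 * congestion window, capped at max_burst. The congestion window
 * itself is not modified. */
static unsigned sendable_now(unsigned cwnd, unsigned outstanding,
                             unsigned max_burst)
{
    assert(max_burst >= 1);
    unsigned avail = cwnd > outstanding ? cwnd - outstanding : 0;
    return avail < max_burst ? avail : max_burst;
}
```

With a window of 16 packets, nothing outstanding, and a limit of 4,
the first call permits 4 packets. If the limit is applied per send and
the routine is called again immediately (now with 4 outstanding), it
permits another 4, which is one way the per-ack and per-send variants
can differ.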
With Pacing, when a burst is detected, packets are dribbled into the
network until the sender starts receiving acks and normal maintenance
can be resumed [VH97]. This solution is very easy on the network and
scales well to paths with a high bandwidth-delay product. However, it
requires a new timer, and its parameter tuning requires more research.
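As an illustration of the kind of parameter involved, the sketch below
computes a spacing between paced packets. The rule used (spread one
congestion window evenly over one round-trip time) is an assumption
made for this example and is not taken from [VH97]; the function name
is hypothetical.

```c
#include <assert.h>

/* Interval between paced packets, in microseconds, assuming the
 * sender spreads cwnd_pkts packets evenly over one round-trip time.
 * This spacing rule is an illustrative assumption. */
static unsigned long pacing_interval_us(unsigned long rtt_us,
                                        unsigned cwnd_pkts)
{
    assert(cwnd_pkts >= 1);
    return rtt_us / cwnd_pkts;
}
```

Under this assumption, a 100 ms round-trip time and a 10 packet window
give one packet every 10 ms. The new timer mentioned above would be
armed with this interval until acks resume.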
Now we will examine combinations of the different detection and
response methods presented above. Each of the solutions below
has been implemented in some form.
BSD Implementation (Jacobson and Karels)
The most common implementation uses a receive timer coupled with Full
Restart. This is the implementation that causes the interaction
problems with HTTP/1.1. The obvious alternative is to implement a
send timer as originally intended and use Full Restart. There are
several drawbacks to this solution. First, a send timer adds
additional state and serves no purpose other than to correct the
bursting behavior after send pauses. Second, forcing a slow-start in
this situation is problematic for HTTP/1.1. A slow-start for each
new user request adds a delay burden to characteristically small HTTP
responses. Further, the HTTP user request pattern is unpredictable.
It is possible for the user to make a new request before the send
timer expires, triggering a burst that would defeat such a timer.
Maximum Burst Limitation (Floyd)
Floyd has proposed a coupling of Packet Counting with Burst Size
Limitation. This solution has been implemented in ns and it prevents
the sender from transmitting a series of back-to-back packets larger
than the user configured burst limit (suggested to be 4 packets)
[NS97]. There are several issues involved with recovering from a
burst and the ns implementation doesn't address them consistently.
First, it is not clear when the sender is allowed to send again after
sending the first limited burst of packets. One implementation
requires the sender to wait for the burst timer to expire. Another
seems to allow a series of short bursts. Another issue is how the
simulation implementation and usage translates to a live network
situation. The implementation of this solution can range from simple
to more complex.
Congestion Window Monitoring (Hughes, Touch, and Heidemann)
Our proposed solution combines Packet Counting with Window Limiting.
Whenever (CWND - outstanding data > 4), we reduce CWND to
(outstanding data + 4). The choice of 4 packets is discussed with
the implementation details below. Congestion Window Monitoring (CWM)
allows the congestion window to grow normally but shrinks the
congestion window as the sender becomes idle. It also prevents the
sender from transmitting any bursts larger than 4 packets in response
to a new request. Because CWM is not dependent on any timers, the
loss of an ack or a nearly idle connection cannot cause any bursts.
CWM is similar to Burst Limitation, but avoids the burst by reducing
CWND, rather than by inhibiting the sends directly. As a result, we
avoid the potential problem of sequential calls to TCP_output, which
would cause bursts in the former, but not the latter. CWM also
causes TCP to use the feedback of 'not using the CWND fast enough',
which results in a decrease in the CWND.
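The shrinking behavior can be sketched with a small simulation. The
names are hypothetical, quantities are in packets, and two
simplifications are assumed: the check is applied as each ack arrives,
and normal window growth on acks is ignored.

```c
#include <assert.h>

#define CWM_SLACK 4  /* unused window allowed, in packets */

/* Clamp cwnd so that at most CWM_SLACK packets of it are unused. */
static unsigned cwm_clamp(unsigned cwnd, unsigned outstanding)
{
    unsigned maxwin = outstanding + CWM_SLACK;
    return cwnd > maxwin ? maxwin : cwnd;
}

/* Simulate a sender that stops sending: each ack removes one
 * outstanding packet, and the CWM check runs after every ack.
 * Returns the congestion window once nothing is outstanding. */
static unsigned cwnd_after_drain(unsigned cwnd, unsigned outstanding)
{
    assert(outstanding <= cwnd);
    while (outstanding > 0) {
        outstanding--;                       /* one packet acked */
        cwnd = cwm_clamp(cwnd, outstanding); /* shrink if too open */
    }
    return cwm_clamp(cwnd, outstanding);
}
```

Starting from a full window of 16 packets, the window is untouched
while at least 12 packets remain outstanding, then follows the
outstanding data down and ends at 4 once the connection is idle. A
window that is already 4 packets or smaller is never reduced.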
CWM effectively imposes a leaky bucket type limitation on the
congestion window. The window is allowed to grow and be managed
normally but the sender is not allowed to save up any sending
opportunities. Any opportunity that is not used is lost. This
property of CWM forces interleaved reception of acks and processing
Rate Based Pacing (Visweswaraiah and Heidemann)
Rate Based Pacing combines the Pacing response with either a Send
Timer or Packet Counting. It avoids slow-start when resuming after
sending pauses and allows the normal clocking of packets to be
gracefully restarted. When a burst potential is detected, the
algorithm meters a small burst of packets into the channel [VH97].
RBP is the least conservative solution to the bursting problem
because it continues to make use of the pre-pause congestion window.
If network conditions have changed significantly, maintaining the
previous window could cause the paced connection to be overly
aggressive as compared to other connections. (Although some work
suggests congestion windows are stable over multi-minute timeframes
[BSSK97].) More recently, pacing has been suggested for use in wireless
networking scenarios [BPK97], and for satellite connections.
Packet traces of the current FreeBSD implementation of SSR (using the
receive timer), of a modified version of FreeBSD using a send timer,
and of CWM with HTTP/1.1 support the above observations. In all of
the traces, the response pattern for the first request is the same
with each method. This shows that CWM allows the congestion window
to grow normally. Because of the different actions taken by the
three algorithms, the response pattern for the second request differs
as would be expected. [We have graphs available upon request]
When the second request arrives at the server after the
retransmission timeout (RTO), normal FreeBSD allows the server to
respond with a burst of packets. FreeBSD using a send timer responds
by entering slow-start. CWM allows a 4 packet burst. When the second
request arrives at the server before the RTO, both timer
implementations allow a burst. CWM again limits the burst to 4
packets. Note, RTO is the common timer limit, but any value would
have the same results, depending on when the second request was
presented in relation to the timer.
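The observed behavior can be summarized in a small model. This is a
simplified sketch, not the code that was traced: the names are
hypothetical, quantities are in packets, and it assumes that all
earlier data has been acknowledged and that the arriving request has
just reset the receive timer.

```c
#include <assert.h>

enum policy { RECV_TIMER_RESTART, SEND_TIMER_RESTART, CWM };

/* Packets the server may send back-to-back in response to a request
 * that arrives after the sender has been idle for idle_ms. */
static unsigned burst_on_request(enum policy p, unsigned cwnd,
                                 long idle_ms, long rto_ms)
{
    assert(idle_ms >= 0 && rto_ms > 0);
    switch (p) {
    case RECV_TIMER_RESTART:
        /* The request reset the receive timer: no restart occurs. */
        return cwnd;
    case SEND_TIMER_RESTART:
        /* Restart (window of 1) only if the pause reached the RTO. */
        return idle_ms >= rto_ms ? 1 : cwnd;
    case CWM:
        /* Nothing outstanding, so the window is at most 4. */
        return cwnd > 4 ? 4 : cwnd;
    }
    return cwnd; /* not reached */
}
```

With a pre-pause window of 16 packets and an RTO of 1 second, a
request after 3 seconds yields bursts of 16, 1, and 4 packets for the
receive timer, send timer, and CWM respectively; a request after 0.5
seconds yields 16, 16, and 4.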
Implementation of Congestion Window Monitoring
Congestion Window Monitoring requires a simple modification to
existing TCP output routines. The changes required replace the
current idle detection code. Replace the existing 3 lines of code:
    idle = (snd_max == snd_una);
    if (idle && now - lastrcv >= rto)
        cwnd = 1;
with the following 3 lines of code:
    maxwin = 4 + snd_nxt - snd_una;
    if (cwnd > maxwin)
        cwnd = maxwin;
Packet counting is implemented by line 1. Lines 2 and 3 implement
The choice of limiting the available congestion window to 4 packets
is based on the normal operation of TCP. An ACK received by the
sender may be in response to the receipt of 2 packets, allowing
another 2 to be sent. Further, normal window growth may require the
sending of a third packet. Lastly, in slow-start with delayed ACKs,
the receipt of an ACK can trigger the sending of 4 packets. Thus, 4
packets is a reasonable burst to send into the network.
Increasing the initial window in slow-start to 4 packets has already
been proposed [FAP97]. The effects of this change have been explored
in simulation in [PN98] and in practice in [AHO97]. Such a
modification to TCP would cause the same behavior as our solution in
the cases where the pause timer has expired. It does not address the
pre-timeout bursting situation we are concerned with.
At this time, we propose CWM as a simple, minimal and effective fix
to the 'bug' in current TCP implementations that is exploited by
HTTP/1.1. Modifications can be made to TCP to solve the slow-start
restart problem that are consistent with the original congestion
avoidance specifications (i.e. a send timer). However, we feel that
the original intended behavior is not appropriate to some current
applications, specifically HTTP. Thus, we recommend Congestion Window
Monitoring to prevent bursts into the network. Not only does this
solution solve the current problem in a simple way, it will prevent
bursting in any other situation that might arise. The 4 packet bursts
which we allow are consistent with congestion window growth
algorithms and with Floyd's conclusion about increasing the initial
CWM, as well as the other solutions listed, need to be re-evaluated
within emerging TCP implementations, e.g., SACK [JB88]. In general,
TCP has no rate pacing and uses congestion control to avoid bursts in
current implementations. A more explicit mechanism, such as RBP or
similar proposals may be desirable in the future.
CWM presents no security problems.
[AHO97] Mark Allman, Chris Hayes, and Shawn Ostermann. An Evaluation
of TCP Slow Start Modifications, July 1997. (Submitted to CCR,
draft available from http://jarok.cs.ohiou.edu/papers/)
[BPK97] Hari Balakrishnan, Venkata N. Padmanabhan, and Randy H. Katz.
The Effects of Asymmetry on TCP Performance. In Proceedings of
the ACM/IEEE Mobicom, Budapest, Hungary, ACM. September, 1997.
[BSSK97] Hari Balakrishnan, Srinivasan Seshan, Mark Stemm, and Randy
H. Katz. Analyzing Stability in Wide-Area Network Performance.
In Proceedings of the ACM SIGMETRICS, Seattle WA, USA, ACM.
[FGMFB97] R. Fielding, Jim Gettys, Jeffrey C. Mogul, H. Frystyk, and
Tim Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1, January
1997. RFC 2068.
[FAP97] Sally Floyd, Mark Allman, and Craig Partridge. Increasing
TCP's Initial Window, July 1997. Internet Draft draft-floyd-
[Hei97] John Heidemann. Performance Interactions Between P-HTTP and
TCP Implementations. ACM Computer Communications Review, 27(2),
65-73, April 1997.
[HOT97] John Heidemann, Katia Obraczka, and Joe Touch. Modeling the
Performance of HTTP Over Several Transport Protocols. ACM/IEEE
Transactions on Networking 5(5), 616-630, October, 1997.
[JB88] Van Jacobson and R.T. Braden. TCP extensions for long-delay
paths, October 1988. RFC 1072.
[JK90] Van Jacobson and Michael J. Karels. Congestion Avoidance and
Control. ACM Computer Communication Review, 18(4):314-329,
August 1990. Revised version of his SIGCOMM '88 paper.
[NS97] ns Network Simulator. http://www-mash.cs.berkeley.edu/ns/,
[PN98] K. Poduri and K. Nichols. Simulation Studies of Increased
Initial TCP Window Size, February 1998. Internet Draft draft-
[Poo97] Kacheong Poon, Sun Microsystems, tcp-implementors mailing
list, August, 1997.
[Tou97] Joe Touch, ISI, tcp-implementors mailing list, August 12,
[VH97] Vikram Visweswaraiah and John Heidemann. Improving Restart of
Idle TCP Connections. Technical Report 97-661, University of
Southern California, November 1997.
Amy Hughes, Joe Touch, John Heidemann
University of Southern California/Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292-6695
Phone: +1 310-822-1511
Fax: +1 310-823-6714