Network Working Group M. Scharf
Internet-Draft University of Stuttgart
Intended status: Experimental S. Floyd
Expires: August 30, 2007 ICIR
P. Sarolahti
Nokia Research Center
February 26, 2007
Avoiding Interactions of Quick-Start TCP and Flow Control
draft-scharf-tsvwg-quick-start-flow-control-00.txt
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 30, 2007.
Copyright Notice
Copyright (C) The IETF Trust (2007).
Abstract
This document describes methods to avoid interactions between the
flow control of the Transmission Control Protocol (TCP) and the
Quick-Start TCP extension. Quick-Start is an optional TCP congestion
control mechanism that allows hosts to determine an allowed sending
rate from feedback of routers along the path. With Quick-Start, data
Scharf, et al. Expires August 30, 2007 [Page 1]
Internet-Draft Quick-Start TCP and Flow Control February 2007
transfers can start with a potentially large congestion window. In
order to fully utilize the data rate determined by Quick-Start, the
sending host must not be limited by TCP flow control, i.e., by the
amount of free buffer space advertised in the receive window.
There are two potential interactions between Quick-Start and the TCP
flow control: First, receivers might not provide sufficiently large
buffer space after connection setup, or they may implement buffer
allocation strategies that implicitly assume the slow-start behavior
on the sender side. This document therefore provides guidelines for
buffer allocation in hosts supporting the Quick-Start extension.
Second, the TCP receive window scaling mechanism interferes with
Quick-Start when Quick-Start is used during the initial three-way
handshake. This document describes a simple solution to
overcome this problem.
Table of Contents
1. Introduction
2. Requirements notation
3. Quick-Start TCP and receive buffer dimensioning
3.1. Receiver buffer allocation strategies
3.2. Recommendations for buffer dimensioning in presence of Quick-Start requests
4. Quick-Start TCP and receive window scaling
4.1. Receive window scaling
4.2. Problem within the three-way handshake
4.3. Possible remedy
4.4. Discussion and deployment considerations
5. Security Considerations
6. IANA considerations
7. Acknowledgments
8. References
8.1. Normative References
8.2. Informative References
Authors' Addresses
Intellectual Property and Copyright Statements
1. Introduction
Quick-Start is an experimental extension for the Transmission Control
Protocol (TCP) [RFC0793] that can speed up best-effort data
transfers. The Quick-Start TCP extension is specified in [RFC4782].
With Quick-Start, TCP hosts can request permission from the routers
along a network path to send at a higher rate than allowed by the
default TCP congestion control, in particular after connection setup
or longer idle periods. The explicit router feedback avoids the
time-consuming capacity probing by the TCP slow-start and can
significantly improve transfer times over paths with a high
bandwidth-delay product [SAF07].
The use of Quick-Start significantly changes the TCP behavior
during connection setup. Special care is therefore needed
to prevent interactions between Quick-Start and other TCP mechanisms.
Specifically, TCP flow control has to accommodate
the use of Quick-Start, in particular when the TCP connection spans
a path with a large bandwidth-delay product (BDP). In such cases the
sending window should have a large value in order to achieve good TCP
performance (see [RFC2488], [RFC3481]).
Unlike the standard slow-start mechanism, the Quick-Start TCP
extension allows the sender to use large congestion windows
immediately after connection setup. The usage of such large windows
raises two questions: First, what receiver buffer allocation
strategies should be used in combination with Quick-Start? And
second, how should these large windows be signaled? This
document addresses these issues and shows that Quick-Start requires
special mechanisms in both cases. The document thereby supplements
the Quick-Start TCP specification [RFC4782], where flow control
issues have not been addressed in detail.
The rest of this document is structured as follows: First, the
question of receive buffer allocation in combination with Quick-Start
is addressed and dimensioning guidelines are provided. Second, a
modification of the receive window scaling mechanism [RFC1323] is
specified, which is required to fully benefit from Quick-Start when
the Quick-Start request is used in the initial <SYN> segment.
It should be noted that the effects and most methods discussed in
this document are not specific to the Quick-Start TCP extension.
They could also be used in combination with other proposals that
cause a behavior more aggressive than standard TCP slow-start, for
instance [LAJ+07].
2. Requirements notation
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
3. Quick-Start TCP and receive buffer dimensioning
3.1. Receiver buffer allocation strategies
The TCP sending window is the minimum of the congestion
window and the receive window (also called advertised receiver
window) [RFC2581]. A small receive window prevents the TCP
connection from fully utilizing paths with a larger bandwidth-delay
product. On the one hand, a TCP receiver should therefore
advertise a receive window that is big enough to allow an efficient
utilization of the connection path. On the other hand, hosts with a
potentially high number of TCP connections need to limit the
buffer and memory usage per connection to be able to serve as many
TCP connections as possible. Finding a fixed receive buffer size that
balances these two goals is difficult.
This is why many modern TCP implementations use
dynamic buffer management. There are different auto-tuning
techniques and heuristics [Dun06] designed to prevent the receive
window from limiting the data rate at the sender. An implementation
using buffer size auto-tuning is described, for instance, in [SB05]. A
common characteristic of most of these buffer allocation strategies
is that they initially start with a rather small receive window. The
more data arrives, the more buffer is allocated to the corresponding
connection. This behavior is reasonable if the sender uses the
standard slow-start algorithm and thus starts with a small congestion
window anyway. However, when using Quick-Start, a large receive
buffer may be required immediately after connection setup.
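As a non-normative illustration, the following Python sketch shows how
a small initial receive window caps the sending window. The function
name and the example values (a congestion window of 500,000 bytes
after an approved Quick-Start request and an initial advertised window
of 16,384 bytes) are assumptions chosen for illustration and are not
taken from any implementation.

```python
def sending_window(cwnd_bytes: int, rwnd_bytes: int) -> int:
    """Return the usable sending window: min(congestion window, receive window)."""
    return min(cwnd_bytes, rwnd_bytes)


# Hypothetical example values, not measurements.
QUICK_START_CWND = 500_000  # congestion window after an approved request
INITIAL_RWND = 16_384       # small initial window of an auto-tuning receiver

# The receive window, not the approved rate, bounds what can be sent.
limited = sending_window(QUICK_START_CWND, INITIAL_RWND)
print(limited)
```

With a slow-start sender, the same receive window would not be the
limiting factor at first, since the initial congestion window is
smaller than the advertised window.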
3.2. Recommendations for buffer dimensioning in presence of Quick-Start
requests
When a host receives and approves a Quick-Start request, in
particular during the connection setup, it SHOULD allocate a
"reasonable" amount of buffer space so that a potential Quick-Start
data transfer can start with a high sending window. If buffer size
auto-tuning is used, it SHOULD be ensured that a sufficiently high
initial receive window is announced. The handling of buffer space
upon arrival of a Quick-Start request SHOULD be configurable by the
corresponding application.
Determining an appropriate "reasonable" receive buffer size is not a
trivial task and also depends on the available system resources.
However, unlike standard TCP slow-start, the Quick-Start extension
provides some additional information that could help to properly
dimension the receive buffer. A reasonable buffer size would
typically be a small multiple of the bandwidth-delay product of the
path. An approximation of the available bandwidth can be directly
obtained from the approved Quick-Start rate in the received request.
If the round-trip time (RTT) to the Quick-Start originator is also
known (e.g., if it has been cached from previous connections), a
reasonable buffer size can be directly calculated as a small multiple
of the BDP. If the round-trip time is not known, the
buffer could be dimensioned for a configurable "worst-case" RTT
such as 500 ms.
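The dimensioning rule above can be sketched as follows. This is a
minimal illustration, not a recommended implementation: the function
name, the default multiple of 2, and the use of integer milliseconds
are assumptions made for this example. Only the 500 ms fallback is
taken from the text above.

```python
from typing import Optional


def receive_buffer_bytes(qs_rate_bps: int,
                         rtt_ms: Optional[int] = None,
                         bdp_multiple: int = 2,
                         worst_case_rtt_ms: int = 500) -> int:
    """Receive buffer size as a small multiple of the bandwidth-delay product.

    qs_rate_bps  approved Quick-Start rate in bit/s
    rtt_ms       cached round-trip time in ms, or None if unknown
    bdp_multiple assumed "small multiple" of the BDP (hypothetical default)
    """
    effective_rtt_ms = worst_case_rtt_ms if rtt_ms is None else rtt_ms
    # BDP [bytes] = rate [bit/s] * RTT [ms] / (8 bit/byte * 1000 ms/s)
    numerator = bdp_multiple * qs_rate_bps * effective_rtt_ms
    return -(-numerator // 8000)  # integer ceiling division


# 10 Mbit/s approved rate with a cached RTT of 100 ms
print(receive_buffer_bytes(10_000_000, rtt_ms=100))
# Same rate, RTT unknown: fall back to the worst-case RTT
print(receive_buffer_bytes(10_000_000))
```

A real host would additionally cap the result according to the
available system resources.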
4. Quick-Start TCP and receive window scaling
4.1. Receive window scaling
The TCP header specified in [RFC0793] uses a 16-bit field to report
the receive window size to the sender. This effectively limits the
sending window to 65,535 bytes (about 65 kB). To circumvent this
problem, the "Window
Scale" TCP extension [RFC1323] defines an implicit scale factor,
which is used to multiply the window size value found in a TCP header
to obtain a 32-bit window size. If enabled, the scale factor is
announced during connection setup by the "Window Scale" TCP option in
<SYN> and <SYN,ACK> segments.
In general, using receive window scaling is highly beneficial for TCP
connections over paths with a large bandwidth-delay product
[RFC2488], [RFC3481]. Otherwise, the path capacity cannot be fully
utilized by TCP. Quick-Start TCP can significantly speed up data
transfers over such paths [RFC4782], [SAF07]. As a consequence, a
host supporting Quick-Start SHOULD enable receive window scaling. If
Quick-Start is used in the initial three-way handshake, the minimum
required scaling factor can be obtained from the required receive
buffer space, which can be approximated as described in the previous
section.
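One possible way to derive the minimum scaling factor from the
required buffer space is sketched below. The function names are
hypothetical. The sketch relies on two properties of [RFC1323]: the
scale factor is a power of two expressed as a shift count, and the
shift count is limited to 14.

```python
MAX_UNSCALED_WINDOW = 65_535  # largest value of the 16-bit window field
MAX_WINDOW_SHIFT = 14         # largest shift count permitted by RFC 1323


def min_window_shift(buffer_bytes: int) -> int:
    """Smallest shift count that lets the window field cover buffer_bytes."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < buffer_bytes and shift < MAX_WINDOW_SHIFT:
        shift += 1
    return shift


def window_field(buffer_bytes: int, shift: int) -> int:
    """16-bit value carried in the header for a scaled receive window."""
    return min(buffer_bytes >> shift, MAX_UNSCALED_WINDOW)


# A 250,000-byte buffer needs a shift count of 2 (scale factor 4).
shift = min_window_shift(250_000)
print(shift, window_field(250_000, shift))
```

Implementations may choose a larger shift count than the minimum, for
instance to leave room for later buffer growth by auto-tuning.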
4.2. Problem within the three-way handshake
A problem arises when the Quick-Start mechanism is used within the
three-way handshake, and the Quick-Start request is added to the
initial <SYN> segment: In this scenario, if the Quick-Start request
is approved by the routers along the path, the receiver echoes back
the Quick-Start response in the <SYN,ACK> segment. This process is
illustrated in [RFC4782]. Upon reception of the <SYN,ACK> with the
Quick-Start response, the sender can set the congestion window to the
determined value so that it can immediately start to send with the
approved data rate.
However, [RFC1323] defines that the "Window field in a SYN (i.e., a
<SYN> or <SYN,ACK>) segment itself is never scaled." This means that
the maximum receive window that can be signaled to the sender in the
<SYN,ACK> is 65 kB. As a consequence, the TCP flow control will
prevent the TCP sender from having more than 65 kB of outstanding
data, even if the receiver has much more free buffer, and the Quick-
Start feedback allows a much larger congestion window.
This effect essentially limits the maximum amount of data sent by
Quick-Start to 65 kB, when the sender sends the Quick-Start request
in the initial <SYN> segment. Also, the congestion window after
leaving the Quick-Start rate pacing phase is at most 65 kB, as the
congestion window is set to the amount of outstanding data at this
point. This is an undesirable restriction for the Quick-Start
mechanism, even if 65 kB is still much more than the initial
congestion window in slow-start that is allowed by [RFC3390].
This issue only occurs when Quick-Start is used in the three-way TCP
connection setup procedure, and only in the direction of the client
(connection originator) to the server. Still, this case is one of
the planned usage scenarios for the Quick-Start TCP extension.
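The size of this restriction can be estimated with a rough
calculation: a sender that may have at most one window of data
outstanding per round-trip time cannot exceed a rate of window/RTT.
The following sketch is illustrative only. The function name and the
example round-trip time of 100 ms are assumptions.

```python
def window_limited_rate_bps(rwnd_bytes: int, rtt_ms: int) -> int:
    """Upper bound on the sending rate (bit/s) imposed by the receive window."""
    # rate = window [bytes] * 8 [bit/byte] / RTT [s], with the RTT given in ms
    return rwnd_bytes * 8 * 1000 // rtt_ms


# Unscaled window of 65,535 bytes, assumed RTT of 100 ms
print(window_limited_rate_bps(65_535, 100))
```

In this example, the flow control limit is about 5.2 Mbit/s. Any
Quick-Start rate approved above this value could not be used during
the first round-trip time.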
4.3. Possible remedy
The limitation imposed by the window scaling could be addressed in
two different ways: First, one could deviate from [RFC1323] and use a
scaled receive window in <SYN> and <SYN,ACK> segments, if they
include Quick-Start options. This would avoid the problem sketched
in the previous section, but it is not compliant with the TCP
specification and the currently deployed TCP implementations.
This document describes a second, standard-compliant method: When a
host receives a <SYN> segment with a Quick-Start option, it processes
the option as described in [RFC4782]. Provided that the host has
Quick-Start support enabled, the Quick-Start response is echoed back
in the <SYN,ACK> segment. As explained, this segment cannot announce
receive windows larger than 65 kB. If the receiver allocates a
buffer space larger than 65 kB, an additional empty segment (without
<SYN> flag) SHOULD be sent after the <SYN,ACK> segment, in order to
announce the true receive window. The resulting message flow is
depicted in Figure 1.
Sender      Routers (approving QS request)      Receiver
------      -------                             --------
  |                                                  |
  | ------------------------------------------------>|
  |   QS request                                     |
  |   TCP <SYN>, unscaled receive window             |
  |   window scaling and other options               |
  |                                                  |
  | <------------------------------------------------|
  |   QS response                                    |
  |   TCP <SYN,ACK>, unscaled receive window         |
  |   window scaling and other options               |
  |                                                  |
  | <------------------------------------------------|
  |   Additional acknowledgment                      |
  |   TCP <ACK>, scaled receive window               |
  |                                                  |
  | ------------------------------------------------>|
  |   QS report                                      |
  |   TCP <ACK>                                      |
  |                                                  |
  | ================================================>|
  | ================================================>|
  |   Rate paced data transfer                       |
  |                                                  |
  | <------------------------------------------------|
  |   First new acknowledgment                       |
  V                                                  V
Figure 1: Message sequence chart of the proposed mechanism
After having received this additional acknowledgment, the sender is
aware of the true available receive buffer. Provided that the Quick-
Start request is approved on the path and that the receive window is
sufficiently large, this allows the sender to send more than 65 kB
during the Quick-Start rate pacing phase.
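The receiver-side behavior can be sketched as follows. This is a
simplified, non-normative model: the Segment type and the function
name are hypothetical, and only the flags and the 16-bit window field
are represented. The shift count is assumed to be the one announced in
the "Window Scale" option of the <SYN,ACK>.

```python
from dataclasses import dataclass
from typing import List

MAX_UNSCALED_WINDOW = 65_535  # largest value of the 16-bit window field


@dataclass
class Segment:
    flags: str
    window_field: int  # value carried in the 16-bit TCP window field


def handshake_segments(buffer_bytes: int, shift: int) -> List[Segment]:
    """Segments a receiver sends in reply to a <SYN> with a Quick-Start request."""
    # The window in the <SYN,ACK> is never scaled, so it is capped.
    segments = [Segment("SYN,ACK", min(buffer_bytes, MAX_UNSCALED_WINDOW))]
    if buffer_bytes > MAX_UNSCALED_WINDOW:
        # Additional empty <ACK> that announces the true, scaled window.
        scaled = min(buffer_bytes >> shift, MAX_UNSCALED_WINDOW)
        segments.append(Segment("ACK", scaled))
    return segments


for segment in handshake_segments(250_000, shift=2):
    print(segment)
```

If the allocated buffer fits into the unscaled window, no additional
segment is generated and the handshake is unchanged.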
Note that there is some degree of freedom as to when to send the
additional acknowledgment. It can be sent immediately after the
<SYN,ACK> segment, but this is not required in all cases. It is
sufficient if the sender receives this segment before reaching the
limit of the unscaled receive window. As a consequence, receivers
may decide to delay the sending of this segment for some small amount
of time.
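A rough upper bound for such a delay can be estimated from the time
the sender needs to exhaust the unscaled window at the approved rate.
The sketch below is an assumption-laden approximation: it presumes
that the sender paces at exactly the approved rate from the moment it
receives the <SYN,ACK>, and that both segments experience the same
one-way delay. The function name is hypothetical.

```python
def max_ack_delay_us(qs_rate_bps: int, unscaled_window_bytes: int = 65_535) -> int:
    """Time in microseconds to send one unscaled window at the approved rate."""
    # time = window [bytes] * 8 [bit/byte] / rate [bit/s], expressed in us
    return unscaled_window_bytes * 8 * 1_000_000 // qs_rate_bps


print(max_ack_delay_us(10_000_000))   # approved rate of 10 Mbit/s
print(max_ack_delay_us(100_000_000))  # approved rate of 100 Mbit/s
```

At high approved rates the bound shrinks to a few milliseconds, so
only small delays would remain without effect on the sender.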
4.4. Discussion and deployment considerations
The method proposed in this document is compliant with the TCP
specifications: Sending empty segments to increase the receive window
is implicitly allowed by [RFC0793], and in [RFC2581] it is clearly
stated that sending an acknowledgment is allowed to update the
receive window. Implementing the method thus should require changes
in the receiver TCP implementation only.
However, sending an empty acknowledgment shortly after a <SYN,ACK>
segment is an atypical TCP communication event. The <SYN,ACK> and
the additional segment could get reordered in the network. In this
case, the sending host will typically ignore the additional segment,
as it is still awaiting the <SYN,ACK>. Furthermore, middleboxes such
as stateful firewalls might drop the additional acknowledgment.
Even worse, this segment might also be dropped if a middlebox
receives it earlier than the <ACK> segment from the sender. At this
point in time, from the viewpoint of the middlebox, the
bidirectional end-to-end TCP connection is not yet established. If the
additional segment gets dropped, the sender gets informed about the
unscaled receive window when the next new acknowledgment arrives,
which may limit the benefit of Quick-Start. Delaying the additional
acknowledgment for a short period of time could help to avoid such
problems. Further investigation is needed to analyze whether such a
delay is required.
A possible alternative to the message flow in Figure 1 would be to
piggyback the Quick-Start response on the additional acknowledgment
segment instead of the <SYN,ACK>. However, this approach has several
drawbacks and is therefore not recommended: First, the Quick-Start
response would be received later, which could cause additional
delays. Second, the <SYN,ACK> is immediately acknowledged by the
<ACK> segment. The Quick-Start rate report can thus be piggybacked
on this <ACK>. In contrast, if the Quick-Start response is included
in the additional acknowledgment, the Quick-Start report has to be
piggybacked on a data segment, i.e., it depends on the availability
of application data whether and when the Quick-Start report is sent.
It must be emphasized that the additional segment recommended by this
document results in a certain network overhead. Since
Quick-Start requests will be approved over under-utilized paths only,
this overhead might not be a significant problem.
5. Security Considerations
Quick-Start TCP poses a number of security challenges. Known
security threats as well as counter-measures are discussed in the
section "Security Considerations" of [RFC4782]. Since this document
describes extensions to Quick-Start TCP, the security issues
identified in [RFC4782] apply here, too.
Sending an additional acknowledgment segment is an allowed behavior
for a TCP connection endpoint and does not result in additional
security threats. However, special care is needed when allocating
large amounts of buffer space to newly established TCP connections,
since this could create vulnerabilities to denial-of-service attacks.
This issue may not be critical if Quick-Start is used in controlled
environments only, as recommended by [RFC4782].
6. IANA considerations
This document has no actions for IANA.
7. Acknowledgments
The first author thanks Haiko Strotbek, Martin Koehn, Simon Hauger,
and Christian Mueller for contributing to this document.
8. References
8.1. Normative References
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, September 1981.
[RFC1323] Jacobson, V., Braden, B., and D. Borman, "TCP Extensions
for High Performance", RFC 1323, May 1992.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
Control", RFC 2581, April 1999.
[RFC3390] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's
Initial Window", RFC 3390, October 2002.
[RFC4782] Floyd, S., Allman, M., Jain, A., and P. Sarolahti, "Quick-
Start for TCP and IP", RFC 4782, January 2007.
8.2. Informative References
[Dun06] Dunigan, T., "TCP auto-tuning zoo", available
at http://www.csm.ornl.gov/~dunigan/net100/auto.html,
February 2006.
[LAJ+07] Liu, D., Allman, M., Jin, S., and L. Wang, "Congestion
Control Without a Startup Phase", PFLDnet2007, Marina Del
Rey, CA, USA, February 2007.
[RFC2488] Allman, M., Glover, D., and L. Sanchez, "Enhancing TCP
Over Satellite Channels using Standard Mechanisms",
BCP 28, RFC 2488, January 1999.
[RFC3481] Inamura, H., Montenegro, G., Ludwig, R., Gurtov, A., and
F. Khafizov, "TCP over Second (2.5G) and Third (3G)
Generation Wireless Networks", BCP 71, RFC 3481,
February 2003.
[SAF07] Sarolahti, P., Allman, M., and S. Floyd, "Determining an
Appropriate Sending Rate Over an Underutilized Network
Path", accepted for publication in Computer Networks,
2007.
[SB05] Smith, M. and S. Bishop, "Flow Control in the Linux
Network Stack", available
at http://www.cl.cam.ac.uk/~pes20/Netsem/linuxnet.pdf,
February 2005.
Authors' Addresses
Michael Scharf
University of Stuttgart
Pfaffenwaldring 47
D-70569 Stuttgart
Germany
Phone: +49 711 685 69006
Email: michael.scharf@ikr.uni-stuttgart.de
URI: http://www.ikr.uni-stuttgart.de/en/~scharf
Sally Floyd
ICIR (ICSI Center for Internet Research)
Phone: +1 (510) 666-2989
Email: floyd@icir.org
URI: http://www.icir.org/floyd/
Pasi Sarolahti
Nokia Research Center
P.O. Box 407
FI-00045 NOKIA GROUP
Finland
Phone: +358 50 4876607
Email: pasi.sarolahti@iki.fi
Full Copyright Statement
Copyright (C) The IETF Trust (2007).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Acknowledgment
Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA).