Internet Congestion Control M. Mathis
Research Group Pittsburgh Supercomputing Center
Internet-Draft March 4, 2009
Intended status: Informational
Expires: September 5, 2009
Relentless Congestion Control
draft-mathis-iccrg-relentless-tcp-00
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on September 5, 2009.
Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of
publication of this document (http://trustee.ietf.org/license-info).
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document.
Abstract
Relentless congestion control is a simple modification that can be
applied to almost any AIMD style congestion control: instead of
Mathis Expires September 5, 2009 [Page 1]
Internet-Draft Relentless Congestion Control March 2009
applying a multiplicative reduction to cwnd after a loss, cwnd is
reduced by the number of lost segments. It can be modeled as a
strict implementation of van Jacobson's Packet Conservation
Principle. During recovery, new segments are injected into the
network in exact accordance with the segments that are reported to
have been delivered to the receiver by the returning ACKs.
This algorithm offers a valuable new congestion control property: the
TCP portion of the control loop has exactly unity gain, which should
make it easier to implement simple controllers in network devices to
accurately control queue sizes across a huge range of scales.
Relentless Congestion Control conforms to neither the details nor the
philosophy of current congestion control standards. These standards
are based on the idea that the Internet can attain sufficient
fairness by having relatively simple network devices send uniform
congestion signals to all flows, and mandating that all protocols
have equivalent responses to these congestion signals.
To function appropriately in a shared environment, Relentless
Congestion Control requires that the network allocates capacity
through some technique such as Fair Queuing, Approximate Fair
Dropping, etc. The salient features of these algorithms are that
they segregate the traffic into distinct flows, and send different
congestion signals to each flow. This alternative congestion control
paradigm is described in a separate document, also under
consideration by the ICCRG.
The goal of the document is to illustrate some new protocol features
and properties might be possible if we relax the "TCP-friendly"
mandate. A secondary goal of Relentless TCP is to make a distinction
between the bottlenecks that belong to protocol itself, vs standard
congestion control and the "TCP-friendly" paradigm.
Mathis Expires September 5, 2009 [Page 2]
Internet-Draft Relentless Congestion Control March 2009
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Implementation . . . . . . . . . . . . . . . . . . . . . . . . 5
3. Queue Controllers . . . . . . . . . . . . . . . . . . . . . . 6
4. Properties . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4.1. Relentless is not at all AIMD-friendly . . . . . . . . . . 7
4.2. Performance Model . . . . . . . . . . . . . . . . . . . . 7
4.3. Long Fat Networks (LFN) . . . . . . . . . . . . . . . . . 7
5. Usage Example: remote video upload . . . . . . . . . . . . . . 8
6. Security Considerations . . . . . . . . . . . . . . . . . . . 9
7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . . 10
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 10
Mathis Expires September 5, 2009 [Page 3]
Internet-Draft Relentless Congestion Control March 2009
1. Introduction
Relentless congestion control is a simple modification that can be
applied to almost any AIMD style congestion control: instead of
applying a multiplicative reduction to cwnd after a loss, cwnd is
reduced by the number of lost segments. The resulting cwnd is
normally the same as the actual window of data that was successfully
delivered during the lossy RTT. If the application is keeping up
with the network and other traffic does not change, this is the
correct cwnd to exactly fill the network on subsequent RTTs.
Relentless Congestion Control conforms to neither the details nor the
philosophy of current congestion control standards [RFC2914]. These
standards are based on the idea that the Internet can attain
sufficient fairness by having relatively simple network devices send
uniform congestion signals to all flows, and mandating that all
protocols have equivalent responses to these congestion signals.
To function appropriately in a shared environment, Relentless
Congestion Control requires that the network allocates capacity
through some technique such as Fair Queuing [FQ], Approximate Fair
Dropping [AFD], etc. The salient features of these algorithms are
that they segregate the traffic into distinct flows, and send
different congestion signals (segment losses, ECN marks or Queuing
delay) to each flow such that the network, and not the end-system,
controls capacity allocation. This alternative congestion control
paradigm is described in a separate document [unfriendly], also under
consideration by the ICCRG.
Relentless Congestion Control offers a valuable new property: the TCP
portion of the control loop has exactly unity gain. This property is
ideal from the standpoint of the queue controllers embedded within
network devices -- if some flow is using 1% too much bandwidth, then
dropping 1% of its segments will bring it back to the proper data
rate one RTT later. Likewise if a flow is occupying 10 segments too
much queue space, then dropping 10 segments will bring it back to the
proper queue occupancy on the next RTT. This unity gain property
should make it easier to implement simple queue controllers in
network devices that accurately control queues across a huge range of
scales.
Note the duality here, Relentless Congestion Control both requires
that the network manages traffic while at the same time, it makes it
easier to do so.
We do not claim that Relentless Congestion Control is ideal, or
without its own set of problems. We are not even planning to
standardize it at this time. The goal of the document is to
Mathis Expires September 5, 2009 [Page 4]
Internet-Draft Relentless Congestion Control March 2009
illustrate what new protocol features and properties might be
possible if we relax the "TCP-friendly" mandate.
In some data intensive communities, TCP has the reputation of being
too slow to be useful. A secondary goal of Relentless TCP is to make
a distinction between the bottlenecks that belong to protocol itself,
vs standard congestion control and the "TCP-friendly" paradigm.
Please send commentary on this draft to the ICCRG mailing list:
iccrg@cs.ucl.ac.uk.
2. Implementation
The simplest mental model of Relentless Congestion Control is a
strict implementation of van Jacobson's Packet Conservation Principle
[Van88]. During recovery, new segments are injected into the network
in exact accordance with the segments that are reported to have been
delivered to the receiver by the returning ACKs. As long as the
network does not lose all segments or ACKs, and the SACK scoreboard
[RFC2018] chooses appropriate data to retransmit, this will clearly
converge in one round trip time to exactly the window successfully
delivered during the lossy round trip. Since this quantity of data
was successfully delivered during a lossy round trip it is both the
largest justifiable estimate as well as the most accurate estimate of
the maximum capacity of the network, and a good starting point for
subsequent RTTs. Algorithms that open the window, Slow-Start and
Congestion Avoidance, are only permitted when the receiver is
reporting contiguous data, which implies that there has subsequently
been a full RTT without any additional losses.
Our implementation is not quite as elegant, and consequently is not
as accurate as the mental model. It is based on an existing TCP
implementation that supports plugable congestion control modules.
Unfortunately, some of the cwnd adjustments are wired into the loss
recovery code and can not be suppressed directly. They can only be
offset later in the plugable Relentless Congestion Control module.
For details and source code, see the web site [DOWNLOAD].
As a network safety and possible deployment strategy, our
implementation requires that the traffic be marked with the "Lower
Effort" (LE) DSCP [RFC3662]. In portions of the Internet where LE is
supported, this permits Relentless TCP to co-exist with standard AIMD
TCP using default Best Effort (BE) service, without significantly
perturbing the latter.
There may be concerns about the consequences of publishing our code.
However, any competent kernel programmer can trivially reconstruct
Mathis Expires September 5, 2009 [Page 5]
Internet-Draft Relentless Congestion Control March 2009
our work from this document. The Relentless algorithm is
substantially simpler than the carefully tweaked algorithms that it
replaces. The majority of the complexity of our implementation comes
from preserving compatibility with other plugable congestion control
algorithms.
3. Queue Controllers
There has been a lot research into how to optimally manage queues,
starting with Sally Floyd's paper on Random Early Detection [RED].
None of the proposed solutions is entirely satisfactory. The
difficulty is that to optimally control traffic from AIMD congestion
control requires that the queue controller estimate some parameters
of the traffic, such as effective RTT or cwnd. These parameters are
needed to estimate the drop rate required to attain a specific data
rate. Furthermore, in high speed devices the calculations have to
run at line rate in silicon. So far all known solutions are, at
best, engineering compromises, and are suboptimal in environments
that are significantly different than their design point.
We claim that simplifying TCP's response to congestion will make it
much easier to implement simple lightweight queue controllers that
can accurately manage queue lengths across a huge range of network
scales.
For example we define a simple "Baseline Controller" to manage the
length of a queue. It would monitor the minimum queue length, and
periodically randomly discard or ECN mark as many packets as the
minimum queue length was above the set point during the previous
interval. This keeps bringing the minimum queue length down to the
set point. The periodic queue management interval can be any
convenient interval somewhat longer than one RTT. (e.g. 1 to 10 times
the RTT). Although the accuracy of this controller improves when
it's polling interval is better matched to the actual RTT, it clearly
is not very sensitive to mismatch. When the polling interval is 10
times the RTT, its average error would only be 5 packets, as long as
the flow was Relentless TCP.
4. Properties
Note that Relentless TCP has not been exhaustively tested yet. This
section reflects early testing and analysis of the algorithm. Our
results will improve.
Mathis Expires September 5, 2009 [Page 6]
Internet-Draft Relentless Congestion Control March 2009
4.1. Relentless is not at all AIMD-friendly
Not too surprisingly, Relentless TCP does not share the network very
well with AIMD TCP if both see the same average loss rate because
there is no mechanism to isolate the flows.
4.2. Performance Model
The traditional 1/sqrt(p) TCP performance model [MSMO97] will not
apply. A simple analysis suggests that in steady state the
Relentless TCP data rate will be something like MSS/(3*RTT*p). In an
absolutely quiet network with drop tail queues you expect there to be
one loss per three round trips: when the bottleneck queue is full and
congestion avoidance increments the window, it will take one RTT
before the sender observes the subsequent loss, one RTT to repair the
loss, and one RTT of congestion avoidance to reach the next cwnd
increment. In real networks the losses are observed to be much more
bursty: clusters of losses followed by multiple loss-less RTTs.
Our implementation is further hampered by the need to offset cwnd
adjustments that occur in the mainline TCP code as a side effect of
processing partial ACKs and and other events. We overcome these
adjustments in the plugable congestion control module by setting
ssthresh to the old cwnd minus losses, thus there is often an
interval of slowstart following recovery. We strive to understand
and further eliminate these situations where the mainline code
differs from the ideal packet conservation principle.
4.3. Long Fat Networks (LFN)
In our moderately large window tests (a Gigabit network with a 70 ms
RTT) we were able to routinely attain better than 500 Mb/s even
though the loss rate was in excess of 0.07%. We know from the
performance model that attaining this rate with standard AIMD
congestion control requires a loss rate that is less than 0.00003%.
Clearly Relentless TCP can tolerate many orders of magnitude higher
loss rates than other TCPs.
However the packet traces reveal several anomalies not directly
related to Relentless Congestion Control, including spurious TCP
retransmissions, intermittent massive packet reordering, and various
unexpected limitations on data transmissions, when there seems to be
plenty of available window. This raises an interesting side issue:
Relentless TCP may exercise the SACK and recovery code several order
of magnitude harder than other congestion control algorithms. We may
find that we need to test this code much more thoroughly than it has
been needed in the past.
Mathis Expires September 5, 2009 [Page 7]
Internet-Draft Relentless Congestion Control March 2009
5. Usage Example: remote video upload
To help understand how Relentless TCP makes traffic management
easier, consider the following thought experiment: You are trying to
upload digital video from a remote site through a high delay
satellite link, shared by some small amount of other traffic. With
either Fair Queuing or Lower Effort service in the ground station it
can maintain separate queues for the TCP video upload and the smaller
flows. As long as the satellite link has fewer losses than one per
several RTTs, Relentless TCP can maintain a standing queue of video
data at the ground station. As long as the ground station has queued
video data, it can optimally allocate the link to the small flows
first (according to an explicit policy) and completely fill the rest
of the link with video. The small flows see minimal delays while the
link is 100% utilized.
To make this example more concrete, imagine that you are trying to
fill 10 Mb/s satellite link with a 1500 Byte MTU and a 600 ms RTT.
It would require about 500 packets to fill this path. With
Relentless TCP and a Baseline queue controller using a 1 second
sample interval, you would expect on average to see about one drop
every 2 sample intervals (roughly once per 3 RTTs, or about every
1500 packets). To mask the noise caused by the small flows, the
baseline controller could either have an extra long sample interval
or slightly higher setpoint. Failing this only compromises the
extent to which the system can compensate for the jitter introduced
by the small flows, but not its ability to fill most of the link
using a very straightforward configuration.
Contrast this to the situation where TCP is using standard AIMD
congestion control: the queue at the bottleneck needs to hold 500
packets, and TCP cwnd needs to sawtooth between 500 and 1000 packets.
There needs to be one loss every 1000 RTTs (twice 500 due to delayed
ACK), or about every 700,000 packets -- 500 times lower loss rate!
To do this optimally, the queue controller has to estimate when the
queue is holding slightly more than half of TCP's total window, so a
single lost packet will not cause idle time, due to TCP having too
small of window. If the queue controller gets it wrong, or if there
are too many losses (say due to too high of satellite BER), then a
single standard TCP flow can't fill the link. A delay sensing TCP
might do substantially better than Reno TCP, since it might be able
to sense the queuing delay at slightly above 500 packets and use it
to control TCP's window without causing losses or a large queue, but
it still requires a relatively low loss rate. (But note that a delay
sensing Relentless TCP probably could do even better).
Since Relentless TCP is implemented as an easily installed module and
some form of Fair Queuing or Less Effort service is supported by many
Mathis Expires September 5, 2009 [Page 8]
Internet-Draft Relentless Congestion Control March 2009
routers, this video upload example can be deployed today, on an as
needed basis. The next iteration of this note will document a
demonstration setup.
6. Security Considerations
The Relentless Congestion Control algorithm does not alter the basic
security properties of the protocols in which it might be
implemented, such as TCP and SCTP.
However, it does provide an opportunity for a greedy user to
preemptively use more than their TCP-friendly share of the network.
We argue in a separate document [unfriendly] that the TCP-friendly
capacity allocation paradigm needs to be replaced by one in which the
network is explicitly responsible for capacity allocation. This
paradigm shift would fully address this security concern.
There is some concern that publishing our implementation [DOWNLOAD]
will have undesirable consequences, however any competent kernel
programmer can reconstruct our results from this note.
7. References
[AFD] Pan, R., Breslau, L., Prabhakar, B., and S. Shenker,
"Approximate fairness through differential dropping",
SIGCOMM CCR 33(2), April 2003.
[DOWNLOAD]
Mathis, M., "Relentless Congestion Control",
http://staff.psc.edu/mathis/relentless/, March 2009.
[FQ] Demers, A., Keshav, S., and S. Shenker, "Analysis and
simulation of a fair queuing algorithm", SIGCOMM 89,
August 1989.
[MSMO97] Mathis, M., Sempke, J., Mahdavi, J., and T. Ott, "The
Macroscopic Behavior of the TCP Congestion Avoidance
Algorithm", SIGCOMM CCR 18(4), July 1997.
[RED] Floyd, S. and v. Jacobson, "Random Early Detection (RED)
gateways for Congestion Avoidance", IEEE/ACM Transactions
on Networking 1(4), August 1993.
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
Selective Acknowledgment Options", RFC 2018, October 1996.
Mathis Expires September 5, 2009 [Page 9]
Internet-Draft Relentless Congestion Control March 2009
[RFC2914] Floyd, S., "Congestion Control Principles", BCP 41,
RFC 2914, September 2000.
[RFC3662] Bless, R., Nichols, K., and K. Wehrle, "A Lower Effort
Per-Domain Behavior (PDB) for Differentiated Services",
RFC 3662, December 2003.
[Van88] Jacobson, V., "Congestion Avoidance and Control",
SIGCOMM 88, August 1988.
[unfriendly]
Mathis , M., "Rethinking TCP Friendly",
draft-mathis-iccrg-unfriendly-00.txt (work in progress),
March 2009.
Appendix A. Acknowledgments
This work is supported by a generous gift from Cisco System, Inc.
Fred Baker, Nandita Dukkipati and others at Cisco have provided
valuable input and inspiration on how to best frame the problem.
John Heffner contributed sanity to early version of my ideas.
Author's Address
Matthew Mathis
Pittsburgh Supercomputing Center
300 S. Craig St.
Pittsburgh, PA 15213
Email: mathis@psc.edu
Mathis Expires September 5, 2009 [Page 10]