Internet Congestion Control                                   M. Mathis
Research Group                          Pittsburgh Supercomputing Center
Internet-Draft                                             March 4, 2009
Intended status: Informational
Expires: September 5, 2009


                     Relentless Congestion Control
                  draft-mathis-iccrg-relentless-tcp-00

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 5, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   Relentless congestion control is a simple modification that can be
   applied to almost any AIMD style congestion control: instead of



Mathis                  Expires September 5, 2009               [Page 1]


Internet-Draft        Relentless Congestion Control           March 2009


   applying a multiplicative reduction to cwnd after a loss, cwnd is
   reduced by the number of lost segments.  It can be modeled as a
   strict implementation of van Jacobson's Packet Conservation
   Principle.  During recovery, new segments are injected into the
   network in exact accordance with the segments that are reported to
   have been delivered to the receiver by the returning ACKs.

   This algorithm offers a valuable new congestion control property: the
   TCP portion of the control loop has exactly unity gain, which should
   make it easier to implement simple controllers in network devices to
   accurately control queue sizes across a huge range of scales.

   Relentless Congestion Control conforms to neither the details nor the
   philosophy of current congestion control standards.  These standards
   are based on the idea that the Internet can attain sufficient
   fairness by having relatively simple network devices send uniform
   congestion signals to all flows, and mandating that all protocols
   have equivalent responses to these congestion signals.

   To function appropriately in a shared environment, Relentless
   Congestion Control requires that the network allocates capacity
   through some technique such as Fair Queuing, Approximate Fair
   Dropping, etc.  The salient features of these algorithms are that
   they segregate the traffic into distinct flows, and send different
   congestion signals to each flow.  This alternative congestion control
   paradigm is described in a separate document, also under
   consideration by the ICCRG.

   The goal of the document is to illustrate some new protocol features
   and properties might be possible if we relax the "TCP-friendly"
   mandate.  A secondary goal of Relentless TCP is to make a distinction
   between the bottlenecks that belong to protocol itself, vs standard
   congestion control and the "TCP-friendly" paradigm.


















Mathis                  Expires September 5, 2009               [Page 2]


Internet-Draft        Relentless Congestion Control           March 2009


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
   2.  Implementation . . . . . . . . . . . . . . . . . . . . . . . .  5
   3.  Queue Controllers  . . . . . . . . . . . . . . . . . . . . . .  6
   4.  Properties . . . . . . . . . . . . . . . . . . . . . . . . . .  6
     4.1.  Relentless is not at all AIMD-friendly . . . . . . . . . .  7
     4.2.  Performance Model  . . . . . . . . . . . . . . . . . . . .  7
     4.3.  Long Fat Networks (LFN)  . . . . . . . . . . . . . . . . .  7
   5.  Usage Example: remote video upload . . . . . . . . . . . . . .  8
   6.  Security Considerations  . . . . . . . . . . . . . . . . . . .  9
   7.  References . . . . . . . . . . . . . . . . . . . . . . . . . .  9
   Appendix A.  Acknowledgments . . . . . . . . . . . . . . . . . . . 10
   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 10





































Mathis                  Expires September 5, 2009               [Page 3]


Internet-Draft        Relentless Congestion Control           March 2009


1.  Introduction

   Relentless congestion control is a simple modification that can be
   applied to almost any AIMD style congestion control: instead of
   applying a multiplicative reduction to cwnd after a loss, cwnd is
   reduced by the number of lost segments.  The resulting cwnd is
   normally the same as the actual window of data that was successfully
   delivered during the lossy RTT.  If the application is keeping up
   with the network and other traffic does not change, this is the
   correct cwnd to exactly fill the network on subsequent RTTs.

   Relentless Congestion Control conforms to neither the details nor the
   philosophy of current congestion control standards [RFC2914].  These
   standards are based on the idea that the Internet can attain
   sufficient fairness by having relatively simple network devices send
   uniform congestion signals to all flows, and mandating that all
   protocols have equivalent responses to these congestion signals.

   To function appropriately in a shared environment, Relentless
   Congestion Control requires that the network allocates capacity
   through some technique such as Fair Queuing [FQ], Approximate Fair
   Dropping [AFD], etc.  The salient features of these algorithms are
   that they segregate the traffic into distinct flows, and send
   different congestion signals (segment losses, ECN marks or Queuing
   delay) to each flow such that the network, and not the end-system,
   controls capacity allocation.  This alternative congestion control
   paradigm is described in a separate document [unfriendly], also under
   consideration by the ICCRG.

   Relentless Congestion Control offers a valuable new property: the TCP
   portion of the control loop has exactly unity gain.  This property is
   ideal from the standpoint of the queue controllers embedded within
   network devices -- if some flow is using 1% too much bandwidth, then
   dropping 1% of its segments will bring it back to the proper data
   rate one RTT later.  Likewise if a flow is occupying 10 segments too
   much queue space, then dropping 10 segments will bring it back to the
   proper queue occupancy on the next RTT.  This unity gain property
   should make it easier to implement simple queue controllers in
   network devices that accurately control queues across a huge range of
   scales.

   Note the duality here, Relentless Congestion Control both requires
   that the network manages traffic while at the same time, it makes it
   easier to do so.

   We do not claim that Relentless Congestion Control is ideal, or
   without its own set of problems.  We are not even planning to
   standardize it at this time.  The goal of the document is to



Mathis                  Expires September 5, 2009               [Page 4]


Internet-Draft        Relentless Congestion Control           March 2009


   illustrate what new protocol features and properties might be
   possible if we relax the "TCP-friendly" mandate.

   In some data intensive communities, TCP has the reputation of being
   too slow to be useful.  A secondary goal of Relentless TCP is to make
   a distinction between the bottlenecks that belong to protocol itself,
   vs standard congestion control and the "TCP-friendly" paradigm.

   Please send commentary on this draft to the ICCRG mailing list:
   iccrg@cs.ucl.ac.uk.


2.  Implementation

   The simplest mental model of Relentless Congestion Control is a
   strict implementation of van Jacobson's Packet Conservation Principle
   [Van88].  During recovery, new segments are injected into the network
   in exact accordance with the segments that are reported to have been
   delivered to the receiver by the returning ACKs.  As long as the
   network does not lose all segments or ACKs, and the SACK scoreboard
   [RFC2018] chooses appropriate data to retransmit, this will clearly
   converge in one round trip time to exactly the window successfully
   delivered during the lossy round trip.  Since this quantity of data
   was successfully delivered during a lossy round trip it is both the
   largest justifiable estimate as well as the most accurate estimate of
   the maximum capacity of the network, and a good starting point for
   subsequent RTTs.  Algorithms that open the window, Slow-Start and
   Congestion Avoidance, are only permitted when the receiver is
   reporting contiguous data, which implies that there has subsequently
   been a full RTT without any additional losses.

   Our implementation is not quite as elegant, and consequently is not
   as accurate as the mental model.  It is based on an existing TCP
   implementation that supports plugable congestion control modules.
   Unfortunately, some of the cwnd adjustments are wired into the loss
   recovery code and can not be suppressed directly.  They can only be
   offset later in the plugable Relentless Congestion Control module.
   For details and source code, see the web site [DOWNLOAD].

   As a network safety and possible deployment strategy, our
   implementation requires that the traffic be marked with the "Lower
   Effort" (LE) DSCP [RFC3662].  In portions of the Internet where LE is
   supported, this permits Relentless TCP to co-exist with standard AIMD
   TCP using default Best Effort (BE) service, without significantly
   perturbing the latter.

   There may be concerns about the consequences of publishing our code.
   However, any competent kernel programmer can trivially reconstruct



Mathis                  Expires September 5, 2009               [Page 5]


Internet-Draft        Relentless Congestion Control           March 2009


   our work from this document.  The Relentless algorithm is
   substantially simpler than the carefully tweaked algorithms that it
   replaces.  The majority of the complexity of our implementation comes
   from preserving compatibility with other plugable congestion control
   algorithms.


3.  Queue Controllers

   There has been a lot research into how to optimally manage queues,
   starting with Sally Floyd's paper on Random Early Detection [RED].
   None of the proposed solutions is entirely satisfactory.  The
   difficulty is that to optimally control traffic from AIMD congestion
   control requires that the queue controller estimate some parameters
   of the traffic, such as effective RTT or cwnd.  These parameters are
   needed to estimate the drop rate required to attain a specific data
   rate.  Furthermore, in high speed devices the calculations have to
   run at line rate in silicon.  So far all known solutions are, at
   best, engineering compromises, and are suboptimal in environments
   that are significantly different than their design point.

   We claim that simplifying TCP's response to congestion will make it
   much easier to implement simple lightweight queue controllers that
   can accurately manage queue lengths across a huge range of network
   scales.

   For example we define a simple "Baseline Controller" to manage the
   length of a queue.  It would monitor the minimum queue length, and
   periodically randomly discard or ECN mark as many packets as the
   minimum queue length was above the set point during the previous
   interval.  This keeps bringing the minimum queue length down to the
   set point.  The periodic queue management interval can be any
   convenient interval somewhat longer than one RTT. (e.g. 1 to 10 times
   the RTT).  Although the accuracy of this controller improves when
   it's polling interval is better matched to the actual RTT, it clearly
   is not very sensitive to mismatch.  When the polling interval is 10
   times the RTT, its average error would only be 5 packets, as long as
   the flow was Relentless TCP.


4.  Properties

   Note that Relentless TCP has not been exhaustively tested yet.  This
   section reflects early testing and analysis of the algorithm.  Our
   results will improve.






Mathis                  Expires September 5, 2009               [Page 6]


Internet-Draft        Relentless Congestion Control           March 2009


4.1.  Relentless is not at all AIMD-friendly

   Not too surprisingly, Relentless TCP does not share the network very
   well with AIMD TCP if both see the same average loss rate because
   there is no mechanism to isolate the flows.

4.2.  Performance Model

   The traditional 1/sqrt(p) TCP performance model [MSMO97] will not
   apply.  A simple analysis suggests that in steady state the
   Relentless TCP data rate will be something like MSS/(3*RTT*p).  In an
   absolutely quiet network with drop tail queues you expect there to be
   one loss per three round trips: when the bottleneck queue is full and
   congestion avoidance increments the window, it will take one RTT
   before the sender observes the subsequent loss, one RTT to repair the
   loss, and one RTT of congestion avoidance to reach the next cwnd
   increment.  In real networks the losses are observed to be much more
   bursty: clusters of losses followed by multiple loss-less RTTs.

   Our implementation is further hampered by the need to offset cwnd
   adjustments that occur in the mainline TCP code as a side effect of
   processing partial ACKs and and other events.  We overcome these
   adjustments in the plugable congestion control module by setting
   ssthresh to the old cwnd minus losses, thus there is often an
   interval of slowstart following recovery.  We strive to understand
   and further eliminate these situations where the mainline code
   differs from the ideal packet conservation principle.

4.3.  Long Fat Networks (LFN)

   In our moderately large window tests (a Gigabit network with a 70 ms
   RTT) we were able to routinely attain better than 500 Mb/s even
   though the loss rate was in excess of 0.07%.  We know from the
   performance model that attaining this rate with standard AIMD
   congestion control requires a loss rate that is less than 0.00003%.
   Clearly Relentless TCP can tolerate many orders of magnitude higher
   loss rates than other TCPs.

   However the packet traces reveal several anomalies not directly
   related to Relentless Congestion Control, including spurious TCP
   retransmissions, intermittent massive packet reordering, and various
   unexpected limitations on data transmissions, when there seems to be
   plenty of available window.  This raises an interesting side issue:
   Relentless TCP may exercise the SACK and recovery code several order
   of magnitude harder than other congestion control algorithms.  We may
   find that we need to test this code much more thoroughly than it has
   been needed in the past.




Mathis                  Expires September 5, 2009               [Page 7]


Internet-Draft        Relentless Congestion Control           March 2009


5.  Usage Example: remote video upload

   To help understand how Relentless TCP makes traffic management
   easier, consider the following thought experiment: You are trying to
   upload digital video from a remote site through a high delay
   satellite link, shared by some small amount of other traffic.  With
   either Fair Queuing or Lower Effort service in the ground station it
   can maintain separate queues for the TCP video upload and the smaller
   flows.  As long as the satellite link has fewer losses than one per
   several RTTs, Relentless TCP can maintain a standing queue of video
   data at the ground station.  As long as the ground station has queued
   video data, it can optimally allocate the link to the small flows
   first (according to an explicit policy) and completely fill the rest
   of the link with video.  The small flows see minimal delays while the
   link is 100% utilized.

   To make this example more concrete, imagine that you are trying to
   fill 10 Mb/s satellite link with a 1500 Byte MTU and a 600 ms RTT.
   It would require about 500 packets to fill this path.  With
   Relentless TCP and a Baseline queue controller using a 1 second
   sample interval, you would expect on average to see about one drop
   every 2 sample intervals (roughly once per 3 RTTs, or about every
   1500 packets).  To mask the noise caused by the small flows, the
   baseline controller could either have an extra long sample interval
   or slightly higher setpoint.  Failing this only compromises the
   extent to which the system can compensate for the jitter introduced
   by the small flows, but not its ability to fill most of the link
   using a very straightforward configuration.

   Contrast this to the situation where TCP is using standard AIMD
   congestion control: the queue at the bottleneck needs to hold 500
   packets, and TCP cwnd needs to sawtooth between 500 and 1000 packets.
   There needs to be one loss every 1000 RTTs (twice 500 due to delayed
   ACK), or about every 700,000 packets -- 500 times lower loss rate!
   To do this optimally, the queue controller has to estimate when the
   queue is holding slightly more than half of TCP's total window, so a
   single lost packet will not cause idle time, due to TCP having too
   small of window.  If the queue controller gets it wrong, or if there
   are too many losses (say due to too high of satellite BER), then a
   single standard TCP flow can't fill the link.  A delay sensing TCP
   might do substantially better than Reno TCP, since it might be able
   to sense the queuing delay at slightly above 500 packets and use it
   to control TCP's window without causing losses or a large queue, but
   it still requires a relatively low loss rate.  (But note that a delay
   sensing Relentless TCP probably could do even better).

   Since Relentless TCP is implemented as an easily installed module and
   some form of Fair Queuing or Less Effort service is supported by many



Mathis                  Expires September 5, 2009               [Page 8]


Internet-Draft        Relentless Congestion Control           March 2009


   routers, this video upload example can be deployed today, on an as
   needed basis.  The next iteration of this note will document a
   demonstration setup.


6.  Security Considerations

   The Relentless Congestion Control algorithm does not alter the basic
   security properties of the protocols in which it might be
   implemented, such as TCP and SCTP.

   However, it does provide an opportunity for a greedy user to
   preemptively use more than their TCP-friendly share of the network.
   We argue in a separate document [unfriendly] that the TCP-friendly
   capacity allocation paradigm needs to be replaced by one in which the
   network is explicitly responsible for capacity allocation.  This
   paradigm shift would fully address this security concern.

   There is some concern that publishing our implementation [DOWNLOAD]
   will have undesirable consequences, however any competent kernel
   programmer can reconstruct our results from this note.


7.  References

   [AFD]      Pan, R., Breslau, L., Prabhakar, B., and S. Shenker,
              "Approximate fairness through differential dropping",
              SIGCOMM CCR 33(2), April 2003.

   [DOWNLOAD]
              Mathis, M., "Relentless Congestion Control",
               http://staff.psc.edu/mathis/relentless/, March 2009.

   [FQ]       Demers, A., Keshav, S., and S. Shenker, "Analysis and
              simulation of a fair queuing algorithm", SIGCOMM 89,
              August 1989.

   [MSMO97]   Mathis, M., Sempke, J., Mahdavi, J., and T. Ott, "The
              Macroscopic Behavior of the TCP Congestion Avoidance
              Algorithm", SIGCOMM CCR 18(4), July 1997.

   [RED]      Floyd, S. and v. Jacobson, "Random Early Detection (RED)
              gateways for Congestion Avoidance", IEEE/ACM Transactions
              on Networking 1(4), August 1993.

   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
              Selective Acknowledgment Options", RFC 2018, October 1996.




Mathis                  Expires September 5, 2009               [Page 9]


Internet-Draft        Relentless Congestion Control           March 2009


   [RFC2914]  Floyd, S., "Congestion Control Principles", BCP 41,
              RFC 2914, September 2000.

   [RFC3662]  Bless, R., Nichols, K., and K. Wehrle, "A Lower Effort
              Per-Domain Behavior (PDB) for Differentiated Services",
              RFC 3662, December 2003.

   [Van88]    Jacobson, V., "Congestion Avoidance and Control",
              SIGCOMM 88, August 1988.

   [unfriendly]
              Mathis , M., "Rethinking TCP Friendly",
              draft-mathis-iccrg-unfriendly-00.txt (work in progress),
              March 2009.


Appendix A.  Acknowledgments

   This work is supported by a generous gift from Cisco System, Inc.
   Fred Baker, Nandita Dukkipati and others at Cisco have provided
   valuable input and inspiration on how to best frame the problem.
   John Heffner contributed sanity to early version of my ideas.


Author's Address

   Matthew Mathis
   Pittsburgh Supercomputing Center
   300 S. Craig St.
   Pittsburgh, PA  15213

   Email: mathis@psc.edu



















Mathis                  Expires September 5, 2009              [Page 10]