Internet Engineering Task Force                                 M. Welzl
Internet-Draft                                   University of Innsbruck
Intended status: Informational                             March 3, 2009
Expires: September 4, 2009


         A Survey of Lower-than-Best Effort Transport Protocols
                    draft-welzl-ledbat-survey-00.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 4, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document provides a survey of transport protocols which are
   designed to have a smaller bandwidth and/or delay impact on standard
   TCP than standard TCP itself when they share a bottleneck with it.



Welzl                   Expires September 4, 2009               [Page 1]


Internet-Draft            LBE Transport Survey                March 2009


   Such protocols could be used for low-priority "background" traffic,
   as they provide what is sometimes called a "less than" (or "lower
   than") best effort service.


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Delay-based transport protocols  . . . . . . . . . . . . . . .  3
   3.  Non-delay-based transport protocols  . . . . . . . . . . . . .  6
   4.  Application layer approaches . . . . . . . . . . . . . . . . .  6
   5.  Orthogonal work  . . . . . . . . . . . . . . . . . . . . . . .  7
   6.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . .  8
   7.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . .  8
   8.  Security Considerations  . . . . . . . . . . . . . . . . . . .  8
   9.  Informative References . . . . . . . . . . . . . . . . . . . .  8
   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 10


































Welzl                   Expires September 4, 2009               [Page 2]


Internet-Draft            LBE Transport Survey                March 2009


1.  Introduction

   As a starting point for the work in the LEDBAT group, this document
   presents a brief survey of efforts to attain a Less than Best Effort
   (LBE) service without help from routers.  We loosely define a LBE
   service as a service which has smaller bandwidth and/or delay impact
   on standard TCP than standard TCP itself when sharing a bottleneck
   with it.  We refer to systems that provide this service as Less than
   Best Effort (LBE) systems.  Generally, LBE behavior can be achieved
   by reacting to queue growth earlier than standard TCP would, or by
   changing the congestion avoidance behavior of TCP without utilizing
   any additional implicit feedback.  Some mechanisms achieve a LBE
   behavior at the application layer, e.g. by changing the receiver
   window of standard TCP, and there is also a substantial amount of
   work that is related to the LBE concept but not presenting a solution
   that can be installed in end hosts or expected to work over the
   Internet.  According to this classification, solutions have been
   categorized as delay-based transport protocols, non-delay-based
   transport protocols, application layer approaches and orthogonal work
   in this document.

   The author wishes to emphasize that, in its present form, this
   document is only a starting point and not based on a thorough
   literature study.  Many relevant references will be missing, and an
   apology goes to all authors of related work that has not been
   mentioned here.


2.  Delay-based transport protocols

   It is wrong to generally equate "little impact on standard TCP" with
   "small sending rate".  Unless the sender's maximum window is limited
   for some reason, and in the absence of ECN support, standard TCP will
   normally increase its rate until a queue overflows, causing one or
   more packets to be dropped and the rate to be reduced.  A protocol
   which stops increasing the rate before this event happens can, in
   principle, achieve a better performance than standard TCP.  In the
   absence of any other traffic, this is even true for TCP itself when
   its maximum send window is limited to the bandwidth*round-trip time
   (RTT) product.

   TCP Vegas [Bra+94] is one of the first protocols that was known to
   have a smaller sending rate than standard TCP when both protocols
   share a bottleneck [Kur+00] -- yet it was designed to achieve more,
   not less throughput than standard TCP.  Indeed, when it is the only
   protocol on the bottleneck, the throughput of TCP Vegas is greater
   than the throughput of standard TCP.  Depending on the bottleneck
   queue length, TCP Vegas itself can be starved by standard TCP flows.



Welzl                   Expires September 4, 2009               [Page 3]


Internet-Draft            LBE Transport Survey                March 2009


   This can be remedied to some degree by the RED Active Queue
   Management mechanism [RFC2309].

   The congestion avoidance behavior is the protocol's most important
   feature in terms of historical relevance as well as relevance in the
   context of this document (it has been shown that other elements of
   the protocol can sometimes play a greater role for its overall
   behavior [Hen+00]).  In Congestion Avoidance, once per RTT, TCP Vegas
   calculates the expected throughput as WindowSize / BaseRTT, where
   WindowSize is the current congestion window and BaseRTT is the
   minimum of all measured RTTs.  The expected throughput is then
   compared with the actual (measured) throughput.  If the actual
   throughput is smaller than the expected throughput minus a threshold,
   this is taken as a sign that the network is underutilized, causing
   the protocol to linearly increase its rate.  If the actual throughput
   is greater than the expected throughput plus a threshold, this is
   taken as a sign of congestion, causing the protocol to linearly
   decrease its rate.

   TCP Vegas has been analyzed extensively.  One of the most prominent
   properties of TCP Vegas is its fairness between multiple flows of the
   same kind, which does not penalize flows with large propagation
   delays in the same way as standard TCP.  While it was not the first
   protocol that uses delay as a congestion indication, its predecessors
   (which can be found in [Bra+94]) are not discussed here because of
   the historical "landmark" role that TCP Vegas has taken in the
   literature.

   Transport protocols which were designed to be non-intrusive include
   TCP-LP [Kuz+06], TCP Nice [Ven+02] and 4CP [Liu+07].  Using a simple
   analytical model, the authors of [Kuz+06] illustrate the feasibility
   of this endeavor by showing that, due to the non-linear relationship
   between throughput and RTT, it is possible to remain transparent to
   standard TCP even when the flows under consideration have a larger
   RTT than standard TCP flows.

   TCP Nice [Ven+02] follows the same basic approach as TCP Vegas but
   improves upon it in some aspects.  Because of its moderate linear-
   decrease congestion response, TCP Vegas can affect standard TCP
   despite its ability to detect congestion early.  TCP Nice removes
   this issue by halving the congestion window (at most once per RTT,
   like standard TCP) instead of linearly reducing it.  To avoid being
   too conservative, this is only done if a fixed predefined fraction of
   delay-based incipient congestion signals appears within one RTT.
   Otherwise, TCP Nice falls back to the congestion avoidance rules of
   TCP Vegas if no packet was lost or standard TCP if a packet was lost.
   One more feature of TCP Nice is its ability to support a congestion
   window of less than one packet, by clocking out single packets over



Welzl                   Expires September 4, 2009               [Page 4]


Internet-Draft            LBE Transport Survey                March 2009


   more than one RTT.  With ns-2 simulations and real-life experiments
   using a Linux implementation, the authors of [Ven+02] show that TCP
   Nice achieves its goal of efficiently utilizing spare capacity while
   being non-intrusive to standard TCP.

   Other than TCP Vegas and TCP Nice, TCP-LP uses only the one-way delay
   (OWD) instead of the RTT as an indicator of incipient congestion.
   This is done to avoid reacting to delay fluctuations that are caused
   by reverse cross-traffic.  Using the TCP Timestamps option [RFC1323],
   the OWD is determined as the difference between the receiver's
   Timestamp value in the ACK and the original Timestamp value that the
   receiver copied into the ACK.  While the result of this subtraction
   can only precisely represent the OWD if clocks are synchronized, its
   absolute value is of no concern to TCP-LP and hence clock
   synchronization is unnecessary.  Using a constant smoothing
   parameter, TCP-LP calculates an Exponentially Weighted Moving Average
   (EWMA) of the measured OWD and checks whether the result exceeds a
   threshold within the range of the minimum and maximum OWD that was
   seen during the connections's lifetime; if it does, this condition is
   interpreted as an "early congestion indication".  The minimum and
   maximum OWD values are initialized during the slow-start phase.

   Regarding its reaction to an early congestion indication, TCP-LP
   tries to strike a middle ground between the overly conservative
   choice of immediately setting the congestion window to one packet and
   the presumably too aggressive choice of halving the congestion window
   like standard TCP.  It does so by halving the window at first in
   response to an early congestion indication, then initializing an
   "interference time-out timer", and maintaining the window size until
   this timer fires.  If another early congestion indication appeared
   during this "interference phase", the window is then set to 1;
   otherwise, the window is maintained and TCP-LP continues to increase
   it the standard Additive-Increase fashion.  This method ensures that
   it takes at least two RTTs for a TCP-LP flow to decrease its window
   to 1, and, like standard TCP, TCP-LP reacts to congestion at most
   once per RTT.

   With ns-2 simulations and real-life experiments using a Linux
   implementation, the authors of [Kuz+06] show that TCP-LP is largely
   non-intrusive to TCP traffic while at the same time enabling it to
   utilize a large portion of the excess network bandwidth, which is
   fairly shared among competing TCP-LP flows.  They also show that
   using their protocol for bulk data transfers greatly reduces file
   transfer times of competing best-effort web traffic.







Welzl                   Expires September 4, 2009               [Page 5]


Internet-Draft            LBE Transport Survey                March 2009


3.  Non-delay-based transport protocols

   4CP [Liu+07], which stands for "Competitive and Considerate
   Congestion Control", is a protocol which provides a LBE service by
   changing the window control rules of standard TCP.  A "virtual
   window" is maintained, which, during a so-called "bad congestion
   phase" is reduced to less than a predefined minimum value of the
   actual congestion window.  The congestion window is only increased
   again once the virtual window exceeds this minimum, and in this way
   the virtual window controls the duration during which the sender
   transmits with a fixed minimum rate.  The 4CP congestion avoidance
   algorithm allows for setting a target average window and avoids
   starvation of "background" flows while bounding the impact on
   "foreground" flows.  Its performance was evaluated in ns-2
   simulations and in real-life experiments with a kernel-level
   implementation in Microsoft Windows Vista.

   Some work was done on applying weights to congestion control
   mechanisms, allowing a flow to be as aggressive as a number of
   parallel TCP flows at the same time.  This is usually motivated by
   the fact that users may want to assign different priorities to
   different flows.  The first, and best known, such protocol is MulTCP
   [Cro+98], which emulates N TCPs in a rather simple fashion.  An
   improved version of MulTCP is presented in [Hac+04], and there is
   also a variant where only one feedback loop is applied to control a
   larger traffic aggregate by the name of Probe-Aided (PA-)MulTCP
   [Kuo+08].  Another protocol, CP [Ott+04], applies the same concept to
   the TFRC protocol [RFC5348] in order to provide such fairness
   differentiation for multimedia flows.

   The general assumption underlying all of the above work is that these
   protocols are "N-TCP-friendly", i.e. they are as TCP-friendly as N
   TCPs, where N is a positive (and possibly natural) number which is
   greater than or equal to 1.  The MulTFRC [Dam+09] protocol, another
   extension of TFRC for multiple flows, is however able to support
   values between 0 and 1, making it applicable as a mechanism for a LBE
   service.  Since it does not react to delay like the mechanisms above
   but adjusts its rate like TFRC, it can probably be expected to be
   more aggressive than mechanisms such as TCP Nice or TCP-LP.  This
   also means that MulTFRC is less likely to be prone to starvation, as
   its aggression is tunable at a fine granularity even when N is
   between 0 and 1.


4.  Application layer approaches

   The mechanism described in [Spr+00] controls the bandwidth by letting
   the receiver intelligently manipulate the receiver window of standard



Welzl                   Expires September 4, 2009               [Page 6]


Internet-Draft            LBE Transport Survey                March 2009


   TCP.  This is done because the authors assume a client-server setting
   where the receiver's access link is typically the bottleneck.  The
   scheme incorporates a delay-based calculation of the expected queue
   length at the bottleneck, which is quite similar to the calculation
   in the above delay based protocols, e.g.  TCP Vegas.  Using a Linux
   implementation, where TCP flows are classified according to their
   application's needs, it is shown that a significant improvement in
   packet latency can be attained over an unmodified system while
   maintaining good link utilization.

   Receiver window tuning is also done in [Key+04], where choosing the
   right value for the window is phrased as an optimization problem.  On
   this basis, two algorithms are presented, binary search -- which is
   faster than the other one at achieving a good operation point but
   fluctuates -- and stochastic optimization, which does not fluctuate
   but converges slower than binary search.  These algorithms merely use
   the previous receiver window and the amount of data received during
   the previous control interval as input.  According to [Key+04], the
   encouraging simulation results suggest that such an application level
   mechanism can work almost as well as a transport layer scheme like
   TCP-LP.

   TODO: mention other rwnd tuning and different application layer work,
   e.g. from related work sections of [Egg+05] and [Kok+04] and intro of
   [Key+04].


5.  Orthogonal work

   Various suggestions have been published for realizing a LBE service
   by influencing the way packets are treated in routers.  One example
   is the Persistent Class Based Queuing (P-CBQ) scheme presented in
   [Car+01], which is a variant of Class Based Queuing (CBQ) with per-
   flow accounting.  RFC 3662 [RFC3662] defines a DiffServ per-domain
   behavior called "Lower Effort".

   Harp [Kok+04] realizes a LBE service by dissipating background
   traffic to less-utilized paths of the network.  This is achieved
   without changing routers by using edge nodes as relays.  According to
   the authors, these edge nodes should be gateways of organizations in
   order to align their scheme with usage incentives, but the technical
   solution would also work if Harp was only deployed in end hosts.  It
   detects impending congestion by looking at delay similar to TCP Nice
   [Ven+02] and manages to improve utilization and fairness over pure
   single-path solutions.

   An entirely different approach is taken in [Egg+05]: here, the
   priority of a flow is reduced via a generic idletime scheduling



Welzl                   Expires September 4, 2009               [Page 7]


Internet-Draft            LBE Transport Survey                March 2009


   strategy in a host's operating system.  While results presented in
   this paper show that the new scheduler can effectively shield regular
   tasks from low-priority ones (e.g.  TCP from greedy UDP) with only a
   minor performance impact, it is an underlying assumption that all
   involved end hosts would use the idletime scheduler.  In other words,
   it is not the focus of this work to protect a standard TCP flow which
   originates from any host where the presented scheduling scheme may
   not be implemented.

   TODO: studies dealing with the precision of congestion prediction in
   end hosts (i.e. using delay to determine the onset of congestion) may
   be relevant in this document, and could be discussed here, e.g.
   [Bha+07] and the references therein.


6.  Acknowledgements

   The author would like to thank Dragana Damjanovic for reference
   pointers.  Surely lots of other folks will help in one way or another
   later and I'll thank them all here.


7.  IANA Considerations

   This memo includes no request to IANA.


8.  Security Considerations

   This document introduces no new security considerations.


9.  Informative References

   [Bha+07]   Bhandarkar, S., Reddy, A., Zhang, Y., and D. Loguinov,
              "Emulating AQM from end hosts", Proceedings of ACM
              SIGCOMM 2007, 2007.

   [Bra+94]   Brakmo, L., O'Malley, S., and L. Peterson, "TCP Vegas: New
              techniques for congestion detection and avoidance",
              Proceedings of SIGCOMM '94, pages 24-35, August 1994.

   [Car+01]   Carlberg, K., Gevros, P., and J. Crowcroft, "Lower than
              best effort: a design and implementation", Workshop on
              Data communication in Latin America and the
              Caribbean 2007, San Jose, Costa Rica, Pages: 244 - 265,
              2001.




Welzl                   Expires September 4, 2009               [Page 8]


Internet-Draft            LBE Transport Survey                March 2009


   [Cro+98]   Crowcroft, J. and P. Oechslin, "Differentiated end-to-end
              Internet services using a weighted proportional fair
              sharing TCP", ACM SIGCOMM Computer Communication
              Review vol. 28, no. 3 (July 1998), pp. 53-69, 1998.

   [Dam+09]   Damjanovic, D. and M. Welzl, "MulTFRC: Providing Weighted
              Fairness for Multimedia Applications (and others too!)",
              Work in progress ..., 2009.

   [Egg+05]   Eggert, L. and J. Touch, "A Lower Effort Per-Domain
              Behavior (PDB) for Differentiated Services", Proceedings
              of 20th ACM Symposium on Operating Systems Principles SOSP
              2005, Brighton, United Kingdom, pp. 249/262, October 2005.

   [Hac+04]   Hacker, T., Noble, B., and B. Athey, "Improving Throughput
              and Maintaining Fairness using Parallel TCP", Proceedings
              of Infocom 2004, March 2004.

   [Hen+00]   Hengartner, U., Bolliger, J., and T. Gross, "TCP Vegas
              revisited", Proceedings of Infocom 2000, March 2000.

   [Key+04]   Key, P., MassouliA(C), L., and B. Wang, "Emulating Low-
              Priority Transport at the Application Layer: a Background
              Transfer Service", Proceedings of ACM SIGMETRICS 2004,
              January 2004.

   [Kok+04]   Kokku, R., Bohra, A., Ganguly, S., and A. Venkataramani,
              "A Multipath Background Network Architecture", Proceedings
              of Infocom 2007, May 2007.

   [Kuo+08]   Kuo, F. and X. Fu, "Probe-Aided MulTCP: an aggregate
              congestion control mechanism", ACM SIGCOMM Computer
              Communication Review vol. 38, no. 1 (January 2008), pp.
              17-28, 2008.

   [Kur+00]   Kurata, K., Hasegawa, G., and M. Murata, "Fairness
              Comparisons Between TCP Reno and TCP Vegas for Future
              Deployment of TCP Vegas", Proceedings of INET 2000,
              July 2000.

   [Kuz+06]   Kuzmanovic, A. and E. Knightly, "TCP-LP: low-priority
              service via end-point congestion control", IEEE/ACM
              Transactions on Networking (ToN)  Volume 14, Issue 4, pp.
              739-752., August 2006,
              <http://www.ece.rice.edu/networks/TCP-LP/>.

   [Liu+07]   Liu, S., Vojnovic, M., and D. Gunawardena, "Competitive
              and Considerate Congestion Control for Bulk Data



Welzl                   Expires September 4, 2009               [Page 9]


Internet-Draft            LBE Transport Survey                March 2009


              Transfers", Proceedings of IWQoS 2007, June 2007.

   [Ott+04]   Ott, D., Sparks, T., and K. Mayer-Patel, "Aggregate
              congestion control for distributed multimedia
              applications", Proceedings of Infocom 2004, March 2004.

   [RFC1323]  Jacobson, V., Braden, B., and D. Borman, "TCP Extensions
              for High Performance", RFC 1323, May 1992.

   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
              S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
              Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
              S., Wroclawski, J., and L. Zhang, "Recommendations on
              Queue Management and Congestion Avoidance in the
              Internet", RFC 2309, April 1998.

   [RFC3662]  Bless, R., Nichols, K., and K. Wehrle, "A Lower Effort
              Per-Domain Behavior (PDB) for Differentiated Services",
              RFC 3662, December 2003.

   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
              Friendly Rate Control (TFRC): Protocol Specification",
              RFC 5348, September 2008.

   [Spr+00]   Spring, N., Chesire, M., Berryman, M., Sahasranaman, V.,
              Anderson, T., and B. Bershad, "Receiver based management
              of low bandwidth access links", Proceedings of
              Infocom 2000, pp. 245-254, vol.1, 2000.

   [Ven+02]   Venkataramani, A., Kokku, R., and M. Dahlin, "TCP Nice: a
              mechanism for background transfers", Proceedings of
              OSDI '02, 2002.


Author's Address

   Michael Welzl
   University of Innsbruck
   Technikerstr. 21 A
   Innsbruck,   6020
   Austria

   Phone: +43 512 507 6110
   Email: michael.welzl@uibk.ac.at







Welzl                   Expires September 4, 2009              [Page 10]