Internet Engineering Task Force                             Josh Blanton
INTERNET DRAFT                                           Ohio University
draft-allman-rto-backoff-04.txt                            Ethan Blanton
Expires: June 2007                                     Purdue University
                                                             Mark Allman
                                                               ICIR/ICSI
                                                           December 2006


   Using Spurious Retransmissions to Adapt the Retransmission Timeout
                    draft-allman-rto-backoff-04.txt

Status of this Memo

    By submitting this Internet-Draft, each author represents that any
    applicable patent or other IPR claims of which he or she is aware
    have been or will be disclosed, and any of which he or she becomes
    aware will be disclosed, in accordance with Section 6 of BCP 79.

    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups.  Note that
    other groups may also distribute working documents as Internet-
    Drafts.

    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time.  It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt.

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html.

Copyright Notice

    Copyright (C) The Internet Society (2006).

Abstract

    This document describes a method for using spurious retransmission
    timeouts as the trigger for slightly changing the way TCP's
    retransmission timeout is computed in an effort to avoid subsequent
    unnecessary retransmissions.

Terminology

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
    NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
    "OPTIONAL" in this document are to be interpreted as described
    in [RFC2119].

    The reader is expected to be familiar with the algorithm and
    terminology from [RFC2988].

Expires: June 2007                                              [Page 1]


draft-allman-rto-backoff-04.txt                            December 2006


1.  Introduction

    Various studies have shown that the retransmission timeout (RTO)
    estimator in [RFC2988] can trigger spurious retransmissions.  [AP99]
    shows that such unnecessary retransmissions are generally fairly
    rare.  However, [LK00] shows that in some networks (e.g., wireless
    networks) spurious retransmissions are more problematic due to
    occasional delay spikes that are not well predicted by TCP's RTO
    estimator.  In this document we outline one possible approach to
    mitigate the impact of premature RTO firings by altering the RTO
    estimator specified in [RFC2988].

    Several methods for detecting spurious timeouts have been developed
    [RFC3522,RFC3708,RFC4138].  Additionally, [RFC4015] outlines one
    possible response to detecting spurious timeouts.  This document
    outlines an alternative to [RFC4015].  In general terms, [RFC4015]
    specifies two actions upon the detection of an unnecessary RTO-based
    retransmission.  First, the sending rate prior to the spurious
    retransmission is restored.  Furthermore, the RTO is adapted by
    re-initializing the RTO estimator with the long round-trip time
    (RTT) measurement that caused the spurious RTO.  The approach given
    in [RFC4015] is reasonable if the underlying cause of the problem is
    a shift in the path RTT.  For instance, if the route a TCP
    connection is traversing changes and the new path's RTT is
    significantly longer than the previous path's RTT then simply
    re-initializing the RTO is a reasonable action.

    As specified in the next section this document takes a slightly
    different approach than [RFC4015].  Generally, this document uses
    the failure of the RTO to wait long enough before triggering a
    retransmit as an indication that the RTO estimator itself is not
    properly capturing the variance present in the RTTs experienced by
    the TCP connection.  Therefore, this document calls for an increase
    in the contribution of the variance component in the RTO estimator
    upon the detection of a spurious retransmission timeout, so that the
    estimator tolerates comparable delay spikes in the future.  This
    change represents a preference for avoiding future spurious timeouts
    rather than simply reacting to each spurious retransmission.

    We note that TCP implementations using the RTTM mechanism [RFC1323]
    to assess the RTT multiple times per RTT with the standard
    exponentially-weighted moving average (EWMA) gains from [RFC2988]
    retain less RTT history than when taking one RTT measurement per RTT.
    [AP99] shows that "fast" EWMAs yield more spurious retransmissions
    than when using the standard gains with one RTT sample per RTT.
    Therefore, an orthogonal change to TCP implementations that use RTTM
    that may prevent spurious RTOs is to set the EWMA gains based on the
    number of RTT samples taken per RTT such that the amount of history
    kept, in terms of time, is the same regardless of the number of RTT
    samples taken [Flo98,LS00].
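
    As a rough illustration of the gain scaling described above, the
    following Python sketch derives a per-sample gain from a standard
    [RFC2988] gain (alpha = 1/8, beta = 1/4) so that n samples taken
    within one RTT decay the stored history by the same factor as one
    sample taken with the standard gain.  The function name scaled_gain
    and the scaling rule (1 - g_n)^n = 1 - g are assumptions made for
    illustration only; they are not taken from [Flo98] or [LS00], which
    may scale the gains differently.

```python
ALPHA = 1 / 8  # SRTT gain from [RFC2988]
BETA = 1 / 4   # RTTVAR gain from [RFC2988]


def scaled_gain(gain: float, samples_per_rtt: int) -> float:
    """Return a per-sample EWMA gain g_n with (1 - g_n)**n == 1 - gain.

    Assumed scaling rule (illustrative only): after n samples, the weight
    left on old history equals the weight left after a single sample
    taken with the standard gain.
    """
    if samples_per_rtt < 1:
        raise ValueError("samples_per_rtt must be at least 1")
    if not 0.0 < gain < 1.0:
        raise ValueError("gain must be strictly between 0 and 1")
    return 1.0 - (1.0 - gain) ** (1.0 / samples_per_rtt)


if __name__ == "__main__":
    # Print the scaled alpha and beta for a few sampling rates.
    for n in (1, 2, 10):
        print(n, scaled_gain(ALPHA, n), scaled_gain(BETA, n))
```

    With one sample per RTT the function returns the standard gain
    unchanged; with more samples per RTT it returns a smaller gain.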

2.  Parameter Changes

    As the basis for the changes proposed below, a TCP MUST support an
    IETF-specified spurious timeout detection method.  Currently,
    [RFC3522], [RFC3708] and [RFC4138] are such detection methods.  We
    note that the research literature includes alternate methods for
    detecting spurious retransmissions, e.g., the "retransmit bit"
    [LK00], but these schemes MUST NOT be used as part of the changes
    specified in this document until such time that the IETF approves a
    specification of these schemes.

    We also note that [RFC2988] explicitly allows for an RTO estimator
    that is more conservative than that given in [RFC2988] (which this
    document specifies).

    We also note that, given that the TCP is savvy enough to untangle
    needed and unneeded retransmission timeouts, the TCP does not need
    to use Karn's algorithm [KP87,RFC2988] and can accurately determine
    the RTT that causes spurious retransmissions.

    This document specifies that a TCP MAY change the RTO estimator
    given in [RFC2988] upon detection of a spurious timeout, as follows.

    The general idea behind the mechanism is to increase "K", the
    multiplier applied to RTTVAR in the RTO calculation given in step
    (2.3) of [RFC2988] to allow for additional variance in the path's
    RTT.  The specific mechanism for TCPs using this change is:

    (A) Upon the first expiration of the retransmission timer for a
        given sequence number, the values of SRTT and RTTVAR MUST be
        saved as SRTT_prev and RTTVAR_prev, respectively.

    (B) Upon detecting that a previous RTO-based retransmission was
        spurious, a TCP MUST calculate a K' using the RTT sample
        R', which is the time between when the original transmission of
        the given segment was sent and when that original
        transmission is acknowledged, as follows:

          K' = ceil ((R' - SRTT_prev) / RTTVAR_prev)               (1)

        K' then becomes the multiplier that would have prevented the
        unneeded RTO-based retransmit.

        In the event that RTTVAR_prev is zero, K' MUST remain at its
        previous value (or be set to 4, in the event that K' had not
        been previously calculated).

        The value of K' MUST NOT be reduced for the remainder of the
        connection (as discussed in more detail below).

    (C) The values of SRTT and RTTVAR in use when the spurious
        retransmit occurred MUST replace the current values:

          SRTT = SRTT_prev                                         (2)
          RTTVAR = RTTVAR_prev                                     (3)

    (D) The R' RTT sample MUST be used to adjust SRTT and RTTVAR and
        therefore the RTO, per [RFC2988].
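
    Steps (A) through (D) can be sketched in Python as follows.  This
    is a minimal illustration under stated assumptions, not a reference
    implementation: the class RtoState and its method names are
    hypothetical, times are in seconds, and taking the maximum of the
    old and new K' is one assumed reading of the requirement that K'
    not be reduced.  The SRTT and RTTVAR updates in step (D) use the
    gains alpha = 1/8 and beta = 1/4 from [RFC2988].

```python
import math
from dataclasses import dataclass
from typing import Optional

ALPHA = 1 / 8    # SRTT gain from [RFC2988]
BETA = 1 / 4     # RTTVAR gain from [RFC2988]
DEFAULT_K = 4    # standard multiplier from [RFC2988]


@dataclass
class RtoState:
    """Hypothetical per-connection estimator state (times in seconds)."""
    srtt: float
    rttvar: float
    k_prime: int = DEFAULT_K
    srtt_prev: Optional[float] = None
    rttvar_prev: Optional[float] = None

    def on_first_timeout(self) -> None:
        # Step (A): save the estimator state in use when the timer fired.
        self.srtt_prev, self.rttvar_prev = self.srtt, self.rttvar

    def on_spurious_timeout(self, r_prime: float) -> None:
        if self.srtt_prev is None or self.rttvar_prev is None:
            raise RuntimeError("step (A) must run before step (B)")
        # Step (B): equation (1); K' keeps its value if the divisor is zero.
        if self.rttvar_prev > 0:
            needed = math.ceil((r_prime - self.srtt_prev) / self.rttvar_prev)
            # Assumed reading of "MUST NOT be reduced": keep the larger value.
            self.k_prime = max(self.k_prime, needed)
        # Step (C): equations (2) and (3).
        self.srtt, self.rttvar = self.srtt_prev, self.rttvar_prev
        # Step (D): fold R' in per [RFC2988], updating RTTVAR before SRTT.
        self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r_prime)
        self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r_prime
```

    For example, with SRTT_prev = 1.0 second, RTTVAR_prev = 0.25
    seconds and R' = 5.875 seconds, equation (1) gives
    K' = ceil(4.875 / 0.25) = 20.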

    The actual K that is used in the RTO calculation is determined by
    the size of the congestion window.  When a TCP has only a small
    number of outstanding segments, advanced loss recovery that relies
    on the receipt of three duplicate acknowledgments as a recovery
    trigger is not as effective as when the congestion window is larger.
    Therefore, TCP relies more heavily on the RTO in this regime.
    Furthermore, the impact caused by spurious timeouts in this
    situation---in terms of congestion window reduction and resource
    wastage by go-back-N transmission---is small.  Hence, when the
    congestion window is less than or equal to 4*SMSS bytes then the
    standard K of 4 SHOULD be used when calculating the RTO via step
    (2.3) from [RFC2988].  Once the congestion window size grows beyond
    4*SMSS bytes, the value of K' SHOULD be used in the calculation of
    the RTO.
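
    The selection of the multiplier can be sketched as follows.  The
    function names select_k and compute_rto, their parameters, and the
    default clock granularity of zero are assumptions made for
    illustration.  The RTO computation follows steps (2.3) and (2.4) of
    [RFC2988], including the one-second minimum; the optional upper
    bound on the RTO is omitted for brevity.

```python
DEFAULT_K = 4  # standard multiplier from [RFC2988]


def select_k(cwnd_bytes: int, smss_bytes: int, k_prime: int) -> int:
    """Use K = 4 while cwnd <= 4*SMSS, and K' once cwnd is larger."""
    if cwnd_bytes <= 4 * smss_bytes:
        return DEFAULT_K
    return k_prime


def compute_rto(srtt: float, rttvar: float, k: int,
                granularity: float = 0.0, min_rto: float = 1.0) -> float:
    """Compute the RTO in seconds from SRTT, RTTVAR and the multiplier."""
    # Step (2.3): RTO = SRTT + max(G, K * RTTVAR).
    rto = srtt + max(granularity, k * rttvar)
    # Step (2.4): round the RTO up to the minimum (1 second by default).
    return max(rto, min_rto)
```

    With SRTT = 0.5 seconds, RTTVAR = 0.125 seconds and K' = 20, a
    congestion window of exactly 4*SMSS yields an RTO of 1.0 second
    (K = 4), while any larger window yields 3.0 seconds (K = K').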

    This specification explicitly offers no way to reduce K' after it
    has been inflated.  K' is never reduced because the presence of
    spurious timeouts which inflated K' indicates that the standard
    estimator is inadequate for accurately estimating the variance of
    the RTT across the network path and therefore reducing K' would
    increase the chances of further spurious retransmissions.

    Finally, we note that bounding K' is not advisable.  For example,
    suppose equation (1) would set K' to 20.  If K' were instead capped
    at 10, legitimate RTOs would still have to wait longer than with the
    standard K, yet the cap would not offer solid protection against
    delay spikes (given that delay spikes that
    a K' of 10 will not handle have been observed).
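
    The argument can be made concrete with a small numeric sketch.  All
    values below are assumed for illustration (times in seconds), and
    the helper rto_for is hypothetical; it applies step (2.3) of
    [RFC2988] without the clock granularity term.

```python
import math


def rto_for(srtt: float, rttvar: float, k: int) -> float:
    """RTO = SRTT + K * RTTVAR (clock granularity ignored)."""
    return srtt + k * rttvar


SRTT_PREV = 1.0      # assumed saved SRTT
RTTVAR_PREV = 0.25   # assumed saved RTTVAR
R_PRIME = 5.875      # assumed RTT of the delayed segment

k_prime = math.ceil((R_PRIME - SRTT_PREV) / RTTVAR_PREV)  # equation (1)
capped_k = min(k_prime, 10)                               # hypothetical cap

rto_standard = rto_for(SRTT_PREV, RTTVAR_PREV, 4)
rto_capped = rto_for(SRTT_PREV, RTTVAR_PREV, capped_k)
rto_unbounded = rto_for(SRTT_PREV, RTTVAR_PREV, k_prime)

# The capped RTO exceeds the standard RTO, so legitimate timeouts wait
# longer, yet it is still shorter than the observed delay spike.
still_spurious = rto_capped < R_PRIME
```

    Here K' is 20 and the three RTO values are 2.0, 3.5 and 6.0
    seconds.  The capped RTO of 3.5 seconds delays legitimate
    retransmissions relative to the standard 2.0 seconds, but a repeat
    of the 5.875-second spike would still cause a spurious timeout.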

3.  Advantages

    The advantage of tuning the RTO calculation to be more conservative
    after detecting spurious RTO-based retransmissions is in preventing
    further spurious RTOs.  In addition, spurious RTOs can cause
    go-back-N behavior [LK00] which can also be avoided by adapting the
    RTO to be more conservative.

4.  Disadvantages

    The disadvantage of tuning the RTO calculation to be more
    conservative is that legitimate RTO firings take longer and could
    hurt performance.  However, an important note is that the RTO should
    not be TCP's primary loss recovery strategy.  [RFC3782] and
    [RFC3517] provide methods for TCP to effectively repair multiple
    lost segments from a single window of data without falling back to
    using the RTO.  Further, research shows that these changes are
    widely implemented [MAF05].  Therefore, making TCP's RTO calculation
    more conservative should not hinder performance under normal
    circumstances.  Put differently, when using advanced loss recovery
    techniques the firing of the RTO should be an indication that the
    congestion situation in the network is fairly bad.  In this case, it
    may well be that making the RTO estimator more conservative is the
    right general approach.

    The common exception to the above argument is when the congestion
    window is small, such that these advanced loss recovery algorithms
    do not work effectively.  The mechanism in this document explicitly
    takes this case into account by not using the more conservative RTO
    estimate when the congestion window is small.

5.  Summary

    This document specifies a small change that makes the RTO
    calculation given in [RFC2988] more conservative upon the detection
    of spurious RTO-based retransmissions.  The root cause of spurious
    retransmits is an inaccurate assessment of the network conditions
    (in this case, of the RTT).  Therefore, we tackle this by making the
    RTO calculation take into account RTT variance to a larger degree.
    While this does lengthen the time required for legitimate
    retransmissions to fire, the RTO should not be TCP's primary means
    for retransmitting data and therefore this lengthened interval
    should only minimally impact overall performance and should only
    come into play when conditions along the network path have
    deteriorated significantly.  Finally, we note that this document
    makes the estimator given in [RFC2988] strictly more conservative
    and is therefore allowed via [RFC2988].

6.  Security Considerations

    This document calls for a simple parameter tweak and does not change
    the security considerations given in [RFC2988].

7.  IANA Considerations

    None.

Acknowledgments

    This document has benefited from discussions with Ted Faber, Aaron
    Falk, Joseph Ishac, Janardhan Iyengar, Sally Floyd, Vern Paxson and
    Joe Touch.

Normative References

    [RFC2119] S. Bradner.  Key words for use in RFCs to Indicate
        Requirement Levels, March 1997.  BCP 14, RFC 2119.

    [RFC2988] V. Paxson, M. Allman.  Computing TCP's Retransmission
        Timer, November 2000.  RFC 2988.

    [RFC3522] R. Ludwig, M. Meyer.  The Eifel Detection Algorithm for
        TCP, April 2003.  RFC 3522.

    [RFC3708] E. Blanton, M. Allman.  Using TCP Duplicate Selective
        Acknowledgement (DSACKs) and Stream Control Transmission
        Protocol (SCTP) Duplicate Transmission Sequence Numbers (TSNs)
        to Detect Spurious Retransmissions, February 2004.  RFC 3708.

    [RFC4138] P. Sarolahti, M. Kojo.  Forward RTO-Recovery (F-RTO): An
        Algorithm for Detecting Spurious Retransmission Timeouts with
        TCP and the Stream Control Transmission Protocol (SCTP), August
        2005.  RFC 4138.

Informative References

    [AP99] Mark Allman, Vern Paxson. On Estimating End-to-End Network
        Path Properties. ACM SIGCOMM, September 1999.

    [Flo98] Sally Floyd.  Comments on RFC1323.bis, TCP-LW mailing list,
        May 1998.

    [KP87] Phil Karn, Craig Partridge.  Improving Round-Trip Time
        Estimates in Reliable Transport Protocols.  ACM SIGCOMM, August
        1987.

    [LK00] R. Ludwig, R. H. Katz.  The Eifel Algorithm: Making TCP
        Robust Against Spurious Retransmissions.  ACM Computer
        Communication Review, 30(1), January 2000.

    [LS00] R. Ludwig, K. Sklower.  The Eifel Retransmission Timer.  ACM
        Computer Communication Review, 30(3), July 2000.

    [MAF05] A. Medina, M. Allman, S. Floyd.  Measuring the Evolution of
        Transport Protocols in the Internet. ACM Computer Communication
        Review, 35(2), April 2005.

    [RFC3517] E. Blanton, M. Allman, K. Fall, L. Wang.  A Conservative
        Selective Acknowledgment (SACK)-based Loss Recovery Algorithm
        for TCP, April 2003.  RFC 3517.

    [RFC3782] S. Floyd, T. Henderson, A. Gurtov.  The NewReno
        Modification to TCP's Fast Recovery Algorithm, April 2004.  RFC
        3782.

    [RFC4015] R. Ludwig, A. Gurtov.  The Eifel Response Algorithm for
        TCP, February 2005.  RFC 4015.

Authors' Addresses

    Josh Blanton
    Ohio University Internetworking Research Group
    301 Stocker Center
    Athens, OH  45701
    Email: jblanton@cs.ohiou.edu
    URL: http://irg.cs.ohiou.edu/~jblanton/

    Ethan Blanton
    Purdue University Computer Sciences
    250 North University Street
    West Lafayette, IN  47907
    Email: eblanton@cs.purdue.edu
    URL: http://www.cs.purdue.edu/homes/eblanton/

    Mark Allman
    ICSI Center for Internet Research
    1947 Center Street, Suite 600
    Berkeley, CA 94704-1198
    Phone: (440) 235-1792
    Email: mallman@icir.org
    URL: http://www.icir.org/mallman/

Intellectual Property Statement

    The IETF takes no position regarding the validity or scope of any
    Intellectual Property Rights or other rights that might be claimed
    to pertain to the implementation or use of the technology described
    in this document or the extent to which any license under such
    rights might or might not be available; nor does it represent that
    it has made any independent effort to identify any such rights.
    Information on the procedures with respect to rights in RFC
    documents can be found in BCP 78 and BCP 79.

    Copies of IPR disclosures made to the IETF Secretariat and any
    assurances of licenses to be made available, or the result of an
    attempt made to obtain a general license or permission for the use
    of such proprietary rights by implementers or users of this
    specification can be obtained from the IETF on-line IPR repository
    at http://www.ietf.org/ipr.

    The IETF invites any interested party to bring to its attention any
    copyrights, patents or patent applications, or other proprietary
    rights that may cover technology that may be required to implement
    this standard.  Please address the information to the IETF at
    ietf-ipr@ietf.org.

Disclaimer of Validity

    This document and the information contained herein are provided on
    an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
    REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
    INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
    THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

    Copyright (C) The Internet Society (2006).  This document is subject
    to the rights, licenses and restrictions contained in BCP 78, and
    except as set forth therein, the authors retain all their rights.

Acknowledgment

    Funding for the RFC Editor function is currently provided by the
    Internet Society.