draft-ietf-tcpm-sack-recovery-entry-01

Internet Engineering Task Force                              I. Jarvinen
INTERNET-DRAFT                                                   M. Kojo
draft-ietf-tcpm-sack-recovery-entry-01.txt        University of Helsinki
Intended status: Standards Track                            8 March 2010
Expires: September 2010



  Using TCP Selective Acknowledgement (SACK) Information to Determine
        Duplicate Acknowledgements for Loss Recovery Initiation


Status of this Memo

    This Internet-Draft is submitted to IETF in full conformance with
    the provisions of BCP 78 and BCP 79.

    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups.  Note that
    other groups may also distribute working documents as Internet-
    Drafts.

    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time.  It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt.

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html.

    This Internet-Draft will expire on September 2010.

Copyright Notice

    Copyright (c) 2010 IETF Trust and the persons identified as the
    document authors.  All rights reserved.

    This document is subject to BCP 78 and the IETF Trust's Legal
    Provisions Relating to IETF Documents
    (http://trustee.ietf.org/license-info) in effect on the date of
    publication of this document. Please review these documents
    carefully, as they describe your rights and restrictions with



Jarvinen/Kojo                                                   [Page 1]


INTERNET-DRAFT           Expires: September 2010              March 2010


    respect to this document. Code Components extracted from this
    document must include Simplified BSD License text as described in
    Section 4.e of the Trust Legal Provisions and are provided without
    warranty as described in the Simplified BSD License.

Abstract

    This document describes a TCP sender algorithm to trigger loss
    recovery based on the TCP Selective Acknowledgement (SACK)
    information gathered on a SACK scoreboard instead of simply counting
    the number of arriving duplicate acknowledgements (ACKs) in the
    traditional way.  The given algorithm is more robust to ACK losses,
    ACK reordering, missed duplicate acknowledgements due to delayed
    acknowledgements, and extra duplicate acknowledgements due to
    duplicated segments and out-of-window segments. The algorithm not
    only allows timely initiation of TCP loss recovery but also reduces
    false fast retransmits.  It has a low implementation cost on top of
    the SACK scoreboard defined in RFC 3517.


                             Table of Contents

    1. Introduction
       1.1. Conventions and Terminology
       1.2. Definitions
    2. Algorithm Details
       2.1. Redefined IsLost (SeqNum)
       2.2. The Algorithm
    3. Discussion
       3.1. Small Segment Sender
       3.2. SACK Capability Misbehavior
       3.3. Compatibility with Duplicate ACK based Loss Recovery
            Algorithms
    4. Security Considerations
    5. IANA Considerations
    6. Acknowledgements
    Appendix
    A. Scenarios
       A.1. Basic Case
       A.2. Delayed ACK
       A.3. ACK Loss
       A.4. ACK Reordering
       A.5. Duplicated Packet
       A.6. Mitigation of Blind Throughput Reduction Attack
    References
    Normative References
    Informative References
    AUTHORS' ADDRESSES


    TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION:

    Changes from draft-ietf-tcpm-sack-recovery-entry-00.txt

    * Mention setting of RecoveryPoint explicitly as this algorithm
    depends on it being valid.

    * Changed definition of IsLost (SeqNum) to be less strict.

    * Changed packet ordering in one of the appendix examples, now it
    makes more sense in the context of this algorithm.  Point out in the
    examples which of the transmissions are due to Limited Transmit and
    Fast retransmit.

    Changes from draft-jarvinen-tcpm-sack-recovery-entry-01.txt

    * Clarified issues that based on feedback may cause confusion for
    the reader.

    * Incorporated handling of cumulative ACKs into the algorithm

    * 2581 refs -> 5681

    * Added early-rexmt ID as a related one, it uses SACK information
    similar to this algorithm (Thanks to Anna Brunstrom).

    * More cases added where this algorithm is beneficial in taking
    advantage of SACK block redundancy (thanks to Anna Brunstrom).

    * Discuss differences in how the duplicate ACK counter is managed
    (traditional vs. this algorithm)

    * Added ref and couple of words about blind throughput reduction
    attack

    * Wrote SACK splitting attacks. These attacks are quite close to the
    edge in significance. Should consider just dropping (rather
    insignificant).

    Changes from draft-jarvinen-tcpm-sack-recovery-entry-00.txt

    * TODO items embedded: Improvements with window update, clarify
    dupack counting

    * Modified ACK reordering scenario in appendix, shows now a scenario
    where recovery is triggered in a more timely manner.

    * IDnits

    * Handle small segments case using duplicate ACKs counter parallel
    to the SACK blocks based detection.

    * Add a placeholder for SACK splitting

    * Mentioned FACK as some ideas are inherited from there

    END OF SECTION TO BE DELETED.


1.  Introduction

    The Transmission Control Protocol (TCP) [RFC793] has two methods for
    triggering retransmissions.  First, the TCP sender relies on
    incoming duplicate acknowledgements (ACKs) [RFC5681], indicating
    receipt of out-of-order segments at the TCP receiver. After
    receiving a required number of duplicate ACKs (usually three), the
    TCP sender retransmits the first unacknowledged segment and
    continues with a fast recovery algorithm such as Reno [RFC5681],
    NewReno [RFC3782] or SACK-based loss recovery [RFC3517].  Second,
    the TCP sender maintains a retransmission timer that triggers
    retransmission of segments, if the retransmission timer expires
    before the segments have been acknowledged.

    While the conservative loss recovery algorithm defined in [RFC3517]
    takes full advantage of SACK information during loss recovery, it
    does not consider the very same information during the pre-recovery
    detection phase. Instead, it simply counts arriving duplicate ACKs
    to decide when to enter loss recovery. However, this traditional
    heuristic fails in several cases to determine correctly the actual
    number of valid out-of-order segments the receiver has successfully
    received.  First, relying on duplicate ACKs alone misses part of the
    picture in case of ACK losses and ACK reordering, resulting in
    delayed or missed initiation of fast retransmit and fast recovery.
    Similarly, the delayed ACK mechanism tends to conceal the first
    duplicate ACK, as the delayed cumulative ACK becomes combined with
    the first duplicate ACK when the first out-of-order segment arrives
    at the receiver (with an enlarged ACK ratio, such as with ACK
    congestion control [RFC5690], an even more significant portion is
    affected).  Second, segment duplication or out-of-window segments
    increase the risk of falsely triggering loss recovery as they
    trigger duplicate ACKs. At worst, this legitimate behavior on
    out-of-window segments can be turned into a blind throughput
    reduction attack [CPNI09].  Third, receiver window updates or
    opposite direction data segments cannot be counted as duplicate ACKs
    with the traditional approach but can still contain redundant SACK
    information that the sender could benefit from in a scenario where
    the actual duplicate ACKs were lost.

    The algorithm specified in this document uses TCP Selective
    Acknowledgement Option [RFC2018] in the pre-recovery state to
    determine duplicate ACKs and to trigger loss recovery based on the
    information gathered on the SACK scoreboard [RFC3517].  It gives a
    more accurate heuristic for determining the number of out-of-order
    segments that have arrived at the TCP receiver.  The information
    gathered on the SACK scoreboard reveals missing ACKs and allows
    detecting duplicate events. Therefore, the algorithm enables a
    timely triggering of Fast Retransmit. In addition, it allows the use
    of Limited Transmit [RFC3042] accurately regardless of lost ACKs and
    also in the cases where the SACK information is piggybacked to a
    cumulative ACK due to delayed ACKs.  This, in turn, improves the ACK
    clock accuracy.

    This algorithm is close to what the Linux TCP implementation has
    used for a very long time when in conservative SACK mode. A similar
    approach is briefly mentioned alongside ACK congestion control
    [RFC5690], but as the usefulness of the algorithm in this document
    is more general and not limited to ACK congestion control, we
    specify it separately. We also note that the definition of a
    duplicate acknowledgement already suggests that an incoming ACK can
    be considered a duplicate ACK if it "contains previously unknown
    SACK information" [RFC5681]. In addition, SACK information is used,
    whenever available, for a similar purpose by Early Retransmit
    [AAA+10].

    This algorithm also resembles Forward Acknowledgement (FACK) [MM96],
    but they differ in how the quantity of data outstanding in the
    network is determined. FACK always assumes that every non-SACKed
    octet below the highest SACKed octet is lost, which is only true if
    no reordering occurs. Thus it would simply trigger loss recovery
    whenever the highest SACKed octet is more than DupThresh * SMSS
    octets above SND.UNA.


1.1.  Conventions and Terminology

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
    document are to be interpreted as described in BCP 14, RFC 2119
    [RFC2119] and indicate requirement levels for protocols.


1.2.  Definitions

    The reader is expected to be familiar with the definitions given in
    [RFC5681], [RFC2018], and [RFC3517].


2.  Algorithm Details

    In order to use this algorithm, a TCP sender MUST have TCP Selective
    Acknowledgement Option [RFC2018] enabled and negotiated for the TCP
    connection. The TCP sender MUST maintain SACK information in an
    appropriate data structure such as the scoreboard defined in
    [RFC3517].  This algorithm uses the functions Update () and
    SetPipe () and the variables DupThresh, HighData, HighRxt, Pipe, and
    RecoveryPoint, as defined in [RFC3517]. Note: the definition of
    IsLost (SeqNum) is altered from the one specified in [RFC3517].


2.1.  Redefined IsLost (SeqNum)

    IsLost (SeqNum) defined in [RFC3517] is stricter than necessary in
    counting how many segments the receiver has received past SeqNum.
    Instead of requiring at least three times SMSS bytes to be SACKed,
    it is enough to have at least two times SMSS bytes plus one byte
    SACKed to confirm that the receiver has received at least three
    segments above SeqNum (and would have generated at least three
    duplicate ACKs). The less strict definition is:

    IsLost (SeqNum):

        This routine returns whether the given sequence number is
        considered to be lost.  The routine returns true when either
        DupThresh discontiguous SACKed sequences have arrived above
        'SeqNum' or more than (DupThresh - 1) * SMSS bytes with sequence
        numbers greater than 'SeqNum' have been SACKed.  Otherwise, the
        routine returns false.
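
    As a non-normative illustration, the redefined routine can be
    sketched in Python as follows.  The sketch assumes that the
    scoreboard is available as a list of SACKed (left, right) sequence
    ranges with an exclusive right edge, that SMSS is 500 octets, and
    that sequence number wrap-around can be ignored.  The names is_lost
    and sacked are illustrative only and do not come from [RFC3517].

```python
SMSS = 500        # assumed sender maximum segment size, in octets
DUP_THRESH = 3    # DupThresh as in RFC 3517


def is_lost(seq_num, sacked, smss=SMSS, dup_thresh=DUP_THRESH):
    """Sketch of the redefined IsLost (SeqNum).

    sacked is a list of SACKed (left, right) ranges, right edge exclusive.
    """
    # Merge overlapping or adjacent ranges so that every remaining range
    # is one discontiguous SACKed sequence.
    merged = []
    for left, right in sorted(sacked):
        if merged and left <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], right)
        else:
            merged.append([left, right])
    # Keep only the octets with sequence numbers greater than seq_num.
    above = [(max(left, seq_num + 1), right)
             for left, right in merged if right > seq_num + 1]
    sacked_octets = sum(right - left for left, right in above)
    return (len(above) >= dup_thresh
            or sacked_octets > (dup_thresh - 1) * smss)
```

    Under these assumptions, is_lost(4000, [(4500, 5500)]) is false
    because exactly two times SMSS octets are SACKed, whereas one more
    SACKed octet, as in [(4500, 5501)], makes the routine return true.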


2.2.  The Algorithm

    A TCP sender using this algorithm MUST take the following steps upon
    the receipt of any ACK containing SACK information:

    1)  If no previous loss event has occurred on the connection OR
        RecoveryPoint is less than SND.UNA (the oldest unacknowledged
        sequence number [RFC793]), continue with the other steps of
        this algorithm. Otherwise, continue the ongoing loss recovery.

    2)  Update the scoreboard via the Update () function as outlined
        in [RFC3517].

    3)  If the ACK is a cumulative ACK, reset the duplicate ACK counter
        to zero.

    4)  If the ACK contains SACK blocks with previously unknown
        in-window SACK information (i.e., between SND.UNA and HighData,
        assuming SND.UNA has been updated from the acknowledgment number
        of the ACK), increment the duplicate ACK counter.

    5)  Determine whether a loss recovery should be initiated:

        If IsLost (SND.UNA) returns false AND the sender has received
        fewer than DupThresh duplicate ACKs, go to step 6A. Otherwise,
        go to step 6B.

    6A) Invoke optional Limited Transmit:

        Set HighRxt to SND.UNA and run SetPipe(). The TCP sender MAY
        transmit previously unsent data segments according to the
        guidelines of Limited Transmit [RFC3042], with the exception
        that the number of octets that can be sent is determined by Pipe
        and cwnd.

        If cwnd - Pipe >= 1 SMSS, the TCP sender can transmit one or
        more segments as follows:

        Send Loop:

        a) If available unsent data exists and the receiver's advertised
           window allows, transmit one segment of up to SMSS octets of
           previously unsent data starting with sequence number
           HighData+1 and update HighData to reflect the transmission of
           the data segment. Otherwise, exit Send Loop.

        b) Run SetPipe() to re-calculate the number of outstanding
           octets in the network. If cwnd - Pipe >= 1 SMSS, go to step
           a) of Send Loop.  Otherwise, exit Send Loop.

    6B) Invoke Fast Retransmit and enter loss recovery:

        Initiate a loss recovery phase, per the fast retransmit
        algorithm outlined in [RFC5681], and continue with a fast
        recovery algorithm such as the SACK-based loss recovery
        algorithm outlined in [RFC3517].  This includes setting
        RecoveryPoint to HighData as in step (1) of [RFC3517].
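
    The steps above can be sketched in Python as follows.  This is a
    minimal, non-normative sketch under several assumptions: SMSS is 500
    octets, the receiver's advertised window is never the limit,
    SetPipe() is reduced to "outstanding octets that are not SACKed"
    (adequate only before recovery starts), SND.UNA is updated from the
    ACK before step 1 is evaluated, and the scoreboard is not updated
    while an earlier recovery is still in progress.  The class Sender
    and all of its member names are hypothetical.

```python
SMSS = 500        # assumed segment size, in octets
DUP_THRESH = 3    # DupThresh as in RFC 3517


def merge(ranges):
    """Merge overlapping or adjacent (left, right) ranges, right exclusive."""
    out = []
    for left, right in sorted(ranges):
        if out and left <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], right))
        else:
            out.append((left, right))
    return out


def octets(ranges):
    return sum(right - left for left, right in ranges)


class Sender:
    def __init__(self, snd_una, high_data, cwnd, unsent):
        self.snd_una = snd_una      # SND.UNA
        self.high_data = high_data  # HighData: highest octet sent so far
        self.cwnd = cwnd
        self.unsent = unsent        # octets the application still has queued
        self.recovery_point = None  # None means no loss event so far
        self.dupacks = 0
        self.sacked = []            # scoreboard as merged SACKed ranges

    def clip(self, ranges):
        """Restrict ranges to the window from SND.UNA to HighData."""
        lo, hi = self.snd_una, self.high_data + 1
        return merge((max(l, lo), min(r, hi)) for l, r in ranges
                     if min(r, hi) > max(l, lo))

    def is_lost(self, seq):
        above = [(max(l, seq + 1), r) for l, r in self.sacked if r > seq + 1]
        return (len(above) >= DUP_THRESH
                or octets(above) > (DUP_THRESH - 1) * SMSS)

    def pipe(self):
        # Simplified SetPipe(): outstanding octets that are not SACKed.
        return self.high_data + 1 - self.snd_una - octets(self.sacked)

    def on_ack(self, ack, sack_blocks):
        cumulative = ack > self.snd_una
        self.snd_una = max(self.snd_una, ack)
        # Step 1: an unfinished loss recovery takes precedence.
        if (self.recovery_point is not None
                and self.recovery_point >= self.snd_una):
            return "continue recovery"
        # Step 2: Update () the scoreboard with in-window information.
        old = self.clip(self.sacked)
        self.sacked = self.clip(old + list(sack_blocks))
        # Step 3: a cumulative ACK resets the duplicate ACK counter.
        if cumulative:
            self.dupacks = 0
        # Step 4: previously unknown SACK information counts as a duplicate.
        if octets(self.sacked) > octets(old):
            self.dupacks += 1
        # Step 5: choose between Limited Transmit and Fast Retransmit.
        if not self.is_lost(self.snd_una) and self.dupacks < DUP_THRESH:
            # Step 6A: HighRxt is implicit here; run the Send Loop.
            sent = 0
            while self.unsent > 0 and self.cwnd - self.pipe() >= SMSS:
                size = min(SMSS, self.unsent)
                self.high_data += size
                self.unsent -= size
                sent += 1
            return f"limited transmit: {sent} segment(s)"
        # Step 6B: enter loss recovery and remember RecoveryPoint.
        self.recovery_point = self.high_data
        return "fast retransmit"
```

    For example, a sender created as Sender(snd_una=4000,
    high_data=6999, cwnd=3000, unsent=5000) that receives ACK 4000 with
    SACK blocks 4500-5000, 4500-5500 and 4500-6000 in turn sends one new
    segment on each of the first two ACKs and enters loss recovery on
    the third.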





3.  Discussion

    In scenarios where neither ACK losses nor reordering occur and the
    first acknowledgement with SACK information is not the ACK held by
    the delayed acknowledgement mechanism, the new SACK information in
    each duplicate ACK covers a single segment. Those duplicate ACKs
    cause this algorithm to trigger loss recovery after three duplicate
    acknowledgements and allow transmission of new segments using
    Limited Transmit on the first and second duplicate ACK. This is
    identical to the behavior that would occur without this algorithm
    (assuming DupThresh is 3 and that all segments are SMSS sized). This
    scenario, together with other typical scenarios describing the
    behavior of the algorithm, is depicted in Appendix A.

    This algorithm SHOULD be used also with an ACK that contains a
    window update or opposite direction data that could not be
    considered as a duplicate ACK in the traditional algorithm. Such
    behavior is safe because the SACK information can only add more
    information to the current state of the sender; at worst, all
    received information is just redundant.

    Setting HighRxt to SND.UNA in Step 6A has no direct relation to this
    algorithm. Yet it is included in the algorithm to avoid confusion in
    how to implement SetPipe() correctly because it depends on having a
    valid HighRxt value [RFC3517].

    Potential issues to consider with the algorithm are discussed
    below.


3.1.  Small Segment Sender

    If a TCP sender is sending small segments (usually intentionally
    overriding the Nagle algorithm [RFC896]), the IsLost (SND.UNA)
    check used in step 5 of the algorithm might fail to detect the need
    for loss recovery on the third duplicate acknowledgement because not
    enough octets have been SACKed to cover more than (DupThresh - 1) *
    SMSS bytes above SND.UNA.  Therefore, an adapted duplicate ACK
    algorithm is needed as a fallback. Steps 3, 4 and the latter
    condition of step 5 implement the adapted duplicate ACK algorithm in
    parallel to the SACK block based detection.
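
    The size of the gap can be illustrated with a short calculation.
    The sketch below assumes an SMSS of 1460 octets, a single contiguous
    SACKed run above SND.UNA that grows by one segment per duplicate
    ACK, and no ACK loss or reordering.  The function name first_trigger
    is hypothetical.

```python
def first_trigger(segment_size, smss=1460, dup_thresh=3):
    """Return the duplicate ACK on which each rule first fires.

    Assumes one contiguous SACKed run above SND.UNA that grows by one
    segment of segment_size octets per duplicate ACK.
    """
    threshold = (dup_thresh - 1) * smss
    # Smallest n such that n * segment_size > threshold.
    octet_rule = threshold // segment_size + 1
    # The fallback counter fires on the DupThresh-th duplicate ACK.
    counter_rule = dup_thresh
    return octet_rule, counter_rule
```

    With 100-octet segments, first_trigger(100) gives (30, 3): the octet
    rule of IsLost (SND.UNA) alone would need thirty duplicate ACKs,
    whereas the fallback counter reaches DupThresh on the third.  With
    full-sized segments, first_trigger(1460) gives (3, 3).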

    The number of duplicate ACKs is an artificial metric to estimate the
    number of segments the receiver already has in its receive buffer.
    How accurately they match depends on the scenario. Because of that,
    the goal of the duplicate ACK counter included in this algorithm is
    not to achieve bug-to-bug compatibility with the plain duplicate ACK
    counter but to estimate more accurately how many out-of-order
    segments the receiver has already queued. Therefore, the duplicate
    ACK counter used as a fallback mechanism in this algorithm differs
    from the plain duplicate ACK counter. However, such differences
    indicate a scenario where the plain counter was not able to
    accurately keep track of the receiver state.

    While the fallback algorithm itself does not look into the
    acknowledgment field in order to decide whether an ACK is a
    "duplicate ACK", the duplicate ACK counter is not renamed in this
    document, as in practice most ACKs that increment the counter would
    still contain a duplicate acknowledgment number.  In contrast to the
    traditional approach, the only condition that must be satisfied to
    increment the duplicate ACK counter with this algorithm is that the
    acknowledgement MUST contain at least one in-window SACK block that
    covers octets that were not previously SACKed [RFC5681]. In cases
    with ACK losses or delayed ACKs, this condition can also match
    cumulative ACKs, receiver window updates and opposite direction data
    segments, but the counter can still safely be incremented.

    As an alternative to the fallback algorithm, a TCP sender that is
    able to discern segment boundaries accurately can consider full
    segments in IsLost (SeqNum) regardless of segment size.  Such a TCP
    sender can avoid the problem with small segments using the IsLost
    (SND.UNA) check alone, which means that Steps 3, 4 and the latter
    condition of step 5 are redundant and need not be implemented.
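
    One possible shape of such a segment-aware check is sketched below.
    It is not part of the specified algorithm.  It assumes that the
    sender records the boundaries of every transmitted segment and that
    the SACKed ranges are already merged.  The function name
    is_lost_by_segments and its parameters are hypothetical.

```python
def is_lost_by_segments(seq_num, segments, sacked, dup_thresh=3):
    """Count fully SACKed segments above seq_num instead of octets.

    segments: (start, end) ranges as originally sent, end exclusive.
    sacked:   merged SACKed (left, right) ranges, right exclusive.
    """
    def fully_sacked(start, end):
        return any(left <= start and end <= right for left, right in sacked)

    count = sum(1 for start, end in segments
                if start > seq_num and fully_sacked(start, end))
    return count >= dup_thresh
```

    With 100-octet segments starting at 4000, SACKing 4100-4400 covers
    three full segments above 4000, so the check returns true even
    though only 300 octets have been SACKed.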

    Note: the small segments problem is not unique to this algorithm;
    the SACK-based loss recovery [RFC3517] also encounters it because of
    how IsLost (SeqNum) is defined.



3.2.  SACK Capability Misbehavior

    If the receiver misbehaves by advertising SACK capability but never
    sending any SACK blocks when it should, this algorithm fails to
    enter loss recovery and a retransmission timeout is required for
    recovery. However, such misbehavior does not allow SACK-based loss
    recovery [RFC3517] to work either, and a TCP sender will require a
    timeout to recover anyway if more than one data segment was lost
    within the window.


3.3.  Compatibility with Duplicate ACK based Loss Recovery Algorithms

    This algorithm SHOULD NOT be used together with a fast recovery
    algorithm that determines the segments that have left the network
    based on the number of arriving duplicate acknowledgements (e.g.,
    NewReno [RFC3782]), instead of the actual segments reported by SACK.
    In the presence of ACK reordering, such an algorithm will count the
    delayed duplicate acknowledgements during the fast recovery
    algorithm as extra while determining the number of packets that have
    left the network.

    In general there should be very little reason to combine this
    algorithm with a loss recovery algorithm that is based on inferior,
    non-SACK based information only.


4.  Security Considerations

    A malicious TCP receiver may send false SACK information for
    sequence number ranges which it has not received in order to trigger
    Fast Retransmit sooner. Such behavior would only be useful when out-
    of-order segments have arrived because otherwise the flow undergoes
    a loss recovery with a window reduction. This kind of lying involves
    guessing which segments will arrive later. In case the guess was
    wrong, the performance of the flow is ruined because the TCP sender
    will need a retransmission timeout as it will not retransmit the
    segments until it assumes SACK reneging. On a successful guess the
    attacker is able to trigger the recovery slightly earlier. The later
    segments would have allowed reporting the very same regions with
    SACK anyway. Therefore, the gain from this attack is small, hardly
    justifiable considering the drastic effect of a misguess.
    Furthermore, a similar attack can be made with the duplicate
    acknowledgment based algorithm (even if the new SACK information
    rule is applied) by sending false duplicate acknowledgements with
    false SACK ranges, and trivially without the new SACK information
    rule.

    A variation of the lying attack discards reliability of the flow,
    but once reliability is not a concern of the receiver, a number of
    simpler ways exist to attack TCP independently of this algorithm.
    Thus this algorithm is not considered to weaken TCP security
    properties against false information.

    Splitting SACK blocks into chunks smaller than the received segments
    allows the receiver to make recovery start sooner because of the
    discontiguous check in IsLost (SeqNum). However, by doing so the
    receiver neglects the possibility of reordering for little gain. If
    the segment was just reordered, the sender performs an unnecessary
    window reduction and an unnecessary retransmission of the reordered
    segment. Another variant of SACK block splitting simply tries to
    increase consumption of bandwidth by falsely triggering a burst of
    retransmissions. However, the difference between sending three
    duplicate ACKs (traditional algorithm) and a single ACK with SACK
    blocks does not offer enough benefit to make such an attack
    practical with a small DupThresh value such as three.  If the sender
    keeps track of segment boundaries and applies them in IsLost
    (SeqNum), such an attack will not succeed, as the sender cannot be
    misled into believing that a segment was split into multiple chunks.
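
    The effect of SACK block splitting on the discontiguous check can be
    illustrated as follows.  The sketch assumes (left, right) SACK
    blocks with an exclusive right edge and ignores sequence number
    wrap-around.  The function name discontiguous_sequences is
    hypothetical.

```python
def discontiguous_sequences(seq_num, sack_blocks):
    """Number of discontiguous SACKed sequences above seq_num."""
    count, last_right = 0, None
    for left, right in sorted(sack_blocks):
        if right <= seq_num + 1:
            continue                      # nothing above seq_num in this block
        if last_right is None or left > last_right:
            count += 1                    # a gap precedes this block
        last_right = right if last_right is None else max(last_right, right)
    return count
```

    An honest report of one out-of-order segment, [(4500, 5000)], yields
    one sequence above 4000.  The same segment reported as [(4500,
    4600), (4700, 4800), (4900, 5000)] yields three, which satisfies the
    discontiguous condition of IsLost (SeqNum) when DupThresh is three.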


5.  IANA Considerations

    This document has no actions for IANA.


6.  Acknowledgements

    The authors would like to thank Alexander Zimmermann and Anna
    Brunstrom for the comments on this document.


Appendix


A.  Scenarios


A.1.  Basic Case

    In this scenario no Delayed ACK, ACK losses, reordering or other
    "abnormal" behavior happens. For simplicity all the segments are
    SMSS sized.

    Once the TCP receiver gets the first out-of-order segment, it sends
    a duplicate ACK with SACK information about the received octets. The
    following two out-of-order segments trigger a duplicate ACK each,
    with the corresponding range SACKed in addition to the previously
    known information. The sender gets those duplicate ACKs in order;
    each of them SACKs a new, previously unknown segment.

    This algorithm triggers loss recovery on the third duplicate ACK
    because IsLost (SeqNum) returns true as more than (DupThresh - 1) *
    SMSS bytes become SACKed with that acknowledgement; thus the
    behavior is identical to that of a sender which is using duplicate
    acknowledgments.  If Limited Transmit is in use, each of the first
    two duplicate ACKs allows a single segment to be sent with either of
    the algorithms (the SACKed octets reduce Pipe by SMSS per ACK,
    allowing SMSS worth of new octets).

        ACK           Transmitted    Received    ACK Sent
        Received      Segment        Segment     (Including SACK Blocks)

        1000
                      3000-3499      3000-3499   (delayed ACK)
                      3500-3999      3500-3999   4000
        2000
                      4000-4499      (dropped)
                      4500-4999      4500-4999   4000, SACK=4500-5000
        3000
                      5000-5499      5000-5499   4000, SACK=4500-5500
                      5500-5999      5500-5999   4000, SACK=4500-6000
        4000
                      6000-6499      6000-6499   4000, SACK=4500-6500
                      6500-6999      6500-6999   4000, SACK=4500-7000
        4000, SACK=4500-5000
         (lim. tr.)   7000-7499      7000-7499   4000, SACK=4500-7500
        4000, SACK=4500-5500
         (lim. tr.)   7500-7999      7500-7999   4000, SACK=4500-8000
        4000, SACK=4500-6000
         (fast retr.) 4000-4499      4000-4499   8000
        4000, SACK=4500-6500


A.2.  Delayed ACK

    The case with delayed ACK occurs when the receiver sends the first
    ACK with SACK information, but the previous ACK acknowledged a lower
    sequence number because an acknowledgment was being held by the
    delayed ACK mechanism; the sender therefore does not consider it a
    duplicate ACK. Because the ACK contains SACK information that is
    identical to the basic case, the sender can use Limited Transmit
    with the same segments as in the basic case and will start loss
    recovery at the third acknowledgment, i.e., with the second
    duplicate acknowledgment. In the same situation the duplicate ACK
    based sender will have to wait for one more duplicate ACK to arrive
    to do the same, as the first acknowledgment is fully "wasted".

    Technically, an acknowledgement that acknowledges a higher sequence
    number than what was previously acknowledged is not a duplicate
    acknowledgement, but the presence of the SACK block tells another
    story: it reveals that the receiver used delayed ACK and thus that a
    duplicate acknowledgement is missing in between. The response of a
    TCP sender taking advantage of such inferred duplicate
    acknowledgements is well within the guidelines of the packet
    conservation principle [Jac88], as it still sends only when segments
    have left the network.

        ACK           Transmitted    Received    ACK Sent
        Received      Segment        Segment     (Including SACK Blocks)

        1500
                      3000-3499      3000-3499   3500
                      3500-3999      3500-3999   (delayed ACK)
        2500
                      4000-4499      (dropped)
                      4500-4999      4500-4999   4000, SACK=4500-5000
        3500
                      5000-5499      5000-5499   4000, SACK=4500-5500
                      5500-5999      5500-5999   4000, SACK=4500-6000
        4000, SACK=4500-5000 (two segments left the network)
                      6000-6499      6000-6499   4000, SACK=4500-6500
         (lim. tr.)   6500-6999      6500-6999   4000, SACK=4500-7000
        4000, SACK=4500-5500
         (lim. tr.)   7000-7499      7000-7499   4000, SACK=4500-7500
        4000, SACK=4500-6000
         (fast retr.) 4000-4499      4000-4499   7500
        4000, SACK=4500-6500
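
    The difference between the two approaches in the table above can be
    reproduced with a small sketch.  It assumes an SMSS of 500 octets, a
    DupThresh of three, and that every ACK carries a single SACK block
    (left, right) with an exclusive right edge, so that only the octet
    rule of IsLost (SeqNum) matters.  The function names are
    hypothetical.

```python
SMSS = 500
DUP_THRESH = 3


def traditional_trigger(acks, snd_una):
    """1-based index of the ACK on which plain duplicate ACK counting fires."""
    dupacks = 0
    for index, (ack, _sack) in enumerate(acks, start=1):
        if ack > snd_una:
            snd_una, dupacks = ack, 0     # cumulative ACK, not a duplicate
        else:
            dupacks += 1
            if dupacks >= DUP_THRESH:
                return index
    return None


def sack_trigger(acks, snd_una):
    """1-based index of the ACK on which IsLost (SND.UNA) becomes true."""
    for index, (ack, (left, right)) in enumerate(acks, start=1):
        snd_una = max(snd_una, ack)
        if right - max(left, snd_una + 1) > (DUP_THRESH - 1) * SMSS:
            return index
    return None


# ACKs arriving at the sender in the table above, after ACK 3500.
acks = [(4000, (4500, 5000)), (4000, (4500, 5500)),
        (4000, (4500, 6000)), (4000, (4500, 6500))]
```

    Here sack_trigger(acks, 3500) returns 3, while
    traditional_trigger(acks, 3500) returns 4, because the first ACK
    advances the cumulative acknowledgment and is therefore not counted
    as a duplicate.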


A.3.  ACK Loss

    This case with ACK loss shares much behavior with the case with
    delayed ACK. If the hole at RCV.NXT is filled, the sender will
    notice that the cumulative ACK advanced.  In case of out-of-order
    segments, the first ACK which gets through to the sender includes
    SACK blocks up to the quantity the SACK block redundancy is able to
    cover.  With this algorithm the sender immediately makes use of all
    the information that is made available by the incoming ACK.

        ACK           Transmitted    Received    ACK Sent
        Received      Segment        Segment     (Including SACK Blocks)

        1000
                      3000-3499      3000-3499   (delayed ACK)
                      3500-3999      3500-3999   4000
        2000
                      4000-4499      (dropped)
                      4500-4999      4500-4999   4000, SACK=4500-5000
                                                 (dropped)
        3000
                      5000-5499      5000-5499   4000, SACK=4500-5500
                      5500-5999      5500-5999   4000, SACK=4500-6000
        4000
                      6000-6499      6000-6499   4000, SACK=4500-6500
                      6500-6999      6500-6999   4000, SACK=4500-7000
        4000, SACK=4500-5500 (two segments left the network)
         (lim. tr.)   7000-7499      7000-7499   4000, SACK=4500-7500
         (lim. tr.)   7500-7999      7500-7999   4000, SACK=4500-8000
        4000, SACK=4500-6000
         (fast retr.) 4000-4499      4000-4499   8000
        4000, SACK=4500-6500
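
    The sender side of the trace above can be replayed with a short
    Python sketch.  It is a simplified illustration under stated
    assumptions, not the normative procedure: it assumes an SMSS of 500
    octets (the segment size used in the trace) and a DupThresh of 3,
    and it estimates the number of duplicate ACKs as the SACKed octets
    above the cumulative ACK divided by SMSS.  The names sacked_octets
    and replay are hypothetical, and sequence number wraparound is
    ignored.

```python
SMSS = 500        # assumption: segment size used in the trace above
DUP_THRESH = 3    # assumption: conventional DupThresh value


def sacked_octets(cum_ack, sack_blocks):
    """Count octets covered by SACK blocks above the cumulative ACK."""
    total = 0
    covered_to = cum_ack
    for left, right in sorted(sack_blocks):
        left = max(left, covered_to)   # skip octets already counted
        if right > left:
            total += right - left
            covered_to = right
    return total


def replay(acks):
    """Return one sender action per incoming (cum_ack, sack_blocks)."""
    actions = []
    prev_segments = 0
    for cum_ack, blocks in acks:
        segments = sacked_octets(cum_ack, blocks) // SMSS
        if segments >= DUP_THRESH and prev_segments < DUP_THRESH:
            actions.append("fast retransmit")
        elif segments > prev_segments:
            actions.append(f"limited transmit x{segments - prev_segments}")
        else:
            actions.append("no action")
        prev_segments = max(prev_segments, segments)
    return actions


# ACKs that reach the sender after the ACK carrying SACK=4500-5000
# has been dropped.
trace = [
    (4000, [(4500, 5500)]),
    (4000, [(4500, 6000)]),
]
print(replay(trace))  # ['limited transmit x2', 'fast retransmit']
```

    In this model the first surviving ACK accounts for two segments at
    once, which matches the two limited transmit segments in the trace.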


A.4.  ACK Reordering

    With ACK reordering, an ACK is delayed in the network and arrives
    after an ACK that was sent later.  Because of SACK block redundancy,
    the ACK that overtakes the delayed one carries not only its own
    information but also that of the reordered ACK (similar to the ACK
    loss case).  When the reordered ACK finally arrives, the sender
    already knows the information it provides, and therefore this
    algorithm takes no action.

        ACK           Transmitted    Received    ACK Sent
        Received      Segment        Segment     (Including SACK Blocks)

        1000
                      3000-3499      3000-3499   (delayed ACK)
                      3500-3999      3500-3999   4000
        2000
                      4000-4499      (dropped)
                      4500-4999      4500-4999   4000, SACK=4500-5000
                                                 (delayed)
        3000
                      5000-5499      5000-5499   4000, SACK=4500-5500
                      5500-5999      5500-5999   4000, SACK=4500-6000
        4000
                      6000-6499      6000-6499   4000, SACK=4500-6500
                      6500-6999      6500-6999   4000, SACK=4500-7000
        4000, SACK=4500-5500 (two segments left the network)
         (lim. tr.)   7000-7499      7000-7499   4000, SACK=4500-7500
         (lim. tr.)   7500-7999      7500-7999   4000, SACK=4500-8000
        4000, SACK=4500-5000 (has only redundant information)
        4000, SACK=4500-6000
         (fast retr.) 4000-4499      4000-4499   8000
        4000, SACK=4500-6500
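
    Why the late ACK causes no action can be sketched with a minimal
    scoreboard that reports how many octets each arriving ACK newly
    marks as SACKed.  This is an illustrative sketch only: merge_sack
    is a hypothetical helper, the scoreboard is assumed to be a list of
    non-overlapping (left, right) ranges, and sequence number
    wraparound is ignored.

```python
def merge_sack(scoreboard, blocks):
    """Merge SACK blocks into the scoreboard.

    Returns the new scoreboard and the number of newly SACKed octets.
    """
    before = sum(right - left for left, right in scoreboard)
    merged = []
    for left, right in sorted(scoreboard + list(blocks)):
        if merged and left <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))
    after = sum(right - left for left, right in merged)
    return merged, after - before


# SACK blocks in order of arrival; the second ACK is the reordered one.
scoreboard = []
for blocks in ([(4500, 5500)], [(4500, 5000)], [(4500, 6000)]):
    scoreboard, new_octets = merge_sack(scoreboard, blocks)
    print(blocks, "->", new_octets, "newly SACKed octets")
```

    The reordered ACK yields zero newly SACKed octets, so a sender that
    reacts only to new SACK information has nothing to act on.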








A.5.  Duplicated Packet

    A duplicate packet is received because of either an unnecessary
    retransmission or hardware duplication.  The duplicate is either an
    ACK, which carries only redundant information, or a data segment,
    which triggers a redundant duplicate ACK (possibly with SACK and/or
    DSACK [RFC2883] information).  Because neither adds any new SACKed
    octets at the TCP sender, this algorithm does nothing, whereas a
    sender that relies on counting duplicate ACKs would falsely count
    it as a duplicate ACK.

    If one of the redundant ACKs is lost, the effect of the duplication
    is simply cancelled.

    The sender could also detect this case using DSACK alone.
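
    For illustration, the DSACK check described in [RFC2883] can be
    sketched as follows.  This is a simplified sketch rather than a
    complete implementation: is_dsack is a hypothetical name, only the
    first two SACK blocks are examined, a first block that only partly
    lies below the cumulative ACK is not treated as a DSACK, and
    sequence number wraparound is ignored.

```python
def is_dsack(cum_ack, sack_blocks):
    """Return True if the first SACK block reports a duplicate segment.

    sack_blocks is a list of (left, right) ranges in the order in
    which they appear in the SACK option.
    """
    if not sack_blocks:
        return False
    first_left, first_right = sack_blocks[0]
    # Case 1: the first block lies below the cumulative ACK.
    if first_right <= cum_ack:
        return True
    # Case 2: the first block is contained in the second block.
    if len(sack_blocks) > 1:
        second_left, second_right = sack_blocks[1]
        return second_left <= first_left and first_right <= second_right
    return False


print(is_dsack(4000, [(3500, 4000)]))                # True
print(is_dsack(4000, [(4500, 5000), (4500, 6000)]))  # True
print(is_dsack(4000, [(4500, 6000)]))                # False
```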


A.6.  Mitigation of Blind Throughput Reduction Attack

    If an attacker knows or is able to guess the 4-tuple of a TCP
    connection, it may mount a blind throughput reduction attack
    [CPNI09].  In this attack, TCP is tricked into sending duplicate
    ACKs to the other endpoint by means of segments that likely lie
    outside the window, which is considerably easier to achieve than a
    match with the sequence numbers.  If more than dupThresh duplicate
    ACKs can be triggered in a row without any legitimate segment that
    advances the acknowledged sequence number, the other end acts on
    the false congestion signal and halves the window.

    With this algorithm such duplicate ACKs are filtered out because
    they do not carry any new in-window SACK blocks (a DSACK [RFC2883]
    block might be present, but it does not cover in-window octets).
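
    The filtering condition can be sketched as a check for new
    in-window SACK information.  The sketch below is illustrative only
    and rests on assumptions: the function names are hypothetical, the
    window is taken to be the range from snd_una to snd_nxt, the
    scoreboard is a list of (left, right) ranges already SACKed, and
    sequence number wraparound is ignored.

```python
def uncovered(left, right, scoreboard):
    """True if some octet in [left, right) is not in the scoreboard."""
    for s_left, s_right in sorted(scoreboard):
        if s_left > left:
            return True            # octet at 'left' is not yet SACKed
        left = max(left, s_right)  # skip the part already SACKed
        if left >= right:
            return False
    return left < right


def has_new_in_window_sack(blocks, scoreboard, snd_una, snd_nxt):
    """True if any SACK block adds in-window octets to the scoreboard."""
    for left, right in blocks:
        # Clip the block to the window; out-of-window parts are ignored.
        left, right = max(left, snd_una), min(right, snd_nxt)
        if left < right and uncovered(left, right, scoreboard):
            return True
    return False


snd_una, snd_nxt = 4000, 8000
# Duplicate ACK without SACK blocks, as triggered by the attack.
print(has_new_in_window_sack([], [], snd_una, snd_nxt))               # False
# Duplicate ACK carrying only a DSACK block below the window.
print(has_new_in_window_sack([(1000, 1500)], [], snd_una, snd_nxt))   # False
# Legitimate ACK reporting a newly received out-of-order segment.
print(has_new_in_window_sack([(4500, 5000)], [], snd_una, snd_nxt))   # True
```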


References


Normative References


    [RFC793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
              793, September 1981.

    [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow,
              "TCP Selective Acknowledgment Options", RFC 2018,
              October 1996.

    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

    [RFC3042] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
              TCP's Loss Recovery Using Limited Transmit", RFC 3042,
              January 2001.

    [RFC3517] Blanton, E., Allman, M., Fall, K., and L. Wang,
              "A Conservative Selective Acknowledgment (SACK)-based
              Loss Recovery Algorithm for TCP", RFC 3517, April 2003.

    [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.


Informative References

    [AAA+10]  Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J.,
              and P. Hurtig, "Early Retransmit for TCP and SCTP",
              Internet-Draft, draft-ietf-tcpm-early-rexmt-04, January
              2010.

    [CPNI09]  Security Assessment of the Transmission Control Protocol
              (TCP).  Available at:
              http://www.cpni.gov.uk/Docs/tn-03-09-security-assessment-
              TCP.pdf

    [Jac88]   Jacobson, V., "Congestion Avoidance and Control", In
              Proceedings of ACM SIGCOMM '88, August 1988.

    [MM96]    Mathis, M. and J. Mahdavi, "Forward Acknowledgment:
              Refining TCP Congestion Control", In Proceedings of ACM
              SIGCOMM '96, August 1996.

    [RFC896]  Nagle, J., "Congestion Control in IP/TCP Internetworks",
              RFC 896, January 1984.

    [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
              Extension to the Selective Acknowledgement (SACK) Option
              for TCP", RFC 2883, July 2000.

    [RFC3782] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
              Modification to TCP's Fast Recovery Algorithm", RFC 3782,
              April 2004.

    [RFC5690] Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding
              Acknowledgement Congestion Control to TCP", RFC 5690,
              February 2010.


AUTHORS' ADDRESSES


    Ilpo Jarvinen
    University of Helsinki
    P.O. Box 68
    FI-00014 UNIVERSITY OF HELSINKI
    Finland
    Email: ilpo.jarvinen@helsinki.fi

    Markku Kojo
    University of Helsinki
    P.O. Box 68
    FI-00014 UNIVERSITY OF HELSINKI
    Finland
    Email: kojo@cs.helsinki.fi