Internet Engineering Task Force                       Sumitha Bhandarkar
INTERNET DRAFT                                     A. L. Narasimha Reddy
draft-ietf-tcpm-tcp-dcr-01.txt                      Texas A&M University
Expires : February 2005                                      August 2004



       Improving the Robustness of TCP to Non-Congestion Events


Status of this Memo


   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.


   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.


   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."


   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt


   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Abstract:


   This document proposes TCP-DCR, a simple modification to the TCP
   congestion control algorithm to make it more robust to non-congestion
   events. In the absence of explicit notification from the network, the
   TCP congestion control algorithm treats the receipt of three
   duplicate acknowledgements as an indication of congestion in the
   network. This is not always correct, notably so in wireless networks
   with channel errors or networks prone to excessive packet reordering,
   resulting in degraded performance. TCP-DCR aims to remedy this by
   delaying the congestion response of TCP for a short interval of time
   tau, thereby creating room to handle any non-congestion events that
   may have occurred. If at the end of the delay tau, the event is not
   handled, then it is treated as a congestion loss. The modifications
   themselves do not handle the non-congestion event, but rather rely on
   some underlying mechanism to do this. This document discusses the
   implications of delaying congestion response on the fairness, TCP-
   compatibility and network dynamics, and the benefits to be gained by
   applying the TCP-DCR modifications to TCP.




Bhandarkar/Reddy         Expires February 2005                  [Page 1]


draft-ietf-tcpm-tcp-dcr-01                                   August 2004



1. Introduction


   In the absence of explicit notification from the network, the TCP
   sender treats the receipt of three duplicate acknowledgements
   (dupacks, for short) as an indication of congestion in the network.
   It responds by triggering the fast retransmit/fast recovery
   algorithm, where the packet perceived to be lost is retransmitted and
   the congestion window is reduced by half to relieve the congestion in
   the network. When the reason for the generation of dupacks is not
   congestion related, this reduction of the congestion window results
   in sub-optimal performance.


   The two chief non-congestion events considered in this document that
   might cause the generation of dupacks are channel errors in wireless
   networks and excessive packet reordering. Several different solutions
   have been proposed in the literature to improve the performance of
   TCP in the presence of channel errors
   [BB95,BPSK97,BS97,BSAK95,CLM99,MCGSW01,SVSB99,VMPM02,WT98,YB94] or
   packet reordering [BA02,ZKFP02]. This document proposes TCP-DCR, a
   simple and unified solution to improve the robustness of TCP to any
   non-congestion event. Even though the discussion here is focused on
   the two chief causes mentioned above, the solution is general enough
   to be extended to other non-congestion events resulting in the
   generation of dupacks.


   Throughout the rest of this document, the term "TCP-DCR" is used to
   refer to the modifications that need to be made to TCP to make it
   robust to non-congestion events as well as to refer to the TCP flavor
   to which the modifications have been applied.


2. Problem Description


   The strength of TCP lies in its ability to adjust its sending rate
   according to the perceived congestion in the network. In the absence
   of explicit notification of congestion from the network, the
   traditional TCP flavors use the loss of a packet as an indication of
   congestion. In order to help the sender identify a lost packet the
   receiver sends acknowledgements for every packet received in-order
   and duplicate acknowledgements (dupacks) for every packet received
   out-of-order. The acks were specified originally in order to clock
   out new packets. The use of three dupacks as an indication of
   congestion was added later. When the sender receives three
   consecutive dupacks, it concludes that the packet is lost due to
   congestion.


   The TCP sender does not respond to the very first dupack, but waits
   for three dupacks to allow for a mildly reordered packet to reach the
   receiver, and possibly result in a cumulative acknowledgement.




   Limited Transmit, which is now a Proposed Standard, allows the sender
   to send new packets in response to the first and second dupacks. The
   choice of waiting for three dupacks is purely heuristic. When the
   network is responsible for a non-negligible number of non-congestion
   events, this threshold of three dupacks tends to be too short and the
   resulting response too drastic. The persistent occurrence of
   non-congestion events causes the TCP sender's window to oscillate
   around a smaller value than what is actually allowed by the
   congestion in the network, resulting in degraded performance.


   It is useful at this point to review the prevalence of
   non-congestion events on the Internet. The two chief causes
   identified and targeted in this document are wireless channel errors
   and packet reordering within the network. While the existence of
   channel errors in wireless networks is a well-accepted fact, there is
   a general perception that packet reordering within the Internet is a
   rare phenomenon. Several recent measurement studies [BPS99,JIDKT03],
   however, have shown results contrary to this popular
   sentiment. Even if we were to suppose that the amount of packet
   reordering in the current Internet is negligibly small, the need for
   almost in-order packet delivery places a severe constraint on the
   design of novel routing algorithms, network components and
   applications. For instance, high speed packet switches could cause
   resequencing of packets and there has been work proposed in the
   literature to ensure that packet ordering is maintained in such
   switches [KM02]. Other examples are multi-path routing, high-delay
   satellite links and some of the schemes proposed for differentiated
   services architecture. By making TCP more robust to non-congestion
   events, we aim to ease this restriction of always in-order delivery
   on the design of the future Internet components.


3. Design Guidelines


   The proposal for TCP-DCR in this document is motivated by the
   following requirements:


   * Improve the robustness of TCP to non-congestion events in general,
   rather than on a case-by-case basis.


   * Maintain the end-to-end TCP semantics.


   * Require a minimal amount of modification to the network
   infrastructure.


   * The solution should lend itself to incremental deployment.


   * After the modifications, the protocol should remain compatible with
   existing flavors of TCP.




4. Modifications to TCP


   The TCP-DCR modifications involve simple changes regarding when the
   fast retransmit/recovery algorithms should be triggered. The current
   TCP flavors wait for three dupacks before responding as if a packet
   is lost due to congestion. This document extends the concept further
   by allowing the TCP-DCR sender to wait for an interval of tau after
   receiving the first dupack before responding to it as if it were a
   packet lost due to congestion. During the period tau, the TCP sender
   sends one new packet for every incoming dupack, if the congestion
   window allows it, similar to what is proposed by the Limited Transmit
   algorithm [ABF01]. The sender also continues to increase the
   congestion window during this period. However, since only one packet
   is allowed to be sent in response to each dupack, the number of
   packets on the link at any point remains the same as (or less than)
   the number of packets on the link when the first dupack was received.


   The following figure illustrates the behavior of TCP in the presence
   of packet reordering, when the TCP-DCR modifications are applied.


                                           |<-------- tau -------->|
                                           Cong Response Delay Timer
                                      Limited Transmit/Additive Increase


          No Retransmission/Window Reduction ----+
                                                 |
                   Set Cong Response ------+     |   Cong Resp Delay
                       Delay Timer         |     |   Timer Cancelled
                                           |     |             |
           | <-- Round Trip Time --> |     v     v             |
                                                               |
           1  2  3  4  5  6          7     8  9  10  11        v
Sender  ---,--,--,--,--,--,----------,-----,--,--,--,----------,-------
            \  *  \  \  \  \        / \   / \/ \/ \/ \        /
             \    *\  \  \  \      /     /  /  /  /          /
              \     \* \  \  \    /        /                /
               \     \  *  \  \  /     /  /  /  /          /
                \     \  \ *\  \/     /  /  /  /          /
                 \     \  \  \*/\    /  /  /  /          /
                  \     \  \  \  *  /  /  /  /          /
                   \     \  \/ \  \/* /  /  /          /
                    \     \ /\  \ /\ / */  /          /
                     \     \  \  \  \  /  *          /
                      \   / \  \/ \/ \/  /   *      /
                       \ /   \ /\ /\ /\ /       *  /
Rcvr    ----------------`-----`--`--`--`----------*--------------------
                        2     2  2  2  2           8
    Figure 1: Behavior of TCP-DCR in the presence of packet reordering.




   As can be seen from the figure, when the first dupack is received,
   the congestion response delay timer is set. When three dupacks are
   received, if the congestion response delay timer has not expired, the
   fast retransmit/recovery algorithm is not triggered. If the
   acknowledgement for the reordered packet reaches the sender before
   the delay timer expires, then the timer is cancelled and the sender
   does not suffer unnecessary reduction in the sending rate.


   The following figure illustrates the behavior of TCP in the presence
   of packet loss due to congestion, when the TCP-DCR modifications are
   applied.



                                   | <-------- tau ---------> |
                                    Cong Response Delay Timer
                                         Limited Transmit
                                     Additive Window Increase


            No Retransmission ------------+
            No Window Reduction           |
                                          |
            Set Cong Response ------+     |
               Delay Timer          |     |   Retransmission -+
                                    |     | Window Reduction  |
                                    |     |                   |
    | <-- Round Trip Time --> |     v     v                   v


    1  2  3  4  5  6          7     8  9  10  11        12    2
 ---,--,--,--,--,--,----------,-----,--,--,--,----------,-----,--,--,--
     \  \  \  \  \  \        / \   / \/ \/ \/ \        / \   / \/  /
      \  \  \  \  \  \      /   \ /  /\ /\ /\  \      /     /  /  /
       \  \  \  \  \  \    /     \  /  \  \  \  \    /     /  /  /  /
        \  \  \  \  \  \  /     / \/  / \/ \  \  \  /     /  /  /  /
         \  \  \  \  \  \/     /  /\ /  /\  \  \  \/     /  /  /  /
          \  \  \  \  \ /\    /  /  /  /  \  \  \ /\    /  /  /  /
Cong Drop --> X  \  \  \  \  /  /  / \/    \  \  \  \  /  /  /  /
            \     \  \/ \  \/  /  /  /\     \  \/ \  \/  /  /  /
             \     \ /\  \ /\ /  /  /  \     \ /\  \ /\ /  /  /
              \     \  \  \  \  /  /    \     \  \  \  \  /  /
               \   / \  \/ \/ \/  /      \   / \  \/ \/ \/  /
                \ /   \ /\ /\ /\ /        \ /   \ /\ /\ /\ /
 ----------------`-----`--`--`--`----------`-----`--`--`--`-----------
                 2     2  2  2  2          2     2  2  2  2


   Figure 2: Behavior of TCP-DCR in the presence of packet loss due to
             congestion.


   The figure above shows the behavior of a TCP flow with the TCP-DCR
   modifications when a packet has been dropped due to congestion in the
   network. In this case a cumulative acknowledgement is not received
   before the congestion response delay timer expires. As a result, as
   soon as the timer expires, the fast retransmit/recovery
   algorithm is triggered. The next section discusses the upper
   threshold on the delay tau so that this delay in congestion response
   does not adversely affect the throughput obtained by the flow using
   TCP-DCR modifications or the non TCP-DCR flows competing with it.
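

   The trigger logic described in this section can be sketched as
   follows. This is a minimal illustration of the timer-based behavior
   and not the authors' implementation: the class name DcrSender, its
   method names and the returned action strings are hypothetical,
   window accounting and SACK processing are omitted, and the current
   time is passed in explicitly in seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DcrSender:
    """Toy model of the TCP-DCR trigger (hypothetical names, no windows)."""

    tau: float                              # congestion response delay (s)
    timer_deadline: Optional[float] = None  # None: delay timer not running
    fast_retransmits: int = 0               # count of congestion responses

    def on_dupack(self, now: float) -> str:
        # The first dupack arms the congestion response delay timer.
        if self.timer_deadline is None:
            self.timer_deadline = now + self.tau
        # One new packet per dupack, as in Limited Transmit; the check
        # against the congestion and receiver windows is omitted here.
        return "send_new_packet"

    def on_cumulative_ack(self, now: float) -> str:
        # The missing packet was acknowledged before the timer expired:
        # cancel the timer, no retransmission and no window reduction.
        if self.timer_deadline is None:
            return "no_timer"
        self.timer_deadline = None
        return "timer_cancelled"

    def on_clock(self, now: float) -> str:
        # Timer expiry: treat the event as a congestion loss.
        if self.timer_deadline is not None and now >= self.timer_deadline:
            self.timer_deadline = None
            self.fast_retransmits += 1
            return "fast_retransmit"
        return "idle"
```

   With tau set to 0.1 seconds, three dupacks followed by a cumulative
   acknowledgement at 0.08 seconds leave fast_retransmits at zero,
   which corresponds to Figure 1. If no cumulative acknowledgement
   arrives, on_clock() reports "fast_retransmit" once the deadline has
   passed, which corresponds to Figure 2.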


4.1. Choice of the delay duration (tau)


   The current implementations of TCP wait for three dupacks before
   treating them as an indication of packet loss due to congestion. The
   choice of waiting for three dupacks is heuristic. This document
   proposes that the delay before responding to congestion should be
   longer, so that underlying schemes have time to recover from non-
   congestion events. There is no optimal value for this delay such that
   all possible non-congestion events can be recovered. It is
   essentially a tradeoff between unnecessarily inferring congestion,
   and unnecessarily waiting for a long time before retransmitting a
   lost packet. Therefore, the choice of the delay is really choosing a
   place on the spectrum for the tradeoffs between these two concerns.
   This document aims to provide guidelines for reasonable bounds on the
   delay to make it useful, without adversely modifying the TCP
   behavior.


   Consider the case of wireless channel errors. The figure below shows
   a general scenario where the TCP sender is connected to the base
   station by a wired link and the TCP receiver is connected to the base
   station over a wireless link. The wired path between the base station
   and the TCP sender could consist of several hops, but this would not
   affect the discussion here, so it is shown as a single hop. The round
   trip time over the wireless link, between the base station and the
   TCP receiver, is indicated by 'rtt', and the end-to-end round trip
   time between the TCP sender and the TCP receiver is indicated by
   'RTT'.


                                     +---------------+
                                     |      rtt      |
                                     |               |
                        wired        |    wireless   |
            TCP         link         V      link     |     TCP
           Sender  0-----------------0---------------0  Receiver
                   ^                Base             |
                   |              Station            |
                   |                                 |
                   |               RTT               |
                   +---------------------------------+


            Figure 3: General scenario for a wireless network.




   In the above scenario, if we ignore ambient delays (e.g., inter-
   packet delay, queuing delay, etc.), a packet sent by the TCP sender
   at some time 't0' reaches the base station at 't0 + (RTT/2 - rtt/2)'
   and the receiver at time 't0 + RTT/2'. Suppose a packet 'k' sent at
   time 't0' is lost on the wireless link due to channel errors. Then at
   't0 + RTT/2 + rtt/2' the base station receives an indication that the
   packet 'k' is lost. If it immediately retransmits the packet, then
   the packet 'k' is recovered at the receiver at time 't0 + RTT/2 +
   rtt'. The sender receives an acknowledgement for the packet 'k' at
   't0 + RTT/2 + rtt + RTT/2', which is 'rtt' later than it would have
   arrived had the packet not been lost. Hence the sender would have to
   delay the congestion response by at least 'rtt' time units, to allow
   the link layer to recover the packet. In practice, the inter-packet
   delays are
   non-zero and the TCP sender does not know the value of 'rtt'. Hence,
   a simple solution would be to set the lower bound on the delay in
   congestion response to one 'RTT'.
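

   The timeline above can be reproduced with a few lines of
   arithmetic. The sketch below is only an illustration under the same
   simplification (ambient delays ignored); the function name
   recovery_timeline and the dictionary keys are hypothetical, and any
   consistent time unit may be used.

```python
def recovery_timeline(t0: float, RTT: float, rtt: float) -> dict:
    """Event times for a packet lost once on the wireless link."""
    at_base_station = t0 + (RTT / 2 - rtt / 2)
    at_receiver_if_no_loss = t0 + RTT / 2
    # The base station learns of the loss half a wireless round trip
    # after the packet should have reached the receiver.
    loss_known_at_base_station = t0 + RTT / 2 + rtt / 2
    # Immediate link-level retransmission takes another rtt/2.
    recovered_at_receiver = t0 + RTT / 2 + rtt
    ack_at_sender = recovered_at_receiver + RTT / 2
    ack_if_no_loss = t0 + RTT
    return {
        "at_base_station": at_base_station,
        "at_receiver_if_no_loss": at_receiver_if_no_loss,
        "loss_known_at_base_station": loss_known_at_base_station,
        "recovered_at_receiver": recovered_at_receiver,
        "ack_at_sender": ack_at_sender,
        # Minimum delay the sender must tolerate before responding.
        "extra_delay": ack_at_sender - ack_if_no_loss,
    }
```

   For example, with RTT = 200 and rtt = 40 (in milliseconds), the
   acknowledgement reaches the sender at 240 instead of 200, so the
   extra delay equals 'rtt'.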


   The upper bound on the delay is imposed by the retransmission timer
   of TCP. The delay should be chosen such that the RTO timeout is
   avoided, because a timeout would be detrimental to the performance of
   the protocol. The RTO is usually set to (SRTT + 4 * RTTVAR), where
   SRTT is the smoothed RTT estimate and RTTVAR is the RTT variation.
   The standard recommends a minimum of 1 second, but many TCP
   implementations have a much smaller minimum, e.g., 100 ms. The RTO
   forms the upper bound on the value for the congestion response delay
   tau.


   Based on the above discussion, this document recommends the value of
   tau to be set as one RTT. In the case of packet reordering, the
   amount by which the packet is reordered could be highly variable. The
   time to recover the lost packet is the time that the reordered packet
   takes to reach the receiver. Hence there is no preset lower bound for
   the delay tau, that will facilitate the recovery of a packet
   reordered by any amount. However, the upper bound is still decided by
   the discussion above. So, a value of one RTT for tau is still a
   reasonable choice. We conducted the analysis of the steady state
   bandwidth realized by TCP-DCR [BR03]. The results of the analysis
   show that the TCP-DCR modifications do not affect the steady state
   bandwidth.
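

   The relationship between the recommended delay and its upper bound
   can be sketched as follows. This is an illustration only: the
   function names and the min_rto parameter are hypothetical, and the
   RTT term of the RTO formula is taken to be the smoothed RTT, which
   is an assumption of this sketch.

```python
def rto_estimate(srtt: float, rttvar: float, min_rto: float = 1.0) -> float:
    """RTO as SRTT + 4 * RTTVAR, subject to a configurable minimum."""
    return max(min_rto, srtt + 4.0 * rttvar)


def tau_and_headroom(srtt: float, rttvar: float, min_rto: float = 1.0):
    """Return (tau, rto, headroom) with tau set to one smoothed RTT."""
    tau = srtt                       # recommended value: one RTT
    rto = rto_estimate(srtt, rttvar, min_rto)
    return tau, rto, rto - tau       # headroom > 0 means tau < RTO
```

   With a 1 second minimum RTO the headroom is large for typical
   round trip times. With a small minimum, the headroom shrinks to
   roughly 4 * RTTVAR, which is why the choice of one RTT sits close
   to the upper bound when the RTT variation is low.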


   TCP-DCR does not increase the per-packet delivery time when there is
   no congestion in the network. However, when a packet is dropped, the
   choice of tau = one RTT may add up to one additional RTT of delay in
   recovering the lost packet. An important fact to remember here is
   that the choice of tau does not cause the TCP-DCR sender to
   dramatically over-send packets because the protocol is still ACK-
   clocked. That is, a new packet is sent only upon the receipt of a
   dupack. If there is suddenly very high congestion in the network
   resulting in the drop of several packets, the TCP sender will have
   reduced its sending rate simply because not many dupacks are coming
   back.




4.2. Implementation Details


   The TCP-DCR modifications need to be applied only to the sender and
   the receiver remains unmodified. The sender can implement the delay
   in congestion response (tau) by using either a timer or by modifying
   the threshold on the number of duplicate acknowledgements to be
   received before triggering fast retransmit/recovery. The timer-based
   implementation is quite straightforward, but is influenced by the
   coarseness of the clock granularity. In the ack-based delay
   implementation, the sender could delay responding to congestion for
   the number of duplicate acknowledgements corresponding to the delay
   required. Thus, if 'tau' is chosen to be one RTT, the sender would
   wait for the receipt of 'W' duplicate acknowledgements before
   responding to congestion, where 'W' is the size of the congestion
   window when the packet loss is detected.
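

   The ack-based variant can be sketched as a single function. This is
   an illustration and not the authors' code: the function name and
   the tau_in_rtts parameter are hypothetical, and keeping the
   conventional value of three as a floor for very small windows is an
   assumption of this sketch.

```python
def dcr_dupack_threshold(cwnd_packets: int, tau_in_rtts: float = 1.0) -> int:
    """Dupacks to wait for before triggering fast retransmit/recovery.

    cwnd_packets is the congestion window W, in packets, at the time
    the loss is detected. One RTT of delay corresponds to W dupacks.
    """
    threshold = int(cwnd_packets * tau_in_rtts)
    # Assumption: never respond earlier than standard TCP would.
    return max(3, threshold)
```

   For a congestion window of 20 packets and tau of one RTT, the
   sender would wait for 20 dupacks rather than three.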


   The TCP-DCR modifications work with most flavors of the TCP protocol.
   However, this document advocates the use of TCP-DCR with TCP-SACK to
   ensure that performance remains high even under conditions of
   multiple losses per round trip time. When used with
   TCP-SACK, the only thing modified by TCP-DCR is the time at which the
   fast retransmit/recovery algorithm is triggered in response to
   dupacks generated by the first loss within a window of packets. All
   subsequent losses within the same window (irrespective of whether
   they are congestion related or non-congestion events) are handled in
   exactly the same way as TCP-SACK would in the absence of TCP-DCR
   modifications. If the receiver is not SACK-capable, however, then the
   sender will have to use TCP-DCR with NewReno.


4.3. Receiver Buffer Requirement when TCP-DCR is used


   When TCP-DCR is used, the receiver will need to have additional
   buffer space to accommodate the extra packets corresponding to the
   delay 'tau', when a packet is lost due to congestion. Having these
   extra buffers allows TCP-DCR to achieve the best performance.
   However, if the buffers are not available, it does not degrade the
   performance, but the maximum performance improvement is not achieved.
   This is because, apart from congestion control, TCP also provides
   flow control such that a faster sender does not flood a slow
   receiver. The flow control is achieved by using a receiver advertised
   window, such that at any point the TCP sender may not send more
   packets than that allowed by 'min(cwnd,rwnd)' where 'cwnd' is the
   congestion window and 'rwnd' is the receiver advertised window. When
   the buffer space is not available, the receiver advertised window is
   small. As a result, during the delay 'tau' even though the limited
   transmit and congestion window allow a packet to be transmitted it
   will not be sent if the 'rwnd' (and hence the receiver buffer) does
   not allow it. However, the TCP sender can still delay the congestion
   response by 'tau', allowing the local recovery mechanism to recover
   from the non-congestion event.
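

   The effect of the receiver advertised window during the delay can
   be sketched as follows. This is a simplified illustration: the
   function name is hypothetical, all quantities are counted in
   packets, and the congestion window is assumed to permit one new
   packet per dupack, so that only 'rwnd' can limit transmission.

```python
def new_packets_during_tau(dupacks: int, outstanding: int, rwnd: int) -> int:
    """Count dupacks that can each clock out one new packet.

    'outstanding' is the amount of unacknowledged data when the first
    dupack arrives. It grows by one with every new packet sent, because
    dupacks do not advance the left edge of the window.
    """
    room = max(0, rwnd - outstanding)   # space left in advertised window
    return min(dupacks, room)
```

   With 20 packets outstanding and 10 dupacks arriving during 'tau',
   an advertised window of 40 allows all 10 new packets, a window of
   25 allows only 5, and a window of 20 allows none. In the last case
   the sender still delays its congestion response but gains no
   additional transmissions during the delay.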


4.4. Underlying mechanisms for recovering from non-congestion events


   The performance benefits to be gained from using the TCP-DCR
   modifications depend heavily on the existence of an underlying
   scheme for recovering from the non-congestion events. In the case of
   packet reordering, no explicit scheme is required to recover the
   reordered packet; the reordered packet reaches the receiver after the
   delay that caused it to appear out-of-order. In the case of wireless
   networks, a packet corrupted due to channel errors might be recovered
   through link-level mechanisms such as link-level retransmissions or
   FEC (Forward Error Correction). If the corrupted packet is not
   recovered through link-level mechanisms, it will be interpreted by
   TCP as a packet lost due to congestion, and retransmitted by TCP.


5. Performance Evaluation


   This section of the document provides a glimpse of the performance
   improvements to be gained by the use of TCP-DCR modifications. The
   results presented here are only a small subset of the results
   presented in [BR03]. The results are based on simulations on the ns-2
   simulator [NS-2].


5.1. Network with packet reordering


   The table below shows the effect of delayed packets on the
   performance of TCP-SACK and the corresponding improvement in the
   performance in the case of TCP-DCR. The experiment is conducted with
   a dumbbell topology with the bottleneck link bandwidth set to 8Mbps.
   The end-to-end RTT is set to 104ms. The receiver advertises a very
   large window such that the sending rate is not clamped by the
   receiver dynamics. There is no congestion in the network. The
   topology consists of a single flow. The packet delay is picked from a
   normal distribution with a mean of 25ms and a standard deviation of
   8ms. Thus, most packets chosen for delaying are delayed in the range
   0 to 50ms, simulating mild but persistent reordering. The throughput
   of TCP-SACK without the TCP-DCR modifications degrades drastically.
   However, when the TCP-DCR modifications are applied the performance
   is very good even when a large percentage of the packets are delayed.


    Percentage          Throughput of           Throughput of
    of Packets         TCP-SACK without         TCP-SACK with
     Delayed        TCP-DCR modifications     TCP-DCR modifications
       (%)                 (Mbps)                   (Mbps)
    ----------      ---------------------     ---------------------
       0.0                    7.325                  7.352
       1.0                    1.043                  7.339
       2.0                    0.795                  7.309
       5.0                    0.571                  7.185
       8.0                    0.498                  7.095
      10.0                    0.476                  7.061
      15.0                    0.440                  7.000
      20.0                    0.410                  7.008
      25.0                    0.409                  7.014
      30.0                    0.404                  7.006
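

   The statement that most delayed packets fall in the range 0 to 50ms
   can be checked against the stated distribution. The sketch below
   uses the Python standard library and is only a sanity check of the
   experiment parameters; it is not part of the ns-2 setup, and the
   variable names are illustrative.

```python
from statistics import NormalDist

# Delay distribution from the experiment: mean 25ms, std. deviation 8ms.
delay_ms = NormalDist(mu=25.0, sigma=8.0)

# Probability mass between 0ms and 50ms, i.e. within 3.125 standard
# deviations of the mean on either side.
fraction_in_range = delay_ms.cdf(50.0) - delay_ms.cdf(0.0)

print(f"fraction of delays within 0..50ms: {fraction_in_range:.4f}")
```

   The computed fraction is above 99 percent, which is consistent with
   the description of the reordering as mild relative to the 104ms
   round trip time.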



5.2. Wireless Networks with Channel Errors


   The table below shows the effect of channel errors on the performance
   of TCP-SACK with and without the TCP-DCR modifications. The topology
   for the experiment consists of a sender connected via a wired link to
   a router which in turn is connected to the base station by a wired
   link. The bandwidth of the wired links is 100Mbps and the delay is
   5ms. The receiver is connected to the base station by a link
   simulating a satellite connection with a lower bandwidth and a larger
   delay. The bandwidth of this link is 1Mbps and the delay is 250ms.
   Packets are randomly chosen to be corrupted by channel errors. Link
   level retransmission is simulated by retransmitting the corrupted
   packet after a delay corresponding to the round trip time of the
   wireless link.


      Channel           Throughput of            Throughput of
      Error            TCP-SACK without          TCP-SACK with
      Rate          TCP-DCR modifications     TCP-DCR modifications
       (%)                 (Mbps)                   (Mbps)
    ----------      ---------------------      --------------------
       0.0                  0.962                    0.962
       0.5                  0.261                    0.957
       1.0                  0.186                    0.952
       2.0                  0.131                    0.943
       3.0                  0.107                    0.934
       4.0                  0.094                    0.925
       5.0                  0.086                    0.917
       6.0                  0.081                    0.908
       7.0                  0.078                    0.900
       8.0                  0.073                    0.892


5.3. Fairness Implications


   This section of the document addresses the fairness issues raised by
   delaying congestion response. The steady state analysis of TCP-DCR
   [BR03] shows that the throughput of the TCP-DCR protocol is similar
   to that of TCP [PFTK98]. Thus, the congestion control dynamics of
   TCP-DCR are TCP-friendly. Essentially, TCP-DCR can be seen as a
   slowly-responsive TCP-friendly flow as explained in [BBFS01]. It has
   been shown in that paper that such flows are TCP-compatible.


   Simulation results agree with the discussion above. The following
   table shows the average throughput achieved by flows using TCP-SACK
   without the TCP-DCR modifications compared to flows using TCP-SACK
   with the TCP-DCR modifications in a congested network. The dumbbell
   topology is used for this experiment with the bottleneck link
   capacity of 10Mbps being shared by 12 flows, half of which are TCP-
   SACK without TCP-DCR modifications and the other half are TCP-SACK
   with the TCP-DCR modifications. There are no non-congestion losses in
   the network and congestion is induced by modifying the buffers
   available at the bottleneck router. The throughput of each individual
   flow varies only slightly from the average throughput.


    Congestion         Avg. Throughput         Avg. Throughput
    Droprate          of TCP-SACK without     of TCP-SACK with
      (%)            TCP-DCR Modifications   TCP-DCR Modifications
                         (Mbps)                    (Mbps)
    -----------     ----------------------   ---------------------
       0.06                0.808                    0.795
       0.36                0.820                    0.782
       1.51                0.837                    0.765
       1.86                0.828                    0.774
       2.44                0.836                    0.767
       3.43                0.767                    0.835
       4.57                0.724                    0.874
       5.76                0.719                    0.788



5.4. Effect on Network Dynamics


   When the loss of a packet is indeed due to congestion, delaying the
   congestion response could make the protocol sluggish at relieving
   congestion in the network. However, when the delay is bounded by one
   RTT, the behavior of TCP-DCR is not significantly different from a
   TCP flow with high variance in RTT measurements. During the
   congestion response delay, the TCP-DCR flow appears like a flow whose
   RTT is twice the value when there is no congestion in the network.
   Performance evaluation through simulations has validated this view
   [BR03].


6. Implementation Issues


   The TCP-DCR modifications presented by this document are quite simple
   and do not require complicated changes. When the delay "tau" is
   implemented based on a timer, the timer value can be set to the
   smoothed value of RTT (SRTT). However, when the delay "tau" is
   implemented by modifying the threshold on the number of dupacks to be
   received before responding, the RTT value being used is essentially
   the instantaneous value. The upper bound on the congestion response
   delay is established by the RTO estimate which is computed based on
   the smoothed RTT. This could potentially lead to a situation where
   the value of the congestion response delay is larger than the value
   of the RTO. Though such a situation could be fairly rare, even a few
   unnecessary timeouts can degrade the performance drastically. So,
   this document recommends that the new threshold on the number of
   dupacks to wait before responding be scaled by the factor
   (SRTT)/(Current RTT Estimate).
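

   The recommended scaling can be sketched as follows. This is an
   illustration only: the function name is hypothetical, the unscaled
   threshold is taken to be the congestion window in packets (the
   value corresponding to tau of one RTT), and the floor of three
   dupacks is an assumption of this sketch.

```python
def scaled_dupack_threshold(cwnd_packets: int, srtt: float,
                            current_rtt: float) -> int:
    """Scale the ack-based threshold by SRTT / current RTT estimate."""
    if current_rtt <= 0:
        raise ValueError("current_rtt must be positive")
    # When the instantaneous RTT exceeds the smoothed RTT, waiting for
    # a full window of dupacks would take longer than one SRTT, so the
    # threshold is reduced proportionally.
    scaled = cwnd_packets * (srtt / current_rtt)
    # Assumption: never respond earlier than standard TCP would.
    return max(3, int(scaled))
```

   For example, with a window of 40 packets, an SRTT of 100ms and a
   current RTT estimate of 200ms, the threshold becomes 20 dupacks,
   which keeps the effective delay near one SRTT and therefore below
   the RTO.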


   We have implemented the TCP-DCR modifications in the Linux 2.4.20
   kernel. The modifications require changes of only a few lines of
   code. Currently, we are in the process of evaluating the reordering
   robustness provided by native Linux implementations against that of
   TCP-DCR.


7. Incremental Deployment


   The TCP-DCR modifications proposed in this document lend themselves
   to incremental deployment. Only the TCP protocol on the sender side
   needs to be modified. The modifications themselves are minor and can
   be distributed easily as kernel patches. The use of TCP-DCR does not
   require the sender and receiver to negotiate any conditions during
   connection setup. Neither the receivers nor the routers need to be
   aware that the sender has been enhanced with the TCP-DCR
   modifications. Availability of additional buffers at the receiver
   will help maximize the benefits of using TCP-DCR but is not
   necessary.


8. Relationship to other work


   Over the past few years, several solutions have been proposed to
   improve the performance of TCP over wireless networks. These
   solutions fall in one of the following broad categories: split
   connection approaches [BB95,BS97,WT98,YB94], TCP-aware link layer
   protocols [BSAK95,CLM99], explicit loss notification approaches
   [BK98,KAPS02,RF99] and receiver-based approaches [SVSB99,VMPM02]. All
   the above-mentioned schemes are proposed explicitly for improving the
   performance of TCP in wireless networks. While some of them could
   possibly be used in situations with other types of non-congestion
   events, the simplicity of TCP-DCR, in our opinion, makes it a far
   more compelling solution for the problem.


   It has been shown that the performance of TCP over wireless networks
   can be improved by using other flavors of TCP. For example, by using
   TCP-SACK [MMFR96] or TCP-Westwood [MCGSW01] instead of standard
   implementations of TCP Reno, performance can be improved. The
   performance improvement from using the TCP-SACK protocol, however, is
   due to its ability to recover from multiple losses in one RTT and
   does not necessarily indicate robustness to non-congestion events.
   This document advocates the use of TCP-DCR modifications with the
   TCP-SACK flavor.


   Different solutions have been proposed in the literature to improve
   the performance of TCP when the network reorders packets
   persistently. In [BA02] the authors present several schemes which use
   DSACKs [FMMP00] (or could alternatively use timestamps [LM03] or
   other methods) to identify a false fast retransmit. In response, the
   sending rate is restored back to the level it was before the false
   fast retransmit. The reordering length for the packet is measured
   using the information available from DSACKs and the threshold on the
   number of dupacks to be received before responding (dupthresh) is
   increased to avoid future false fast retransmits. If an RTO timeout
   occurs, then it is presumed that the dupthresh has grown too large
   and it is reset to 3. In [ZKFP02] this process is further refined at
   the cost of maintaining significantly more state at the sender and
   using complicated algorithms for finding the optimal value for
   dupthresh such that costly RTO timeouts are avoided, while the
   performance is optimized to provide maximum reordering robustness.
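

   The dupthresh adaptation described for [BA02] can be sketched in C as
   follows. This is only an illustration: [BA02] presents several
   schemes for choosing the new value, and the policy used here (raise
   dupthresh just past the measured reordering length), as well as all
   type and function names, are assumptions made for the example.

```c
#include <stdint.h>

#define DUPTHRESH_DEFAULT 3u

/* Hypothetical per-connection state for the sketch. */
struct reorder_state {
    uint32_t dupthresh;
};

void reorder_init(struct reorder_state *s)
{
    s->dupthresh = DUPTHRESH_DEFAULT;
}

/*
 * Called when a fast retransmit is found to have been false (for
 * example via DSACK). reorder_len is the measured reordering length
 * in dupacks. Assumed policy: never lower dupthresh here, and raise
 * it to one more than the observed reordering length.
 */
void reorder_on_false_fast_retransmit(struct reorder_state *s,
                                      uint32_t reorder_len)
{
    if (reorder_len >= s->dupthresh) {
        /* Saturate instead of wrapping around at UINT32_MAX. */
        s->dupthresh = (reorder_len < UINT32_MAX) ? reorder_len + 1
                                                  : UINT32_MAX;
    }
}

/* An RTO is taken as a sign that dupthresh has grown too large. */
void reorder_on_rto(struct reorder_state *s)
{
    s->dupthresh = DUPTHRESH_DEFAULT;
}
```

   Even this simplified form needs a detection mechanism and extra
   per-connection state, which is the cost that the comparison with
   TCP-DCR is concerned with.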


   These solutions rely on some additional scheme for identifying
   reordering in the network (such as DSACKs or timestamps) and the
   perceived reordering information is collected from the network to set
   an optimal value for dupthresh. The Linux TCP provides an option of
   using either of these additional schemes or just the information from
   SACK to estimate the reordering length. The intent is to estimate the
   optimal amount of time to delay the triggering of fast
   retransmit/recovery algorithms to provide maximum reordering
   robustness, without resorting to RTO timeouts too often. By using
   TCP-DCR, this goal can be met without having to use complex state or
   algorithms for tuning the value of dupthresh. While TCP-DCR does not
   tune the dupthresh based on the perceived reordering in the network,
   when tau is set to one RTT, it provides a simple and effective
   mechanism for providing reordering robustness without causing RTO
   timeouts. If the actual reordering within the network is less than
   one RTT, then no harm is done since no action is necessary when the
   packet is recovered. When the packet is reordered by more than one
   RTT, TCP-DCR does not wait for it to be recovered, but in doing so
   avoids costly retransmission timeouts.
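

   The timer-based behavior described above can be sketched in C as a
   single decision function. The names, the microsecond timestamps, and
   the choice to measure the delay from the arrival of the first dupack
   are assumptions made for the illustration, not definitions from this
   document.

```c
#include <stdint.h>

/* Hypothetical outcomes of the delayed congestion response check. */
enum dcr_action {
    DCR_WAIT,     /* still within tau: keep delaying the response     */
    DCR_RESUME,   /* missing packet arrived: no response is necessary */
    DCR_RESPOND   /* tau expired: treat the event as congestion loss  */
};

/*
 * now_us          current time in microseconds
 * first_dupack_us time at which the first dupack was seen
 *                 (assumed to be <= now_us)
 * tau_us          congestion response delay, e.g. one RTT
 * hole_filled     nonzero once the missing packet has been acked
 */
enum dcr_action dcr_decide(uint64_t now_us,
                           uint64_t first_dupack_us,
                           uint64_t tau_us,
                           int hole_filled)
{
    /* Reordering shorter than tau: the packet was recovered in time. */
    if (hole_filled)
        return DCR_RESUME;

    /* Reordering or loss lasting at least tau: respond now rather
     * than keep waiting and risk a retransmission timeout. */
    if (now_us - first_dupack_us >= tau_us)
        return DCR_RESPOND;

    return DCR_WAIT;
}
```

   No reordering estimate is kept between events in this sketch; the
   only inputs are the delay tau and the time of the first dupack.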


9. Security Considerations


   This proposal makes no changes to the underlying security of TCP.




10. Conclusions


   This document has proposed TCP-DCR modifications to TCP's congestion
   control mechanism to make it more robust to non-congestion events. We
   have explored this proposal through analysis and simulations, and are
   currently in the process of evaluating it through experiments on the
   Linux platform. We believe that TCP-DCR provides a simple, unified
   solution to improve the robustness of TCP to non-congestion
   events, and that the solution is safe to deploy on the Internet. We
   would welcome additional analysis, simulations, and experimentation.


   We are bringing this proposal to the IETF to be considered as an
   Experimental RFC.


11. Acknowledgements


   We would like to thank Dr. Nitin Vaidya and Nauzad Sadry for their
   invaluable help with the wireless simulations. Comments from Sally
   Floyd have helped immensely in improving the quality of this
   document.


12. References


   [ABF01] M. Allman, H. Balakrishnan, and S. Floyd, "Enhancing TCP's
   Loss Recovery Using Limited Transmit," RFC 3042, Proposed Standard,
   January 2001.


   [BA02] E. Blanton and M. Allman, "On Making TCP More Robust to Packet
   Reordering," ACM Computer Communication Review, January 2002.


   [BB95] A. Bakre and B. R. Badrinath, "I-TCP: indirect TCP for mobile
   hosts," Proceedings of the 15th International Conference on
   Distributed Computing Systems (ICDCS), May 1995.


   [BBFS01] D. Bansal, H. Balakrishnan, S. Floyd and Scott Shenker,
   "Dynamic Behavior of Slowly Responsive Congestion Control
   Algorithms," Proceedings of ACM SIGCOMM, Sep. 2001.


   [BK98] H. Balakrishnan and R. H. Katz, "Explicit Loss Notification
   and Wireless Web Performance," Proc. of IEEE GLOBECOM, Nov. 1998.


   [BPS99] J. Bennett, C. Partridge, and N. Shectman, "Packet reordering
   is not pathological network behavior," IEEE/ACM Transactions on
   Networking, December 1999.


   [BPSK97] H. Balakrishnan, V. Padmanabhan, S. Seshan, and R. H. Katz,
   "A Comparison of Mechanisms for Improving TCP Performance over
   Wireless Links," IEEE/ACM Transactions on Networking, 1997.




   [BR03] Sumitha Bhandarkar, and A. L. N. Reddy, "TCP-DCR: Making TCP
   Robust to Non-Congestion Losses," Technical Report TAMU-ECE-2003-04,
   July 2003.


   [BS97] K. Brown and S. Singh, "M-TCP: TCP for mobile cellular
   networks," ACM Computer Communications Review, vol. 27, no. 5,
   1997.


   [BSAK95] H. Balakrishnan, S. Seshan, E. Amir and R. Katz, "Improving
   TCP/IP performance over wireless networks," Proc. of ACM MOBICOM,
   Nov. 1995.


   [CLM99] H. M. Chaskar, T. V. Lakshman, and U. Madhow, "TCP Over
   Wireless with Link Level Error Control: Analysis and Design
   Methodology," IEEE Trans. on Networking, vol. 7, no. 5, Oct. 1999.


   [FMMP00] Sally Floyd, Jamshid Mahdavi, Matt Mathis and Matt Podolsky,
   "An Extension to the Selective Acknowledgement (SACK) Option for
   TCP," RFC 2883, July 2000.


   [JIDKT03] S. Jaiswal, G. Iannaccone, C. Diot, J. Kurose, and D.
   Towsley, "Measurement and Classification of Out-of-Sequence Packets
   in a Tier-1 IP Backbone," Proceedings of IEEE INFOCOM, 2003.


   [KAPS02] R. Krishnan, M. Allman, C. Partridge and J. P. G. Sterbenz,
   "Explicit Transport Error Notification for Error-Prone Wireless and
   Satellite Networks," BBN Technical Report No. 8333, BBN Technologies,
   February 2002.


   [KM02] I. Keslassy and N. McKeown, "Maintaining packet order in
   two-stage switches," Proceedings of IEEE INFOCOM, June 2002.


   [LM03] R. Ludwig and M. Meyer, "The Eifel Detection Algorithm for
   TCP," RFC 3522, April 2003.


   [MCGSW01] S. Mascolo, C. Casetti, M. Gerla, M. Sanadidi and R. Wang,
   "TCP Westwood: Bandwidth Estimation for Enhanced Transport over
   Wireless Links," Proceedings of ACM MOBICOM, 2001.


   [MMFR96] M. Mathis, J. Mahdavi, S. Floyd and A. Romanow, "TCP
   selective acknowledgment options," RFC 2018, October 1996.


   [NS-2] ns-2 Network Simulator. http://www.isi.edu/nsnam/


   [RF99] K. Ramakrishnan and S. Floyd, "A Proposal to add Explicit
   Congestion Notification (ECN) to IP," RFC 2481, January 1999.


   [SVSB99] P. Sinha, N. Venkitaraman, R. Sivakumar and V. Bhargavan,
   "WTCP: A Reliable Transport Protocol for Wireless Wide-Area
   Networks," Proceedings of ACM MOBICOM, August 1999.


   [VMPM02] N. H. Vaidya, M. Mehta, C. Perkins and G. Montenegro,
   "Delayed Duplicate Acknowledgement: a TCP-unaware Approach to Improve
   Performance of TCP over Wireless," Journal of Wireless Communications
   and Mobile Computing, special issue on Reliable Transport Protocols
   for Mobile Computing, February 2002.


   [WT98] K.-Y. Wang and S. K. Tripathi, "Mobile-end transport protocol:
   An alternative to TCP/IP over wireless links," IEEE INFOCOM'98,
   vol. 3, p. 1046, 1998.


   [YB94] R. Yavatkar and N. Bhagawat, "Improving End-to-End Performance
   of TCP over Mobile Internetworks," Workshop on Mobile Computing
   Systems and Applications, December 1994.


   [ZKFP02] M. Zhang, B. Karp, S. Floyd, and L. Peterson, "RR-TCP: A
   Reordering-Robust TCP with DSACK," ICSI Technical Report TR-02-006,
   Berkeley, CA, July 2002.


13. Authors' Addresses


   Sumitha Bhandarkar
   Dept. of Elec. Engg.
   214 ZACH
   College Station, TX 77843-3128
   Phone: (512) 468-8078
   Email: sumitha@tamu.edu
   URL  : http://students.cs.tamu.edu/sumitha/


   A. L. Narasimha Reddy
   Associate Professor
   Dept. of Elec. Engg.
   315C WERC
   College Station, TX 77843-3128
   Phone : (979) 845-7598
   Email : reddy@ee.tamu.edu
   URL   : http://ee.tamu.edu/~reddy/