Internet Engineering Task Force                               S. Dawkins
INTERNET DRAFT                                             G. Montenegro
                                                                 M. Kojo
                                                               V. Magret
                                                               N. Vaidya

                                                            June 9, 1999

        Performance Implications of Link-Layer Characteristics:
                           Links with Errors


Status of This Memo

   This document is an Internet-Draft and is in full conformance
   with all provisions of Section 10 of RFC2026.

   Comments should be submitted to the PILC mailing list at

   Distribution of this memo is unlimited.

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as ``work in
   progress.''

   The list of current Internet-Drafts can be accessed at

   The list of Internet-Draft Shadow Directories can be accessed at


   This document is part of the PILC (Performance Implications of
   Link-Layer Characteristics) series of recommendations for "extreme
   network conditions", and focuses on network paths that traverse
   "high error-rate" links.

Expires December 9, 1999                                        [Page 1]

INTERNET DRAFT          PILC - Links with Errors               June 1999

   Because TCP is still the flagship protocol for reliable data
   transport on the Internet and is used for Hypertext Transfer
   Protocol (HTTP) in particular, and because TCP congestion avoidance
   procedures interact badly with high uncorrected error rates, this
   document is focused on TCP over high error rate links.

   This document does not define "high error rate" formally.
   Informally, a path has a high error rate when the sender spends
   an excessive amount of time waiting for acknowledgements that
   aren't coming, whether due to data losses in the forward path or
   acknowledgement losses in the return path, and these losses are
   not due to congestion-related buffer exhaustion. The sender then
   transmits at substantially reduced traffic levels as it probes
   the network to determine "safe" traffic levels.

Table of Contents

1.0 Introduction
2.0 Interactions with Standard TCP Mechanisms
   2.1 Slow Start and Congestion Avoidance [RFC2581]
   2.2 Fast Retransmit and Fast Recovery [RFC2581]
   2.3 Selective Acknowledgements [RFC2018]
   2.4 Delayed Duplicate Acknowledgements [MV97, VMPM99]
   2.5 Detecting Corruption Loss With Explicit Notifications
3.0 Summary of Recommendations
4.0 Acknowledgements
5.0 References
Authors' addresses

1.0 Introduction

   It has been axiomatic that most losses on the Internet are due to
   congestion, as routers run out of buffers and discard incoming
   traffic. This observation is the basis for current TCP
   congestion avoidance strategies - if losses are due to congestion,
   there is no need for an explicit "congestion encountered"
   notification to the sender.

   Quoting Van Jacobson in 1988: "If packet loss is (almost) always
   due to congestion and if a timeout is (almost) always due to a
   lost packet, we have a good candidate for the `network is congested'
   signal." [VJ-DCAC]

   This axiom has served the Internet community well, because it
   made possible the deployment of TCPs that have allowed the
   Internet to accommodate explosive growth in link speeds and
   traffic levels.

   This same explosive growth has attracted users of networking
   technologies that DON'T have low uncorrected error rates -
   especially, but not only, some of the wireless Wide Area Network
   communities. Senders using these networks may not be able to
   transmit at anything like available bandwidth because their
   TCP connections are spending time in congestion avoidance
   procedures, or even slow-start procedures, that were triggered
   by corruption losses in the absence of congestion.

   This document makes recommendations about what the participants
   in connections that traverse high error-rate links may wish
   to consider doing to improve utilization of available bandwidth
   in ways that do not threaten the stability of the Internet.

   This document discusses end-to-end mechanisms that do not require
   TCP-level awareness by intermediate nodes. This places severe
   limitations on what the end nodes can know about the nature of
   losses that are occurring between the end nodes. Attempts to
   apply heuristics to distinguish between congestion and corruption
   losses have not been successful [BV97, BV98, BV98a]. A companion
   PILC recommendation, on Performance-Enhancing Proxies (PEPs),
   relaxes this restriction; because PEPs can be placed on boundaries
   where network characteristics change dramatically, PEPs have an
   additional opportunity to improve performance over links with
   uncorrected errors.

   Reducing the level of uncorrected errors would also improve
   utilization of available bandwidth, and a PILC recommendation
   for designers of future link-layer protocols discusses this
   issue in greater detail.

2.0 Interactions with Standard TCP Mechanisms

   A TCP sender adapts its use of bandwidth based on feedback from
   the receiver. When TCP is not able to distinguish between losses
   due to congestion and losses due to uncorrected errors, it is
   not able to determine available bandwidth.

   Some TCP mechanisms, targeting recovery from losses due to
   congestion, coincidentally assist in recovery from losses due to
   uncorrected errors as well.

2.1 Slow Start and Congestion Avoidance [RFC2581]

   Slow Start and Congestion Avoidance [RFC2581] are essential to
   the Internet's stability. They are based on implicit congestion
   notification, not explicit congestion notification. TCP connections
   with high error rates interact badly with Slow Start and with
   Congestion Avoidance, because high error rates make the
   interpretation of losses ambiguous - the sender cannot know
   whether detected losses are due to congestion or to data
   corruption.

      - Whenever TCP's retransmission timer expires, the sender
        assumes that the network is congested and invokes slow start.

      - During slow start, the sender increases its window in
        units of segments, so the smaller the segments, the more
        slowly the window grows in bytes. This is why it is
        important to use an appropriately sized MTU - and less
        reliable link layers often use smaller MTUs.
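
   The window behaviour described above can be sketched as follows.
   This is a minimal illustration of the [RFC2581] update rules, not
   an implementation; the function names are hypothetical, delayed
   ACKs are ignored, and every segment is assumed to be full-sized.

```python
from typing import Tuple


def on_ack(cwnd: int, ssthresh: int, smss: int) -> int:
    """Return the new congestion window (bytes) after one ACK of new data."""
    if cwnd < ssthresh:
        # Slow start: grow by one full-sized segment per ACK.
        return cwnd + smss
    # Congestion avoidance: roughly one segment per round trip.
    return cwnd + max(1, smss * smss // cwnd)


def on_timeout(flight_size: int, smss: int) -> Tuple[int, int]:
    """Return (cwnd, ssthresh) after the retransmission timer expires."""
    ssthresh = max(flight_size // 2, 2 * smss)
    return smss, ssthresh  # restart slow start from a one-segment window


def window_after(smss: int, round_trips: int, ssthresh: int = 65535) -> int:
    """Window (bytes) after some loss-free round trips, starting at one segment."""
    cwnd = smss
    for _round in range(round_trips):
        acks = cwnd // smss  # assumption: one ACK per segment in flight
        for _ack in range(acks):
            cwnd = on_ack(cwnd, ssthresh, smss)
    return cwnd
```

   Under these assumptions, window_after(1460, 3) yields 11680 bytes
   while window_after(256, 3) yields only 2048 bytes, which is the
   reason a small MTU hurts on a path that keeps returning to slow
   start.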

2.2 Fast Retransmit and Fast Recovery [RFC2581]

   TCPs deliver data as a reliable byte-stream to applications, so
   when a segment is lost (due to either congestion or corruption)
   delivery of data to the receiving application must wait until
   the missing data is received. The receiver detects missing
   segments when later segments arrive with out-of-order sequence
   numbers.

   TCPs are required to immediately acknowledge data when it is
   received out-of-order, sending the next expected sequence number
   with no delay, so that the sender can retransmit the required data
   and the receiver can resume delivery of data to the receiving
   application. These acknowledgements are called "duplicate ACKs",
   because they carry the same expected sequence number as an
   acknowledgement that has already been sent for the last in-order
   segment received (unless this acknowledgement was delayed for
   performance reasons).

   Because IP networks are allowed to reorder packets, the receiver may
   send duplicate acknowledgements for segments that are still en
   route, but are arriving out of order due to routing changes,
   link-level retransmission, etc. When a TCP sender receives three
   duplicate ACKs, fast retransmit [RFC2581] allows it to infer that a
   segment was lost. The sender retransmits what it considers to be
   this lost segment without waiting for the full timeout, thus
   saving time.

   After a fast retransmit, a sender invokes the fast recovery
   [RFC2581] algorithm, whereby it invokes congestion avoidance,
   but not slow start.  This also saves time.
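
   The sender-side behaviour just described can be sketched as
   follows. This is a simplified illustration of the [RFC2581]
   rules; the Sender class and the returned action strings are
   hypothetical, window growth on ordinary ACKs is omitted, and any
   ACK that repeats the previous ACK number is treated as a
   duplicate.

```python
from dataclasses import dataclass

DUPACK_THRESHOLD = 3  # duplicate ACKs that trigger fast retransmit


@dataclass
class Sender:
    smss: int          # sender maximum segment size, bytes
    cwnd: int          # congestion window, bytes
    ssthresh: int      # slow start threshold, bytes
    flight_size: int   # bytes sent but not yet acknowledged
    last_ack: int = 0
    dupacks: int = 0
    in_recovery: bool = False

    def on_ack(self, ack: int) -> str:
        """Process one cumulative ACK and return the action taken."""
        if ack == self.last_ack:
            self.dupacks += 1
            if self.dupacks == DUPACK_THRESHOLD:
                # Fast retransmit: halve the window, skip slow start.
                self.ssthresh = max(self.flight_size // 2, 2 * self.smss)
                self.cwnd = self.ssthresh + 3 * self.smss
                self.in_recovery = True
                return "fast-retransmit"
            if self.in_recovery:
                # Fast recovery: each further dupack means a segment left
                # the network, so inflate the window by one segment.
                self.cwnd += self.smss
                return "inflate"
            return "wait"
        # The ACK covers new data.
        self.last_ack = ack
        self.dupacks = 0
        if self.in_recovery:
            # Deflate the window and continue in congestion avoidance.
            self.cwnd = self.ssthresh
            self.in_recovery = False
            return "exit-recovery"
        return "new-ack"
```

   Note that the retransmission is triggered only by the third
   duplicate ACK; the first two are tolerated as possible reordering.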

   In general, TCP can increase its window beyond the
   delay-bandwidth product. On links with high error rates, the
   TCP window may remain rather small, less than four segments,
   for long periods of time for any of the following reasons:

      1. The typical "file size" to be transferred over a connection
         is relatively small (Web requests, Web document objects,
         email messages, files, etc.). In particular, users of
         links with high error rates are often unwilling to
         carry out large transfers as the response time is so long.

      2. When links have high uncorrected error rates, the cwnd
         tends to stay small.

      3. When a TCP path with high uncorrected error rates
         "crosses" a highly congested wireline Internet path,
         congestion losses on the Internet have the same effect
         as in item 2.

      4. Commonly, ISPs/operators configure only a small number
         of buffers (even as few as for 3 packets) per user in
         their dial-up routers.

      5. Often small socket buffers are recommended with high
         error-rate links in order to prevent the RTO from inflating.

   A small window - especially a window of less than four segments -
   effectively prevents the sender from taking advantage of Fast
   Retransmits, because too few segments follow a lost segment to
   generate three duplicate ACKs. Moreover, efficient recovery from
   multiple losses within a single window requires adoption of new
   proposals (NewReno [RFC2582]).
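
   The four-segment limit follows from simple counting, sketched
   below. The function name is hypothetical, and the sketch assumes
   that no ACKs are lost and that the sender transmits no new
   segments while it is waiting for duplicate ACKs.

```python
DUPACK_THRESHOLD = 3  # duplicate ACKs needed to trigger fast retransmit


def can_fast_retransmit(window_segments: int, lost_index: int = 0) -> bool:
    """Can a single loss in this window trigger fast retransmit?

    lost_index is the position of the lost segment within the window
    (0 is the first segment). Each segment that arrives after the lost
    one produces exactly one duplicate ACK.
    """
    dupacks = max(window_segments - 1 - lost_index, 0)
    return dupacks >= DUPACK_THRESHOLD
```

   With a three-segment window the function returns False for every
   loss position, so the sender has to wait for the retransmission
   timer. With a four-segment window only the loss of the first
   segment can be repaired by fast retransmit.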

   Recommendation: Implement Fast Retransmit and Fast Recovery at
   this time. This is a widely-implemented optimization and is
   currently at Proposed Standard level. [RFC2488] recommends
   implementation of Fast Retransmit/Fast Recovery in satellite
   environments.  NewReno [RFC2582] appears to help a sender
   better handle partial ACKs and multiple losses in a single
   window, but is not recommended at this point because it is
   still experimental.  Instead, SACK is the preferred mechanism.

2.3 Selective Acknowledgements [RFC2018]

   Selective Acknowledgements allow the repair of multiple segment
   losses per window without requiring one round-trip per loss.
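
   As an illustration of why one round trip can be enough, the
   following sketch shows how a sender could derive every missing
   byte range from a single acknowledgement that carries SACK
   blocks. The function name and the (start, end) tuple layout are
   assumptions made for this example; end offsets are exclusive.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (first byte, byte after last), end exclusive


def holes(cum_ack: int, sack_blocks: List[Block]) -> List[Block]:
    """Return the byte ranges the receiver is still missing.

    cum_ack is the cumulative ACK number; sack_blocks are the ranges
    above cum_ack that the receiver reports as already received.
    """
    missing: List[Block] = []
    next_expected = cum_ack
    for start, end in sorted(sack_blocks):
        if start > next_expected:
            missing.append((next_expected, start))
        next_expected = max(next_expected, end)
    return missing
```

   For example, holes(1000, [(2000, 3000), (4000, 6000)]) reports
   two holes, (1000, 2000) and (3000, 4000), so both lost segments
   can be retransmitted without waiting for a second round of
   acknowledgements.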

   Selective acknowledgements are most useful in LFNs ("Long Fat
   Networks") because of the long round trip times that may be
   encountered in these environments, according to Section 1.1 of
   [RFC1323], and are especially useful if large windows are required,
   because there is a considerable probability of multiple segment
   losses per window.

   In low-speed, high error-rate environments (for example, the
   wireless WAN environment), TCP windows are much smaller, and burst
   errors must be much longer in duration in order to damage multiple
   segments. Accordingly, the complexity of SACK may not be
   justifiable, unless there is a high probability of both burst
   errors and congestion.

   Berkeley's SNOOP protocol research [SNOOP] indicates that SACK
   does improve throughput for SNOOP when multiple segments are lost
   per window [BPSK96]. SACK allows SNOOP to recover from
   multi-segment losses in one round-trip. In this case, the wireless
   device needs to implement some form of selective
   acknowledgements.  If SACK is not used, recovery from
   multi-segment losses takes so long that TCP enters congestion
   avoidance anyway.

   Recommendation: Implement SACK now for compatibility with other
   TCPs and improved performance with SNOOP.

2.4 Delayed Duplicate Acknowledgements [MV97, VMPM99]

   When link layers try aggressively to correct a high underlying
   error rate, it is imperative to prevent interaction between
   link-layer retransmission and TCP retransmission as these layers
   duplicate each other's efforts. In such an environment it may
   make sense to delay TCP's efforts so as to give the link-layer a
   chance to recover. With this in mind, the Delayed Dupacks [MV97,
   VMPM99] scheme selectively delays duplicate acknowledgements
   at the receiver.  It is preferable to allow a local mechanism
   to resolve a local problem, instead of invoking TCP's end-to-end
   mechanism and incurring the associated costs, both in terms of
   wasted bandwidth and in terms of its effect on TCP's window.

   At this time, it is not well understood how long the receiver
   should delay the duplicate acknowledgements. In particular, the
   impact of the medium access control (MAC) protocol on the
   choice of delay parameter needs to be studied. The MAC
   protocol may affect the ability to choose the appropriate
   delay (either statically or dynamically). In general,
   significant variability in link-level retransmission times
   can have an adverse impact on the performance of the Delayed
   Dupacks scheme.
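
   One possible receiver policy is sketched below, purely to show
   where the delay parameter enters. The class and method names are
   hypothetical, sequence numbers count whole segments, and the
   choice to send the first two duplicate ACKs immediately and hold
   the third and later ones is an assumption of this sketch rather
   than something specified in this document.

```python
from typing import List, Optional, Set


class DelayedDupackReceiver:
    def __init__(self, delay: float, immediate_dupacks: int = 2) -> None:
        self.delay = delay                  # seconds to hold dupacks
        self.immediate = immediate_dupacks  # dupacks sent without delay
        self.expected = 0                   # next in-order segment number
        self.buffered: Set[int] = set()     # out-of-order segments
        self.dupacks_seen = 0
        self.held = 0                       # dupacks currently withheld
        self.deadline: Optional[float] = None
        self.timed_out = False

    def on_segment(self, seq: int, now: float) -> List[int]:
        """Return the ACK numbers to transmit immediately."""
        if seq < self.expected:
            return [self.expected]          # old duplicate segment
        if seq == self.expected:
            self.expected += 1
            while self.expected in self.buffered:
                self.buffered.remove(self.expected)
                self.expected += 1
            # Gap repaired (perhaps by the link layer): discard held dupacks.
            self.dupacks_seen, self.held = 0, 0
            self.deadline, self.timed_out = None, False
            return [self.expected]
        self.buffered.add(seq)
        self.dupacks_seen += 1
        if self.dupacks_seen <= self.immediate or self.timed_out:
            return [self.expected]
        self.held += 1
        if self.deadline is None:
            self.deadline = now + self.delay
        return []

    def on_timer(self, now: float) -> List[int]:
        """Release the held dupacks once the delay has elapsed."""
        if self.deadline is None or now < self.deadline:
            return []
        released = [self.expected] * self.held
        self.held, self.deadline, self.timed_out = 0, None, True
        return released
```

   If the link layer delivers the missing segment before the delay
   expires, the sender never sees a third duplicate ACK and does not
   reduce its window. If the delay is too short the scheme has no
   effect, and if it is too long genuine losses are repaired late,
   which is why the choice of delay matters.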

   Recommendation: Delaying duplicate acknowledgements may be
   useful in specific network topologies, but a general
   recommendation requires further research and experience.

2.5 Detecting Corruption Loss With Explicit Notifications

   As noted above, today's TCPs assume that any loss is due
   to congestion, and encounter difficulty in distinguishing
   between congestion loss and corruption loss because this
   "implicit notification" mechanism can't carry both meanings
   at once.

   With explicit notification from the network it is possible to
   determine when a loss is due to congestion. Several proposals
   along these lines include:

     - Explicit Loss Notification (ELN) [BPSK96]

     - Explicit Bad State Notification (EBSN) [BBKVP96]

     - Explicit Loss Notification to the Receiver (ELNR), and
       Explicit Delayed Dupack Activation Notification (EDDAN) [MV97]

     - Explicit Congestion Notification (ECN) [ECN]

   Of these proposals, Explicit Congestion Notification (ECN)
   seems closest to deployment on the Internet.

   ECN requires changes to the routing infrastructure to perform
   "active queue management" [RFC2309] - to detect impending buffer
   exhaustion, and to randomly drop packets when impending
   buffer exhaustion has been detected, so that senders will
   respond to this implicit notification by slowing their
   transmission rate and avoiding total buffer exhaustion.

   ECN then builds on "active queue management" by providing
   a mechanism for hosts to mark packets as "ECN-capable",
   and for routers to mark ECN-capable packets as "congestion
   encountered" during periods of impending buffer exhaustion.
   This allows ECN-capable routers to provide congestion
   notification to ECN-capable hosts without dropping packets
   that would otherwise have been delivered (because the
   router still has available buffers when the packet arrives).
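
   The router behaviour described above can be sketched as follows.
   The Packet class, the enqueue function and its parameters are
   hypothetical. The random decision made by the queue management
   algorithm is passed in as the boolean "signal" so that the sketch
   stays deterministic, and header fields are modelled as booleans
   rather than as bits of the TOS byte.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    ect: bool          # sender marked the packet as ECN-capable
    ce: bool = False   # set by a router: "congestion encountered"


def enqueue(queue: List[Packet], packet: Packet, capacity: int,
            threshold: int, signal: bool = True) -> str:
    """Return "queued", "marked" or "dropped" for one arriving packet.

    threshold is the queue length at which buffer exhaustion is
    considered impending; signal is the queue manager's (normally
    random) decision to notify this particular flow.
    """
    if len(queue) >= capacity:
        # Buffers are exhausted: even ECN-capable packets are lost.
        return "dropped"
    if len(queue) >= threshold and signal:
        if not packet.ect:
            return "dropped"   # implicit notification by packet loss
        packet.ce = True       # explicit notification, packet survives
        queue.append(packet)
        return "marked"
    queue.append(packet)
    return "queued"
```

   The first branch is the case that matters for links with errors:
   a lost packet may still have been dropped because of congestion,
   even on a path where every router supports ECN.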

   ECN cannot double as a corruption indicator, however: the absence
   of packets marked as "congestion encountered" should not be
   interpreted by ECN-capable TCP connections as a green light for
   aggressive retransmissions. On the contrary, during periods of
   extreme network congestion routers may drop packets marked with
   explicit notification because their buffers are exhausted -
   exactly the wrong time for a host to begin retransmitting
   aggressively.

   This isn't a criticism of ECN, which was never intended to be
   used as a surrogate for explicit corruption notification - only
   an explanation of why it isn't such a surrogate.

   ECN uses the TOS byte in the IP header to carry congestion
   information (ECN-Capable and Congestion-Encountered).  This byte
   is not encrypted in IPSEC, so ECN can be used on TCP connections
   that are encrypted using IPSEC.

   Recommendation: Implement ECN, but do not (mis)use it as a
   surrogate for explicit corruption notification.

   Continue to investigate true corruption-notification mechanisms.
   Mechanisms like ELNR and EDDAN [MV97], in which the only systems
   that need to be modified are the base station and the mobile
   device, justify further research.  However, the requirement that
   the base station be able to examine the TCP headers flying through
   it raises issues with respect to IPSEC-encrypted packets.

3.0 Summary of Recommendations

   Because existing TCPs have only one implicit loss feedback
   mechanism, it is not possible to use this mechanism to
   distinguish between congestion loss and corruption loss
   without additional information. Because congestion affects
   all traffic on a path while corruption affects only the
   specific traffic encountering uncorrected corruption,
   avoiding congestion has to take precedence over quickly
   repairing corruption loss. This means that the best that
   can be achieved without new feedback mechanisms is minimizing
   the amount of time spent unnecessarily in congestion avoidance.

   Fast Retransmit/Fast Recovery allows quick repair of loss
   without giving up the safety of congestion avoidance. In order
   for Fast Retransmit/Fast Recovery to work, the window size must
   be large enough to force the receiver to send three duplicate
   acknowledgements before the retransmission timeout interval
   expires; once the timer expires, the sender is forced into
   full TCP slow-start.

   Selective Acknowledgements (SACK) extend the benefit of Fast
   Retransmit/Fast Recovery to situations where multiple "holes"
   in the window need to be repaired more quickly than can be
   accomplished by executing Fast Retransmit for each hole, only
   to discover the next hole. SACK has been found particularly
   useful in SNOOP environments [SNOOP] (where an intermediate
   network node is handling retransmissions on behalf of the
   sender).

   SNOOP will be described in more detail in the PILC PEP draft,
   and is only mentioned here in conjunction with SACK.

   Delayed Duplicate Acknowledgements is an attractive scheme,
   especially when link layers use fixed retransmission timer
   mechanisms that may still be trying to recover when TCP-level
   retransmission timeouts occur, adding additional traffic to
   the network. This proposal is worthy of additional study,
   but is not recommended at this time, because we don't know
   how to calculate optimal amounts of delay for an arbitrary
   network topology.

   Explicit corruption notification mechanisms are being
   overshadowed by explicit congestion notification mechanisms,
   and it's not possible to use explicit congestion notification
   as a surrogate for explicit corruption notification.

   Of these mechanisms, SNOOP plus SACK and Delayed Duplicate
   Acknowledgements apply only to wireless networks. The
   others cover both wireless and wireline environments. Their
   more general applicability attracts more attention and analysis
   from the research community.

   Of these mechanisms, only "SNOOP plus SACK" ceases working
   in the presence of IPSEC.

4.0 Acknowledgements

   This recommendation has grown out of the Internet Draft "TCP Over
   Long Thin Networks", which was in turn based on work done in the
   IETF TCPSAT working group.

5.0 References

   [BBKVP96] Bakshi, B., Krishna, P., Vaidya, N., Pradhan, D.K.,
   "Improving Performance of TCP over Wireless Networks," Technical
   Report 96-014, Texas A&M University, 1996.

   [BPSK96] Balakrishnan, H., Padmanabhan, V., Seshan, S., Katz, R.,
   "A Comparison of Mechanisms for Improving TCP Performance over
   Wireless Links," in ACM SIGCOMM, Stanford, California, August
   1996.

   [BV97] Biaz, S., Vaidya, N., "Using End-to-end Statistics to
   Distinguish Congestion and Corruption Losses: A Negative Result,"
   Texas A&M University, Technical Report 97-009, August 18, 1997.

   [BV98] Biaz, S., Vaidya, N., "Sender-Based heuristics for
   Distinguishing Congestion Losses from Wireless Transmission
   Losses," Texas A&M University, Technical Report 98-013, June
   1998.

   [BV98a] Biaz, S., Vaidya, N., "Discriminating Congestion Losses
   from Wireless Losses using Inter-Arrival Times at the Receiver,"
   Texas A&M University, Technical Report 98-014, June 1998.

   [ECN] Ramakrishnan, K.K., Floyd, S., "A Proposal to add Explicit
   Congestion Notification (ECN) to IP", RFC 2481, January 1999.

   [MV97] Mehta, M., Vaidya, N., "Delayed
   Duplicate-Acknowledgements:  A Proposal to Improve Performance of
   TCP on Wireless Links," Texas A&M University, December 24, 1997.
   Available at

   [RFC1122] Braden, R., "Requirements for Internet Hosts --
   Communication Layers," RFC 1122, October 1989.

   [RFC1323] Jacobson, V., Braden, R., Borman, D., "TCP Extensions
   for High Performance," RFC 1323, May 1992.

   [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., Romanow, A.,
   "TCP Selective Acknowledgment Options," RFC 2018, October 1996.

   [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
   S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge,
   C., Peterson, L., Ramakrishnan, K.K., Shenker, S., Wroclawski, J.,
   Zhang, L., "Recommendations on Queue Management and Congestion
   Avoidance in the Internet," RFC 2309, April 1998.

   [RFC2488] Allman, M., Glover, D., Sanchez, L., "Enhancing TCP
   Over Satellite Channels using Standard Mechanisms," RFC 2488
   (BCP 28), January 1999.

   [RFC2581] Allman, M., Paxson, V., Stevens, W., "TCP Congestion
   Control," RFC 2581, April 1999.

   [RFC2582] Floyd, S., Henderson, T., "The NewReno Modification to
   TCP's Fast Recovery Algorithm," RFC 2582, April 1999.

   [SNOOP] Balakrishnan, H., Seshan, S., Amir, E., Katz, R.,
   "Improving TCP/IP Performance over Wireless Networks," Proc. 1st
   ACM Conf. on Mobile Computing and Networking (Mobicom), Berkeley,
   CA, November 1995.

   [VJ-DCAC] Jacobson, V., "Dynamic Congestion Avoidance / Control,"
   e-mail dated February 11, 1988, available from

   [VMPM99] Vaidya, N., Mehta, M., Perkins, C., Montenegro, G.,
   "Delayed Duplicate Acknowledgements: A TCP-Unaware Approach to
   Improve Performance of TCP over Wireless," Technical Report
   99-003, Computer Science Dept., Texas A&M University, February
   1999.

Authors' addresses

   Questions about this document may be directed at:

          Spencer Dawkins
          Nortel Networks
          P.O. Box 833805
          Richardson, Texas 75083-3805

          Voice:    +1-972-684-4827
          Fax:      +1-972-685-3292

          Gabriel E. Montenegro
          Sun Labs Networking and Security Group
          Sun Microsystems, Inc.
          901 San Antonio Road
          Mailstop UMPK 15-214
          Mountain View, California 94303

          Voice:    +1-650-786-6288
          Fax:      +1-650-786-6445

          Markku Kojo
          University of Helsinki/Department of Computer Science
          P.O. Box 26 (Teollisuuskatu 23)
          FIN-00014 HELSINKI

          Voice:  +358-9-7084-4179
          Fax:    +358-9-7084-4441

          Vincent Magret
          Corporate Research Center
          Alcatel Network Systems, Inc
          1201 Campbell
          Mail stop 446-310
          Richardson Texas 75081 USA
          M/S 446-310

          Voice:    +1-972-996-2625
          Fax:    +1-972-996-5902

          Nitin Vaidya
          Dept. of Computer Science
          Texas A&M University
          College Station, TX 77843-3112
          Voice:    +1 409-845-0512
          Fax:      +1 409-847-8578
