Internet Engineering Task Force                               S. Dawkins
INTERNET DRAFT                                             G. Montenegro
                                                                 M. Kojo
                                                               V. Magret
                                                               N. Vaidya

                                                      September 22, 2000

        End-to-end Performance Implications of Links with Errors

                      draft-ietf-pilc-error-05.txt

Status of This Memo

   This document is an Internet-Draft and is in full conformance
   with all provisions of Section 10 of RFC2026.

   Comments should be submitted to the PILC mailing list at
   pilc@grc.nasa.gov.

   Distribution of this memo is unlimited.

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as ``work in
   progress.''

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Abstract

   The rapidly-growing Internet is being accessed by an
   increasingly wide range of devices over an increasingly wide
   variety of links. At least some of these links do not provide
   the reliability that hosts expect, and this expansion into
   unreliable links causes some Internet protocols, especially TCP
   [RFC793], to perform poorly.



Expires March 22, 2001                                          [Page 1]


INTERNET DRAFT          PILC - Links with Errors          September 2000


   Specifically, TCP congestion control [RFC2581], while
   appropriate for connections that lose traffic primarily
   because of congestion and buffer exhaustion, interacts badly
   with connections that traverse links with high uncorrected
   error rates. Senders may spend an excessive amount of time
   waiting for acknowledgements that are not coming, whether
   because data was lost in the forward path or because
   acknowledgements were lost in the return path. Then, although
   these losses are not due to congestion-related buffer
   exhaustion, the sending TCP transmits at substantially reduced
   traffic levels as it probes the network to determine "safe"
   traffic levels.

   This document discusses the specific TCP mechanisms that are
   problematic in these environments, and discusses what can be done
   to mitigate the problems without introducing intermediate devices
   into the connection.

   This document does not address issues with other transport
   protocols, for example, UDP.

Table of Contents

1.0 Introduction
   1.1 Relationship of this recommendation and [PILC-PEP]
   1.2 Relationship of this recommendation and [PILC-LINK]
   1.3 Should you be reading this recommendation?
2.0 Errors and Interactions with TCP Mechanisms
   2.1 Slow Start and Congestion Avoidance [RFC2581]
   2.2 Fast Retransmit and Fast Recovery [RFC2581]
   2.3 Selective Acknowledgements [RFC2018, SACK-EXT]
3.0 Summary of Recommendations
4.0 Topics For Further Work
   4.1 Achieving, and maintaining, large windows
5.0 Acknowledgements
Changes
References
Authors' addresses
Appendix A: When TCP Defers Recovery to the Link Layer
Appendix B: Detecting Transmission Errors With Explicit Notifications
Appendix C: Appropriate Byte Counting [ALL99] (Experimental)

1.0 Introduction

   It has been axiomatic that most losses on the Internet are due to
   congestion, as routers run out of buffers and discard incoming
   traffic. This observation is the basis for current TCP
   congestion avoidance strategies - if losses are due to congestion,
   there is no need for an explicit "congestion encountered"
   notification to the sender.

   Quoting Van Jacobson in 1988: "If packet loss is (almost) always
   due to congestion and if a timeout is (almost) always due to a
   lost packet, we have a good candidate for the `network is congested'
   signal." [VJ-DCAC]

   This axiom has served the Internet community well, because it
   allowed the deployment of TCPs that have enabled the Internet to
   accommodate explosive growth in link speeds and traffic levels.

   This same explosive growth has attracted users of networking
   technologies that DON'T have low uncorrected error rates -
   including many satellite-connected users, and many wireless Wide
   Area Network-connected users. Users connected to these networks may
   not be able to transmit and receive at anything like available
   bandwidth because their TCP connections are spending time in
   congestion avoidance procedures, or even slow-start procedures, that
   were triggered by transmission error in the absence of congestion.

   This document makes recommendations about what the participants
   in connections that traverse high error-rate links may wish
   to consider doing to improve utilization of available bandwidth
   in ways that do not threaten the stability of the Internet.

   Applications use TCP in very different ways, and these have
   interactions with TCP's behavior [HPF-CWV]. Nevertheless,
   it is possible to make some basic assumptions about TCP
   flows. Accordingly, the mechanisms discussed here are applicable
   for all uses of TCP, albeit in varying degrees according to
   different scenarios (as noted where appropriate).

   This document does not address issues with non-TCP transport
   protocols, for example, UDP.


1.1 Relationship of this recommendation and [PILC-PEP]

   This document discusses end-to-end mechanisms that do not require
   TCP-level awareness by intermediate nodes. This places severe
   limitations on what the end nodes can know about the nature of
   losses that are occurring between the end nodes. Attempts to
   apply heuristics to distinguish between congestion and transmission
   error have not been successful [BV97, BV98, BV98a]. A companion
   PILC document on Performance-Enhancing Proxies, [PILC-PEP],
   relaxes this restriction; because PEPs can be placed on boundaries
   where network characteristics change dramatically, PEPs have an
   additional opportunity to improve performance over links with
   uncorrected errors.

   However, generalized use of PEPs contravenes the end-to-end
   principle and is highly undesirable given their deleterious
   implications with respect to the following [PILC-PEP]: fate
   sharing (a PEP adds a third point of failure besides the
   endpoints themselves), end-to-end reliability and diagnostics,
   security (particularly, network layer security such as IPsec),
   mobility (handoffs are much more complex because state must
   be transferred), asymmetric routing (PEPs typically require
   being on both the forward and reverse paths of a connection),
   scalability (PEPs add more state to maintain), QoS transparency
   and guarantees, etc.

   Not every type of PEP has all the drawbacks listed
   above. Nevertheless, the use of PEPs may have very serious
   consequences which must be weighed carefully.


1.2 Relationship of this recommendation and [PILC-LINK]

   This recommendation is for use with TCP over subnetwork technologies
   that have already been deployed. A companion PILC recommendation,
   [PILC-LINK], is for designers of subnetworks that are intended to
   carry Internet protocols, and have not been completely specified,
   so that the designers have the opportunity to reduce the number of
   uncorrected errors TCP will encounter.

1.3 Should you be reading this recommendation?

   All known subnetwork technologies provide an "imperfect"
   subnetwork service - the bit error rate is non-zero. But there's
   no obvious way for end stations to tell the difference between
   losses due to congestion and losses due to transmission errors.

   The cause of a loss may be obvious to a host whose
   directly-attached subnetwork reports transmission errors. In all
   but the most trivial networks, however, the two hosts are not
   attached to the same subnetwork, so even if one host receives
   specific error reports, the other host probably won't.

   Another way of deciding if a subnetwork should be considered to
   have a "high error rate" is by appealing to mathematics.

   A formula giving an upper bound on the performance of any
   additive-increase, multiplicative-decrease algorithm likely to
   be implemented in TCP in the future was derived in [MSMO97]:

                     MSS   1
           BW = 0.93 --- -------
                     RTT sqrt(p)

   where
           BW   is the upper bound on achievable bandwidth, in the
                units of MSS divided by the units of RTT
           MSS  is the segment size being used by the connection
           RTT  is the end-to-end round trip time of the TCP connection
           p    is the packet loss rate for the path
                (e.g., 0.01 if there is 1% packet loss)

   If one plugs in an observed packet loss rate and the predicted
   bandwidth is greater than the link speed, the connection will not
   benefit from the recommendations in this document, because the
   level of packet loss being encountered does not affect the ability
   of TCP to utilize the link. If, however, the predicted bandwidth
   is less than the link speed, packet losses are affecting the
   ability of TCP to utilize the link. If further investigation
   reveals a subnetwork with significant transmission error rates,
   the recommendations in this document will improve the ability of
   TCP to utilize the link.
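
   The comparison described above can be checked numerically. The
   following Python sketch is illustrative only and is not taken from
   [MSMO97]; the function names are hypothetical, and it assumes that
   MSS is given in bytes, RTT in seconds, and link speed in bits per
   second.

```python
import math


def mathis_bandwidth_bound(mss_bytes, rtt_seconds, loss_rate):
    """Upper bound on TCP throughput, in bytes per second.

    Implements BW = 0.93 * (MSS / RTT) * (1 / sqrt(p)) from [MSMO97].
    The choice of bytes and seconds as units is an assumption of this
    sketch.
    """
    if mss_bytes <= 0 or rtt_seconds <= 0:
        raise ValueError("mss_bytes and rtt_seconds must be positive")
    if not 0.0 < loss_rate <= 1.0:
        raise ValueError("loss_rate must be in the range (0, 1]")
    return 0.93 * (mss_bytes / rtt_seconds) / math.sqrt(loss_rate)


def losses_limit_throughput(mss_bytes, rtt_seconds, loss_rate,
                            link_bits_per_second):
    """Return True if the predicted bound is below the link speed.

    True suggests that packet losses, rather than the link itself,
    are limiting what TCP can achieve on this path.
    """
    bound_bits_per_second = 8.0 * mathis_bandwidth_bound(
        mss_bytes, rtt_seconds, loss_rate)
    return bound_bits_per_second < link_bits_per_second
```

   For example, with an MSS of 1460 bytes, an RTT of 0.5 seconds and
   1% packet loss, the bound is about 27 kbytes/second (roughly
   217 kbit/s). Under these assumptions losses would limit TCP on a
   2 Mbit/s link, but not on a 64 kbit/s link.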


2.0 Errors and Interactions with TCP Mechanisms

   A TCP sender adapts its use of bandwidth based on feedback from
   the receiver. When TCP is not able to distinguish between losses
   due to congestion and losses due to uncorrected errors, it is
   not able to accurately determine available bandwidth.

   Some TCP mechanisms, targeting recovery from losses due to
   congestion, coincidentally assist in recovery from losses due to
   uncorrected errors as well.

2.1 Slow Start and Congestion Avoidance [RFC2581]

   Slow Start and Congestion Avoidance [RFC2581] are essential to
   the Internet's stability. These mechanisms were designed to
   accommodate networks that didn't provide explicit congestion
   notification. Although experimental mechanisms like [RFC2481]
   are moving in the direction of explicit notification, the effect
   of ECN on ECN-aware TCPs is essentially the same as the effect
   of implicit congestion notification through congestion-related
   loss.

   TCP connections experiencing high error rates interact badly
   with Slow Start and with Congestion Avoidance, because high
   error rates make the interpretation of losses ambiguous -
   the sender cannot know intuitively whether detected losses are
   due to congestion or to data corruption. TCP makes the "safe"
   choice - assume that the losses are due to congestion.

      - Whenever TCP's retransmission timer expires, the sender
        assumes that the network is congested and invokes slow start.

      - Less-reliable link layers often use small link MTUs. This slows
        the rate of increase in the sender's window size during slow
        start, because the sender's window is increased in units of
        segments. Small link MTUs alone don't improve reliability;
        Path MTU discovery must also be used to prevent fragmentation.
        Path MTU discovery allows the most rapid opening
        of the sender's window size during slow start, but a number of
        round trips may still be required to open the window completely.
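
   The effect of segment size on window opening can be illustrated
   with a short calculation. The Python sketch below is an idealized
   model, not a description of any TCP implementation: it assumes no
   losses, an acknowledgement for every segment (no delayed ACKs),
   and a congestion window that doubles once per round trip. The
   function name and parameters are hypothetical.

```python
def slow_start_round_trips(target_window_bytes, mss_bytes,
                           initial_segments=1):
    """Count the round trips slow start needs to reach a target window.

    Idealized model: the window starts at initial_segments * MSS and
    doubles every round trip until it is at least target_window_bytes.
    """
    if target_window_bytes <= 0 or mss_bytes <= 0 or initial_segments <= 0:
        raise ValueError("all arguments must be positive")
    cwnd = initial_segments * mss_bytes
    round_trips = 0
    while cwnd < target_window_bytes:
        cwnd *= 2  # one round trip of slow start doubles the window
        round_trips += 1
    return round_trips
```

   Under these assumptions, opening a 65536-byte window takes 6 round
   trips with a 1460-byte MSS and 8 round trips with a 256-byte MSS.
   On a path with a long round trip time, the two additional round
   trips can be a noticeable fraction of a short transfer.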

   Recommendation: Slow Start and Congestion Avoidance are MUSTs in
   [RFC1122], itself a full Internet Standard. Recommendations in this
   document will not interfere with these mechanisms.

2.2 Fast Retransmit and Fast Recovery [RFC2581]

   TCPs deliver data as a reliable byte-stream to applications, so
   when a segment is lost (whether due to congestion or to
   transmission error), delivery of data to the receiving application
   must wait until the missing data is received. The receiver detects
   missing segments when segments arrive with out-of-order
   sequence numbers.

   TCPs SHOULD immediately send an acknowledgement when data is
   received out-of-order [RFC2581], sending the next expected
   sequence number with no delay, so that the sender can retransmit
   the required data and the receiver can resume delivery of data
   to the receiving application. When an acknowledgement carries
   the same expected sequence number as an acknowledgement that
   has already been sent for the last in-order segment received,
   these acknowledgements are called "duplicate ACKs".

   Because IP networks are allowed to reorder packets, the receiver
   may send duplicate acknowledgements for segments that arrive
   out of order due to routing changes, link-level retransmission,
   etc. When a TCP sender receives three duplicate ACKs, fast
   retransmit [RFC2581] allows it to infer that a segment was
   lost. The sender retransmits what it considers to be this lost
   segment without waiting for the full retransmission timeout,
   thus saving time.

   After a fast retransmit, a sender halves its congestion window
   and invokes the fast recovery [RFC2581] algorithm, whereby
   it invokes congestion avoidance, but not slow start from a
   one-segment congestion window. This also saves time.
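
   The sender-side behavior described above can be sketched as a small
   state machine. The Python below is a toy model written for this
   discussion, not an excerpt from any TCP implementation; the class
   and attribute names are hypothetical. The window arithmetic follows
   the values given in [RFC2581] (ssthresh set to max(FlightSize/2,
   2*SMSS), cwnd set to ssthresh plus three segments), while window
   growth outside of recovery, timers, and sequence number wraparound
   are deliberately left out.

```python
class RenoSender:
    """Toy model of the duplicate-ACK reaction described in [RFC2581]."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, mss, cwnd, ssthresh):
        self.mss = mss              # segment size, bytes
        self.cwnd = cwnd            # congestion window, bytes
        self.ssthresh = ssthresh    # slow start threshold, bytes
        self.last_ack = 0           # highest cumulative ACK seen so far
        self.dup_acks = 0           # consecutive duplicate ACKs seen
        self.in_fast_recovery = False
        self.retransmitted = []     # sequence numbers sent by fast retransmit

    def on_ack(self, ack, flight_size):
        """Process one ACK; flight_size is the unacknowledged data in bytes."""
        if ack > self.last_ack:
            # New data acknowledged: leave fast recovery, if active, by
            # deflating the window to ssthresh (congestion avoidance).
            self.last_ack = ack
            self.dup_acks = 0
            if self.in_fast_recovery:
                self.cwnd = self.ssthresh
                self.in_fast_recovery = False
            return
        if ack == self.last_ack:
            self.dup_acks += 1
            if self.dup_acks == self.DUP_ACK_THRESHOLD:
                # Fast retransmit: infer a loss, halve the window, and
                # resend without waiting for the retransmission timer.
                self.ssthresh = max(flight_size // 2, 2 * self.mss)
                self.retransmitted.append(ack)
                self.cwnd = self.ssthresh + 3 * self.mss
                self.in_fast_recovery = True
            elif self.dup_acks > self.DUP_ACK_THRESHOLD:
                # Each further duplicate ACK means a segment has left
                # the network, so the window is inflated by one segment.
                self.cwnd += self.mss
```

   In this model, a sender with 8000 bytes in flight and 1000-byte
   segments that sees three duplicate ACKs retransmits once, sets
   ssthresh to 4000 bytes, and continues in congestion avoidance from
   a 4000-byte window once new data is acknowledged. At no point does
   it fall back to a one-segment window.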

   It's important to be realistic about the maximum throughput that
   TCP can have over a connection that traverses a high error-rate
   link. Even using Fast Retransmit/Fast Recovery, the sender will
   halve the congestion window each time a window contains one or
   more segments that are lost, and will re-open the window by one
   additional segment for each congestion window's worth of
   acknowledgements received. If a connection path traverses a link
   that loses one or more segments during recovery, the one-half
   reduction takes place again, this time on a reduced congestion
   window - and this downward spiral will continue until the
   connection is able to recover completely without experiencing
   loss.

   In general, TCP can increase its congestion window beyond the
   delay-bandwidth product. In links with high error rates, the
   TCP window may remain rather small for long periods of time
   due to any of the following reasons:

      1. TCP's congestion avoidance strategy is additive-increase,
         multiplicative-decrease, which means that if additional
         errors are encountered before the congestion window
         recovers completely from a 50-percent reduction, the
         effect can be a "downward spiral" of the congestion window
         due to additional 50-percent reductions. This "downward
         spiral" will hold the congestion window below the capacity
         of the path between the endpoints until the error rate
         decreases, allowing full recovery by additive increase. Of
         course, no downward spiral occurs if the error rate is
         constantly high and the congestion window always remains
         small.

      2. If a network path with high uncorrected error rates DOES
         cross a highly congested wireline Internet path,
         congestion losses on the Internet have the same effect as
         losses due to corruption.
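
   The "downward spiral" in item 1 can be made concrete with a small
   trace. The Python sketch below is a simplified model under stated
   assumptions: the window is counted in whole segments, a round trip
   with any loss halves the window once, every other round trip adds
   one segment, and the window never drops below two segments. The
   function name and the loss pattern used in the example are
   hypothetical and are not measurements.

```python
def aimd_trace(initial_cwnd, loss_round_trips, total_round_trips,
               min_cwnd=2):
    """Return the congestion window (in segments) after each round trip.

    Round trips listed in loss_round_trips apply a multiplicative
    decrease (halving); all others apply an additive increase of one
    segment.
    """
    losses = set(loss_round_trips)
    cwnd = initial_cwnd
    trace = []
    for round_trip in range(total_round_trips):
        if round_trip in losses:
            cwnd = max(cwnd // 2, min_cwnd)  # multiplicative decrease
        else:
            cwnd += 1                        # additive increase
        trace.append(cwnd)
    return trace
```

   Starting from 32 segments with a single loss in round trip 0, the
   model is back at 31 segments after 16 round trips. With further
   losses in round trips 4, 8 and 12, the window after the same 16
   round trips is only 7 segments: each reduction arrives before the
   previous one has been repaired by additive increase.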

   Not all causes of small windows are related to errors. For
   example, HTTP/1.0 commonly closes TCP connections to indicate
   boundaries between requested resources. This means that these
   applications are constantly closing "trained" TCP connections
   and opening "untrained" TCP connections which will execute slow
   start, beginning with one or two segments. This can happen even
   with HTTP/1.1, if webmasters configure their HTTP/1.1 servers to
   close connections instead of waiting to see if the connection will
   be useful again.

   A small window - especially a window of less than four segments -
   effectively prevents the sender from taking advantage of Fast
   Retransmits. Moreover, efficient recovery from multiple losses
   within a single window requires adoption of new proposals
   (NewReno [RFC2582]).
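
   The reason a window of fewer than four segments defeats Fast
   Retransmit follows from counting duplicate ACKs. The Python sketch
   below is illustrative; the function names are hypothetical, and it
   assumes a single lost segment, no reordering, no ACK loss, and no
   new data sent while the window is outstanding.

```python
DUP_ACK_THRESHOLD = 3  # duplicate ACKs needed to trigger fast retransmit


def duplicate_acks_after_loss(window_segments, lost_index):
    """Duplicate ACKs generated when one segment of a window is lost.

    Segments are numbered 0 .. window_segments - 1. Every segment that
    arrives after the lost one is out of order and triggers exactly
    one duplicate ACK.
    """
    if not 0 <= lost_index < window_segments:
        raise ValueError("lost_index must identify a segment in the window")
    return window_segments - 1 - lost_index


def fast_retransmit_possible(window_segments, lost_index):
    """Return True if the loss yields enough duplicate ACKs."""
    return (duplicate_acks_after_loss(window_segments, lost_index)
            >= DUP_ACK_THRESHOLD)
```

   With a three-segment window, losing even the first segment yields
   only two duplicate ACKs, so the sender must wait for the
   retransmission timer. A four-segment window is enough only if the
   first segment is the one lost; losses near the end of any window
   have the same problem.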

   Recommendation: Implement Fast Retransmit and Fast Recovery at
   this time. This is a widely-implemented optimization and is
   currently at Proposed Standard level. [RFC2488] recommends
   implementation of Fast Retransmit/Fast Recovery in satellite
   environments.  In cases where SACK (see next section) cannot be
   enabled for both sides of a connection, NewReno [RFC2582] may be
   used by TCP senders to better handle partial ACKs and multiple
   losses in a single window.

2.3 Selective Acknowledgements [RFC2018, SACK-EXT]

   Selective Acknowledgements allow the repair of multiple segment
   losses per window without requiring one (or more) round-trips
   per loss.
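
   The idea can be sketched as follows. The Python below is a
   simplified illustration and does not reproduce the option format
   or block-ordering rules of [RFC2018]: byte ranges are half-open
   (start, end) pairs, the stream is assumed to start at byte 0,
   sequence number wraparound is ignored, and the number of blocks
   reported is not limited. All function names are hypothetical.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent half-open (start, end) ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def receiver_report(received_ranges):
    """Return (cumulative_ack, sack_blocks) for the data received so far.

    The cumulative ACK covers the contiguous data starting at byte 0;
    every other contiguous run of received data becomes one block.
    """
    merged = merge_ranges(received_ranges)
    cumulative_ack = 0
    if merged and merged[0][0] == 0:
        cumulative_ack = merged[0][1]
        merged = merged[1:]
    return cumulative_ack, merged


def holes_to_retransmit(cumulative_ack, sack_blocks):
    """Return the byte ranges the sender can tell are missing."""
    holes = []
    next_expected = cumulative_ack
    for start, end in sorted(sack_blocks):
        if start > next_expected:
            holes.append((next_expected, start))
        next_expected = max(next_expected, end)
    return holes
```

   If eight 1000-byte segments are sent and the third and sixth are
   lost, the receiver in this model reports a cumulative ACK of 2000
   with blocks (3000, 5000) and (6000, 8000). From that one report the
   sender learns of both holes, (2000, 3000) and (5000, 6000), and can
   repair them without spending a round trip to discover each.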

   [SACK-EXT] proposes an extension to SACK that allows receivers
   to provide more information about the order of delivery of
   segments, allowing "more robust operation in an environment of
   reordered packets, ACK loss, packet replication, and/or early
   retransmit timeouts". [SACK-EXT] has been approved for proposed
   standard as a minor but useful update to Selective
   Acknowledgements.  Unless explicitly stated otherwise, in this
   document "Selective Acknowledgements" (or "SACK") refers to the
   combination of [RFC2018] and [SACK-EXT].

   Selective acknowledgements are most useful in LFNs ("Long Fat
   Networks"), because of the long round trip times that may be
   encountered in these environments, according to Section 1.1 of
   [RFC1323], and are especially useful if large windows are
   required, because there is a higher probability of multiple
   segment losses per window.

   On the other hand, if error rates are generally low but
   occasionally increase due to interference, TCP will have the
   opportunity to increase its window to larger values.  When
   interference occurs, multiple losses within a window are likely
   to occur.  In this case, SACK would provide benefits in speeding
   the recovery and preventing unnecessary extra reduction of
   window size.

   Recommendation: SACK as specified in [RFC2018] and updated by
   [SACK-EXT] is a Proposed Standard. Implement SACK now for
   compatibility with other TCPs.

3.0 Summary of Recommendations

   Because existing TCPs have only one implicit loss feedback
   mechanism, it is not possible to use this mechanism to
   distinguish between congestion loss and transmission error
   without additional information. Because congestion affects all
   traffic on a path while transmission loss affects only the
   specific traffic encountering uncorrected errors, avoiding
   congestion has to take precedence over quickly repairing
   transmission error. This means that the best that can be
   achieved without new feedback mechanisms is minimizing the
   amount of time spent unnecessarily in congestion avoidance.

   Fast Retransmit/Fast Recovery allows quick repair of loss
   without giving up the safety of congestion avoidance. In order
   for Fast Retransmit/Fast Recovery to work, the window size must
   be large enough to force the receiver to send three duplicate
   acknowledgements before the retransmission timeout interval
   expires; if the timer expires first, the sender is forced into
   full TCP slow-start.

   Selective Acknowledgements (SACK) extend the benefit of Fast
   Retransmit/Fast Recovery to situations where multiple segment
   losses in the window need to be repaired more quickly than can
   be accomplished by executing Fast Retransmit for each segment
   loss, only to discover the next segment loss.

   These mechanisms cover both wireless and wireline environments.
   This general applicability attracts more attention and analysis
   from the research community.

   All of these mechanisms continue to work in the presence
   of IPsec.

4.0 Topics For Further Work

   Delayed Duplicate Acknowledgements is an attractive scheme,
   especially when link layers use fixed retransmission timer
   mechanisms that may still be trying to recover when TCP-level
   retransmission timeouts occur, adding additional traffic to the
   network. This proposal is worthy of additional study, but is not
   recommended at this time, because we don't know how to calculate
   appropriate amounts of delay for an arbitrary network topology.

   It is not possible to use explicit congestion notification
   as a surrogate for explicit transmission error notification
   (no matter how much we wish it were!). Some mechanism to
   provide explicit notification of transmission error would
   be very helpful. This might be more easily provided in a
   PEP environment, especially when the PEP is the "first hop"
   in a connection path, because current checksum mechanisms
   do not distinguish between transmission error to a payload
   and transmission error to the header - and, if the header is
   damaged, it's problematic to send explicit transmission error
   notification to the right endpoints.

   Losses that take place on the ACK stream, especially while a TCP
   is learning network characteristics, can make the data stream
   quite bursty (resulting in losses on the data stream, as well).
   Several ways of limiting this burstiness have been proposed,
   including "Appropriate Byte Counting" (ABC) [ALL99], TCP transmit
   pacing at the sender, and ACK rate control within the network.

   ABC can lead to behavior that is less bursty than standard TCP,
   because the congestion window is opened by the number of bytes that
   have been successfully transferred to the receiver, giving more
   appropriate behavior for application protocols that initiate
   connections with relatively short packets. For SMTP, for instance,
   the client might send a short HELO packet, a short MAIL packet, one
   or more short RCPT packets, and a short DATA packet - followed by
   the entire mail body sent as maximum-length packets. ABC would not
   use ACKs for each of these short packets to increase the congestion
   window to allow additional full-length packets.
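
   The difference between the two counting methods can be seen in the
   SMTP example above. The Python sketch below is illustrative only:
   the function names and the packet sizes in the example are
   hypothetical, and the cap of one segment per ACK on byte counting
   is an assumption of this sketch rather than a statement of what
   [ALL99] specifies.

```python
def grow_per_ack(cwnd, acked_sizes, mss):
    """Open the window by one full segment for every ACK received."""
    for _ in acked_sizes:
        cwnd += mss
    return cwnd


def grow_per_byte(cwnd, acked_sizes, mss):
    """Open the window by the bytes each ACK covers (byte counting).

    The increase per ACK is capped at one segment; that cap is an
    assumption made for this sketch.
    """
    for acked_bytes in acked_sizes:
        cwnd += min(acked_bytes, mss)
    return cwnd
```

   Suppose the four short command packets carry 20, 40, 35 and 6 bytes
   and each is acknowledged separately. Starting from a 1460-byte
   window with a 1460-byte MSS, per-ACK counting opens the window to
   7300 bytes, allowing a five-segment burst when the mail body
   starts. Byte counting opens it to 1561 bytes, so the body starts
   with a single full-length segment.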

4.1 Achieving, and maintaining, large windows

   The recommendations described in this document will aid TCPs in
   injecting packets into connections that traverse links with
   errors as fast as possible without destabilizing the Internet,
   and so optimizing the use of available bandwidth.

   In addition to these TCP-level recommendations, there is still
   additional work to do at the application level, especially with
   the dominant application protocol on the World Wide Web, HTTP.

   HTTP/1.0 (and its predecessor, HTTP/0.9) used TCP connection
   closing to signal a receiver that all of a requested resource
   had been transmitted. Because WWW objects tend to be small
   in size [MOGUL], TCPs carrying HTTP/1.0 traffic experience
   difficulty in "training" on available bandwidth (a substantial
   portion of the transfer has already happened by the time the
   TCPs get out of slow start).

   Several HTTP modifications have been introduced to improve this
   interaction with TCP ("persistent connections" in HTTP/1.0,
   with improvements in HTTP/1.1 [RFC2616]). For a variety of
   reasons, many HTTP interactions are still HTTP/1.0-style -
   relatively short-lived.

   Proposals which reuse TCP congestion information across
   connections, like TCP Control Block Interdependence [RFC2140],
   or the more recent Congestion Manager [BS99] proposal, will have
   the effect of making multiple parallel connections impact the
   network as if they were a single connection, "trained" after
   a single startup transient. These proposals are critical to
   the long-term stability of the Internet, because today's users
   always have the choice of clicking on the "reload" button in
   their browsers and cutting off TCP's exponential backoff -
   replacing connections which are building knowledge of the
   available bandwidth with connections with no knowledge at all.


5.0 Acknowledgements

   This recommendation has grown out of RFC 2757, "TCP Over Long
   Thin Networks", which was in turn based on work done in the IETF
   TCPSAT working group. The authors are indebted to the active
   members of the PILC working group. In particular, Mark Allman
   gave us copious and insightful feedback. Also, Jamshid Mahdavi
   provided text replacements.


Changes

   Changes between versions 03 and 04:

   Other editorial changes and corrections.

   Changes between versions 02 and 03:

   Restructure document into discussion of standard mechanisms, work
   remaining to be done, and appendices on experimental mechanisms.

   Change "Explicit Corruption Notification" to "Explicit Transmission
   Error Notification", in order to avoid confusion with "Explicit
   Congestion Notification".

   Other editorial changes and corrections.

   Changes between versions 03 and 04:

   Incorporated lots of comments from Mark Allman, too numerous to
   list here.

   Also incorporated some changes suggested by Jamshid Mahdavi.

   SACK-EXT is now approved for Proposed Standard. Reflected this
   change in status in the text by treating SACK-EXT in the same way
   as SACK.

   Changed section name from Delayed Duplicate Acknowledgements to
   "When TCP Defers Recovery to the Link Layer" and mentioned
   Reiner Ludwig's Eifel algorithm.

   Added reference to link-outage in the appendix.

   Changes between versions 04 and 05:

   Added section 1.3.


References

   [ALL99] Mark Allman, "TCP Byte Counting Refinements," ACM
   Computer Communication Review, Volume 29, Number 3, July 1999.
   http://www.acm.org/sigcomm/ccr/archive/1999/jul99/ccr-9907-allman.pdf

   [BBKVP96] Bakshi, B., Krishna, P., Vaidya, N., Pradhan, D.K.,
   "Improving Performance of TCP over Wireless Networks," Technical
   Report 96-014, Texas A&M University, 1996.

   [BPSK96] Balakrishnan, H., Padmanabhan, V., Seshan, S., Katz, R.,
   "A Comparison of Mechanisms for Improving TCP Performance over
   Wireless Links," in ACM SIGCOMM, Stanford, California, August
   1996.

   [BS99] Hari Balakrishnan, Srinivasan Seshan, "The Congestion
   Manager", July, 2000. Work in progress, available at
   http://www.ietf.org/internet-drafts/draft-ietf-ecm-cm-00.txt

   [BV97] Biaz, S., Vaidya, N., "Using End-to-end Statistics to
   Distinguish Congestion and Corruption Losses: A Negative Result,"
   Texas A&M University, Technical Report 97-009, August 18, 1997.

   [BV98] Biaz, S., Vaidya, N., "Sender-Based heuristics for
   Distinguishing Congestion Losses from Wireless Transmission
   Losses," Texas A&M University, Technical Report 98-013, June
   1998.

   [BV98a] Biaz, S., Vaidya, N., "Discriminating Congestion Losses
   from Wireless Losses using Inter-Arrival Times at the Receiver,"
   Texas A&M University, Technical Report 98-014, June 1998.

   [HPF-CWV] Handley, M., Padhye, J., Floyd, S., "TCP Congestion
   Window Validation," March 2000. Approved for Informational RFC,
   available at
   http://search.ietf.org/internet-drafts/draft-handley-tcp-cwv-02.txt.

   [LINK-OUTAGE] G. Montenegro, "Link Outage ICMP Notification,"
   July 2000.  Work in progress, available at
   http://www.ietf.org/internet-drafts/
   draft-montenegro-pilc-link-outage-00.txt

   [LK00] Reiner Ludwig and Randy Katz, "The Eifel Algorithm:
   Making TCP Robust Against Spurious Retransmissions," ACM
   Computer Communication Review, Volume 30, Number 1, January
   2000. Available at
   http://www.acm.org/sigcomm/ccr/archive/2000/jan00/
   ccr-200001-ludwig.pdf

   [MD95] Gabriel Montenegro and Steve Drach, "System Isolation and
   Network Fast-Fail Capability in Solaris," Second USENIX
   Symposium on Mobile and Location-Independent Computing, April 1995.
   http://www.usenix.org/publications/library/proceedings/mob95/
     montenegro.html

   [MSMO97] M. Mathis, J. Semke, J. Mahdavi, T. Ott, "The Macroscopic
   Behavior of the TCP Congestion Avoidance Algorithm", Computer
   Communication Review, volume 27, number 3, July 1997. Available at
   http://www.acm.org/sigcomm/ccr/archive/1997/jul97/
     ccr-9707-mathis.html

   [MV97] Mehta, M., Vaidya, N., "Delayed Duplicate-Acknowledgements:
   A Proposal to Improve Performance of TCP on Wireless Links," Texas
   A&M University, December 24, 1997.
   Available at http://www.cs.tamu.edu/faculty/vaidya/mobile.html

   [PILC-LINK] Phil Karn, Aaron Falk, Joe Touch, Marie-Jose
   Montpetit, Jamshid Mahdavi, Gabriel Montenegro, Dan Grossman,
   Gorry Fairhurst, "Advice for Internet Subnetwork Designers",
   July 2000. Work in progress, available at http://
   www.ietf.org/internet-drafts/draft-ietf-pilc-link-design-03.txt

   [PILC-PEP] J. Border, M. Kojo, Jim Griner, G. Montenegro,
   "Performance Implications of Link-Layer Characteristics: Performance
   Enhancing Proxies", July 2000. Work in progress, available
   at http://www.ietf.org/internet-drafts/draft-ietf-pilc-pep-03.txt

   [PILC-SLOW] S. Dawkins, G. Montenegro, M. Kojo, V. Magret,
   "Performance Implications of Link-Layer Characteristics: Slow
   Links", July 2000. Work in progress, available at
   http://www.ietf.org/internet-drafts/draft-ietf-pilc-slow-04.txt

   [P-HTTP] Jeffrey C. Mogul, "The Case for Persistent-Connection
   HTTP", Research Report 95/4, May 1995. Available at
   http://www.research.digital.com/wrl/techreports/abstracts/95.4.html

   [RFC793] Jon Postel, "Transmission Control Protocol", September 1981.
   RFC 793.

   [RFC1122] Braden, R., "Requirements for Internet Hosts --
   Communication Layers", October 1989. RFC 1122.

   [RFC1323] Van Jacobson, Robert Braden, and David Borman. "TCP
   Extensions for High Performance", May 1992. RFC 1323.

   [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and Romanow, A.,
   "TCP Selective Acknowledgment Options," October 1996. RFC 2018.

   [RFC2140] J. Touch, "TCP Control Block Interdependence", RFC 2140,
   April 1997.

   [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
   S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge,
   C., Peterson, L., Ramakrishnan, K.K., Shenker, S., Wroclawski, J.,
   Zhang, L., "Recommendations on Queue Management and Congestion
   Avoidance in the Internet," RFC 2309, April 1998.

   [RFC2481] Ramakrishnan, K.K., Floyd, S., "A Proposal to add Explicit
   Congestion Notification (ECN) to IP", RFC 2481, January 1999.

   [RFC2488] Mark Allman, Dan Glover, Luis Sanchez. "Enhancing TCP
   Over Satellite Channels using Standard Mechanisms," RFC 2488
   (BCP 28), January 1999.

   [RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion
   Control," April 1999. RFC 2581.

   [RFC2582] Floyd, S., Henderson, T., "The NewReno Modification to
   TCP's Fast Recovery Algorithm," April 1999. RFC 2582.

   [RFC2616] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter,
   P. Leach, T. Berners-Lee. "Hypertext Transfer Protocol -- HTTP/1.1",
   RFC 2616, June 1999. (Draft Standard)

   [SACK-EXT] Sally Floyd, Jamshid Mahdavi, Matt Mathis, Matthew
   Podolsky, Allyn Romanow, "An Extension to the Selective
   Acknowledgement (SACK) Option for TCP", August 1999. Approved
   for proposed standard, available at
   http://www.ietf.org/internet-drafts/draft-floyd-sack-00.txt

   [SF98] Nihal K. G. Samaraweera and Godred Fairhurst, "Reinforcement
   of TCP error Recovery for Wireless Communication", Computer
   Communication Review, volume 28, number 2, April 1998. Available at
   http://www.acm.org/sigcomm/ccr/archive/1998/apr98/
   ccr-9804-samaraweera.pdf

   [VJ-DCAC] Van Jacobson, "Dynamic Congestion Avoidance / Control",
   e-mail dated February 11, 1988, available from
   http://www.kohala.com/~rstevens/vanj.88feb11.txt

   [VMPM99] N. H. Vaidya, M. Mehta, C. Perkins, G. Montenegro,
   "Delayed Duplicate Acknowledgements: A TCP-Unaware Approach to
   Improve Performance of TCP over Wireless," Technical Report
   99-003, Computer Science Dept., Texas A&M University, February
   1999.

Authors' addresses

   Questions about this document may be directed to:

          Spencer Dawkins
          Fujitsu Network Communications
          2801 Telecom Parkway
          Richardson, Texas 75082

          Voice:  +1-972-479-3782
          E-Mail: spencer.dawkins@fnc.fujitsu.com


          Gabriel E. Montenegro
          Sun Labs Networking and Security Group
          Sun Microsystems, Inc.
          901 San Antonio Road
          Mailstop UMPK 15-214
          Mountain View, California 94303

          Voice:  +1-650-786-6288
          Fax:    +1-650-786-6445
          E-Mail: gab@sun.com


          Markku Kojo
          University of Helsinki/Department of Computer Science
          P.O. Box 26 (Teollisuuskatu 23)
          FIN-00014 HELSINKI
          Finland

          Voice:  +358-9-7084-4179
          Fax:    +358-9-7084-4441
          E-Mail: kojo@cs.helsinki.fi


          Vincent Magret
          Corporate Research Center
          Alcatel Network Systems, Inc.
          1201 Campbell
          Mail stop 446-310
          Richardson, Texas 75081 USA

          Voice:  +1-972-996-2625
          Fax:    +1-972-996-5902
          E-Mail: vincent.magret@aud.alcatel.com


          Nitin Vaidya
          Dept. of Computer Science
          Texas A&M University
          College Station, TX 77843-3112

          Voice:  +1 409-845-0512
          Fax:    +1 409-847-8578
          Email: vaidya@cs.tamu.edu


Appendix A: When TCP Defers Recovery to the Link Layer

   When a link layer tries aggressively to correct a high underlying
   error rate, it is imperative to prevent link-layer retransmission
   and TCP retransmission from interacting, because otherwise the two
   layers duplicate each other's efforts.  It may be preferable to
   allow a local mechanism to resolve a local problem, instead of
   invoking TCP's end-to-end mechanism and incurring the associated
   costs, both in wasted bandwidth and in the effect on TCP's window
   behavior.  In such an environment it may make sense to delay TCP's
   efforts so as to give the link layer a chance to recover.  With
   this in mind, the Delayed Dupacks [MV97, VMPM99] scheme selectively
   delays duplicate acknowledgements at the receiver.

   At this time, it is not well understood how long the receiver
   should delay the duplicate acknowledgements.  In particular, the
   impact of the medium access control (MAC) protocol on the choice
   of delay parameter needs to be studied.  The MAC protocol may
   affect the ability to choose an appropriate delay (either
   statically or dynamically).  In general, significant variability
   in link-level retransmission times can have an adverse impact on
   the performance of the Delayed Dupacks scheme.
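
   The role of the delay parameter can be illustrated with a toy
   receiver model.  This is only a sketch under stated assumptions,
   not the algorithm of [MV97] or [VMPM99]: it assumes that the first
   two duplicate acknowledgements are sent immediately and that later
   ones are held for a fixed delay, that sequence numbers count whole
   segments, and that the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DelayedDupackReceiver:
    """Toy receiver that holds the third and later dupacks for `delay` seconds."""
    delay: float                 # hypothetical delay parameter, in seconds
    next_expected: int = 0       # next in-order segment number
    dupacks_sent: int = 0        # dupacks already released for the current hole
    held: List[float] = field(default_factory=list)  # release deadlines of held dupacks
    buffered: set = field(default_factory=set)       # out-of-order segments received

    def release_due(self, now: float) -> List[Tuple[str, int]]:
        """Send any held dupacks whose delay has expired."""
        due = [t for t in self.held if t <= now]
        self.held = [t for t in self.held if t > now]
        self.dupacks_sent += len(due)
        return [("dupack", self.next_expected)] * len(due)

    def on_segment(self, seq: int, now: float) -> List[Tuple[str, int]]:
        """Process one arriving segment; return the ACKs sent right away."""
        out = self.release_due(now)
        if seq == self.next_expected:
            # In-order data, possibly a link-layer repair of the hole:
            # advance past everything already buffered.
            self.next_expected += 1
            while self.next_expected in self.buffered:
                self.buffered.remove(self.next_expected)
                self.next_expected += 1
            self.held.clear()    # the hole is gone, so drop held dupacks
            self.dupacks_sent = 0
            out.append(("ack", self.next_expected))
        elif seq > self.next_expected:
            self.buffered.add(seq)
            if self.dupacks_sent + len(self.held) < 2:
                # Assumption: the first two dupacks go out at once.
                self.dupacks_sent += 1
                out.append(("dupack", self.next_expected))
            else:
                # Later dupacks wait, giving the link layer time to recover.
                self.held.append(now + self.delay)
        return out
```

   If the link layer delivers the missing segment before the delay
   expires, the held duplicate acknowledgements are never sent and
   the sender does not enter fast retransmit.  If the delay is too
   short relative to link-level retransmission time, they are
   released and the two layers retransmit the same data.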

   Delayed Dupacks makes very few assumptions about TCP
   implementations.  If, however, one assumes that the
   implementations support TCP timestamps, then other schemes are
   possible.  For example, the Eifel algorithm [LK00] uses timestamps
   (or, alternatively, two of the four currently unused bits in the
   TCP header) to make TCP more robust in the face of spurious
   timeouts and packet reordering.
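
   The timestamp-based idea can be sketched as follows.  This is a
   simplified illustration, not the algorithm as specified in [LK00]:
   it assumes the sender remembers the timestamp carried by its first
   retransmission and treats an ACK that echoes an older timestamp as
   evidence that the original segment was delivered.  The class name,
   the window arithmetic (in segments), and the full restoration of
   the saved window are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EifelSender:
    cwnd: int                              # congestion window, in segments
    ssthresh: int                          # slow-start threshold, in segments
    saved_cwnd: Optional[int] = None
    saved_ssthresh: Optional[int] = None
    retransmit_ts: Optional[int] = None    # timestamp sent with the first retransmission

    def on_timeout(self, now_ts: int) -> None:
        """Retransmit: save the window state and note the retransmission's timestamp."""
        self.saved_cwnd, self.saved_ssthresh = self.cwnd, self.ssthresh
        self.retransmit_ts = now_ts
        self.ssthresh = max(self.cwnd // 2, 2)
        self.cwnd = 1

    def on_ack(self, echoed_ts: int) -> bool:
        """Return True if this ACK shows the retransmission was spurious."""
        if self.retransmit_ts is None:
            return False
        spurious = echoed_ts < self.retransmit_ts
        if spurious:
            # The ACK echoes a timestamp older than the retransmission,
            # so the original segment triggered it: undo the reduction.
            self.cwnd, self.ssthresh = self.saved_cwnd, self.saved_ssthresh
        self.retransmit_ts = None
        return spurious
```

   Without timestamps (or an equivalent marker), the sender cannot
   tell which transmission an ACK acknowledges, and so cannot safely
   undo the window reduction.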

   Recommendation: Delaying duplicate acknowledgements and the
   Eifel Algorithm are not standards-track mechanisms. They may be
   useful in specific network topologies, but a general
   recommendation requires further research and experience.


Appendix B: Detecting Transmission Errors With Explicit Notifications

   As noted above, today's TCPs assume that any loss is due to
   congestion, and they have difficulty distinguishing congestion
   loss from corruption loss because this "implicit notification"
   mechanism cannot carry both meanings at once.  [SF98] reports
   simulation results showing that performance improvements are
   possible when TCP can correctly distinguish between losses due to
   congestion and losses due to corruption.

   With explicit notification from the network it is possible to
   determine when a loss is due to corruption. Several proposals
   along these lines include:

   - Explicit Loss Notification (ELN) [BPSK96]

   - Explicit Bad State Notification (EBSN) [BBKVP96]

   - Explicit Loss Notification to the Receiver (ELNR), and
     Explicit Delayed Dupack Activation Notification (EDDAN)
     [MV97]

   - Space Communication Protocol Specification - Transport
     Protocol (SCPS-TP), which uses explicit "negative
     acknowledgements" to notify the sender that a damaged
     packet has been received.
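
   Why an explicit notification helps can be shown with a toy sender
   reaction.  This sketch does not reproduce any of the proposals
   listed above.  It assumes a hypothetical boolean signal telling the
   sender that a loss was caused by corruption, and it uses a
   simplified halving of the window (in segments) as the congestion
   response.

```python
from typing import Tuple


def react_to_loss(cwnd: int, ssthresh: int, corruption_notified: bool) -> Tuple[int, int]:
    """Return the (cwnd, ssthresh) pair to use after a loss, in segments."""
    if corruption_notified:
        # Loss attributed to corruption: retransmit, but keep the window.
        return cwnd, ssthresh
    # No notification: the loss is assumed to signal congestion,
    # so the window is halved (simplified reaction).
    reduced = max(cwnd // 2, 2)
    return reduced, reduced
```

   With only implicit notification the second branch is always taken,
   which is the source of the performance problem on links with
   errors.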

   In addition to notification of corruption affecting specific
   packets, it is useful to report sustained interruptions in link
   connectivity.  These conditions can be reported with an ICMP
   Host Unreachable message [LINK-OUTAGE].  IP is required to pass
   any such messages up to transport layers like UDP and TCP, and
   these, in turn, to applications above them [RFC1122].  What is
   not clearly defined is which code within an ICMP Host Unreachable
   message should be used to report an error condition.  For
   conditions of network outage, the currently unused 'host isolated'
   code (code 8) was introduced for routers (actually, IMPs) to
   inform hosts of an outage.  Additionally, [MD95] argues for the
   application of 'host isolated' to notifications emanating from
   a host's lower layers.

   In summary, these notifications to upper layers can originate
   either from within a host itself or in another host altogether.
   ICMP includes the necessary information to determine the sender
   of the notification, as well as a part of the datagram which
   encountered the error.
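
   As a rough illustration of what such a notification carries, the
   sketch below decodes an ICMP Destination Unreachable message
   (type 3) and extracts the code together with the addresses and
   protocol of the embedded datagram.  The function name is
   hypothetical, the embedded IP header is assumed to carry no
   options, and the sender of the notification itself is found in the
   source address of the enclosing IP packet, which is not parsed
   here.

```python
import struct

ICMP_DEST_UNREACHABLE = 3        # ICMP type
CODE_HOST_UNREACHABLE = 1
CODE_SOURCE_HOST_ISOLATED = 8    # the 'host isolated' code discussed above


def parse_unreachable(icmp: bytes):
    """Return (code, src, dst, protocol) for the offending datagram, or None."""
    icmp_type, code, _checksum, _unused = struct.unpack("!BBHI", icmp[:8])
    if icmp_type != ICMP_DEST_UNREACHABLE:
        return None
    inner = icmp[8:]             # original IP header plus leading data bytes
    protocol = inner[9]          # e.g. 6 for TCP, 17 for UDP
    src = ".".join(str(b) for b in inner[12:16])
    dst = ".".join(str(b) for b in inner[16:20])
    return code, src, dst, protocol
```

   The embedded header is what lets IP hand the error to the right
   transport protocol, and the transport protocol to the right
   connection.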

   These proposals offer promise, but none has been proposed as a
   standards-track mechanism for adoption in the IETF.

   Recommendation: Researchers should continue to investigate true
   corruption-notification mechanisms, especially mechanisms like
   ELNR and EDDAN [MV97], in which the only systems that need to be
   modified are the base station and the mobile device. We also note
   that the requirement that the base station be able to examine TCP
   headers at link speeds raises performance issues with respect to
   IPsec-encrypted packets.

Appendix C: Appropriate Byte Counting [ALL99] (Experimental)

   Researchers have pointed out an interaction between delayed
   acknowledgements and TCP acknowledgement-based self-clocking, and
   various proposals have been made to improve bandwidth utilization
   during slow start. One proposal, called "Appropriate Byte Counting",
   increases cwnd based on the number of bytes acknowledged, instead of
   the number of ACKs received.  This proposal refines earlier
   proposals in two ways: it limits the increase in cwnd so that cwnd
   does not "spike" in the presence of "stretch ACKs", which cover
   more than two segments (whether this is intentional behavior by
   the receiver or the result of lost ACKs), and it limits cwnd
   growth based on byte counting to the initial slow-start exchange.
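
   The difference between the two counting rules can be sketched as
   below.  This is an illustration only, not the text of [ALL99]: the
   segment size and the per-ACK cap of two segments are assumptions
   chosen to match the "more than two segments" description of
   stretch ACKs, and the function names are hypothetical.

```python
SMSS = 1460  # sender maximum segment size in bytes (illustrative value)


def slow_start_ack_counting(cwnd: int, bytes_acked: int) -> int:
    """Classic slow start: grow cwnd by one SMSS per ACK, whatever it covers."""
    return cwnd + SMSS


def slow_start_byte_counting(cwnd: int, bytes_acked: int,
                             limit: int = 2 * SMSS) -> int:
    """Byte counting: grow cwnd by the bytes acknowledged, capped per ACK."""
    return cwnd + min(bytes_acked, limit)
```

   With a delayed ACK covering two segments, ACK counting adds one
   SMSS where byte counting adds two.  With a stretch ACK covering six
   segments, the cap keeps the increase at two SMSS rather than six.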

   This proposal is still at the experimental stage, but implementors
   may wish to follow this work: its effect is that the congestion
   window opens more aggressively when ACKs are lost during the
   initial slow-start exchange, without this aggressiveness acting
   to the detriment of other flows.





























