Network Working Group Reiner Ludwig
INTERNET-DRAFT Michael Meyer
Expires: August 2002 Ericsson Research
February, 2002
The Eifel Detection Algorithm for TCP
<draft-ietf-tsvwg-tcp-eifel-alg-03.txt>
Status of this memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or cite them other than as "work in progress".
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/lid-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Abstract
The Eifel detection algorithm allows the TCP sender to detect a
posteriori whether it has entered loss recovery unnecessarily. It
already determines this in the beginning of loss recovery when the
first new ACK arrives after the timeout-based or fast retransmit has
been sent. The algorithm requires that the TCP Timestamps option
defined in RFC1323 is enabled for a connection. The idea is to use
the timestamps echoed in returning ACKs to eliminate the
retransmission ambiguity in TCP. The Eifel detection algorithm
provides a basis for future TCP enhancements. This includes response
algorithms to back out of loss recovery by restoring a TCP sender's
congestion control state.
Ludwig & Meyer [Page 1]
INTERNET-DRAFT TCP - Eifel Detection February, 2002
Terminology
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in [RFC2119].
We use the term 'new ACK' to refer to an ACK that acknowledges
previously unacknowledged data. We use the term 'duplicate ACK' as
defined in [WS95]. Furthermore, we refer to the first-time
transmission of a data segment as the 'original transmit'.
1. Introduction
The retransmission ambiguity problem [KP87] is the TCP sender's
inability to distinguish whether the first new ACK that arrives after
a retransmit was sent in response to the original transmit or the
retransmit. This problem occurs after a timeout-based retransmit and
after a fast retransmit. The Eifel detection algorithm uses the TCP
Timestamps option defined in RFC1323 to eliminate the retransmission
ambiguity. It thereby allows the TCP sender to detect a posteriori
whether it has entered loss recovery unnecessarily.
This added capability of the TCP sender is useful in environments
where TCP's loss recovery and congestion control algorithms may often
get falsely triggered. This can be caused by packet reordering,
packet duplication, the loss of a flight of ACKs, or a sudden delay
increase in the data or the ACK path that results in a spurious
timeout. For example, such sudden delay increases can often occur in
wide-area wireless access networks due to handovers, resource
preemption, or because the mobile transmitter traverses through a
radio coverage hole (e.g., see [Gu01]).
Based on the Eifel detection algorithm, the TCP sender may then
choose to implement dedicated response algorithms. One goal of such a
response algorithm would be to alleviate the consequences of a
falsely triggered loss recovery. This may include restoring the TCP
sender's congestion control state, and/or avoiding the often
unnecessary go-back-N retransmits that typically occur after a
spurious timeout. Another goal would be to adapt protocol parameters
such as the duplicate acknowledgement threshold [RFC2581], and/or the
RTT estimators [RFC2988]. This is to reduce the risk of falsely
triggering TCP's loss recovery again as the connection progresses.
However, such response algorithms are outside the scope of this
document. Note: The original proposal, the "Eifel algorithm" [LK00],
comprises both a detection and a response algorithm. This document
only defines the detection part.
A key feature of the Eifel detection algorithm is that it already
detects upon the first new ACK that arrives during loss recovery
whether a fast retransmit or a timeout was spurious. This is crucial
Ludwig & Meyer [Page 2]
INTERNET-DRAFT TCP - Eifel Detection February, 2002
to be able to avoid the mentioned go-back-N retransmits. Another
feature is that the Eifel detection algorithm is fairly robust
against the loss of ACKs.
2. Events that Falsely Trigger TCP Loss Recovery
The following events falsely trigger a TCP sender's loss recovery and
congestion control algorithms. This causes a so-called spurious
retransmit, and an unnecessary reduction of the TCP sender's
congestion window and slow start threshold [RFC2581].
- Spurious timeout
- Packet reordering
- Packet duplication
The common understanding of a spurious timeout is a timeout that
would not have occurred had the sender "waited longer". This may be
caused by increased delay that suddenly occurs in the data and/or the
ACK path. That in turn might cause an ACK to arrive too late, i.e.,
only after the TCP sender's retransmission timer has expired. Note
that such a late ACK could either be a new ACK that would acknowledge
the oldest outstanding segment, or it could be the third duplicate
ACK that would trigger a fast retransmit of the oldest outstanding
segment [RFC2581]. For the purpose of specifying the algorithm in
Section 3, we define the former case as SPUR_TO_NEWACK, and the
latter as SPUR_TO_DUPACK. However, in this document we stretch the
definition of a spurious timeout. We also speak of a spurious timeout
when the timeout occurred because an entire flight of ACKs was lost
while in fact the original transmit of the oldest outstanding segment
had arrived at the TCP receiver. This case is also considered to fall
under the definition of SPUR_TO_NEWACK. Even though such a timeout is
certainly unavoidable, the triggering of loss recovery was spurious
since that original transmit was not lost. Hence, we use the term
'spurious timeout' in the sense that entering the associated loss
recovery was unnecessary. Yet, we exclude from the definition of a
spurious timeout the case where the retransmit arrives at the TCP
receiver before the corresponding original transmit.
Packet reordering in the network may occur because IP [RFC791] does
not guarantee in-order delivery of packets. Additionally, a TCP
receiver generates a duplicate ACK for each segment that arrives out-
of-order. This results in a spurious fast retransmit if three or more
data segments arrive out-of-order at the TCP receiver, and at least
three of the resulting duplicate ACKs arrive at the TCP sender. We
define such a case as SPUR_FR.
Packet duplication may occur because a receiving IP does not (cannot)
remove packets that have been duplicated in the network. A TCP
Ludwig & Meyer [Page 3]
INTERNET-DRAFT TCP - Eifel Detection February, 2002
receiver in turn also generates a duplicate ACK for each duplicate
segment. As with packet reordering, this results in a spurious fast
retransmit if duplication of data segments or ACKs results in three
or more duplicate ACKs to arrive at the TCP sender. This case is also
considered to fall under the definition of SPUR_FR.
The negative impact on TCP performance caused by packet reordering
and packet duplication is commonly the same: a single spurious
retransmit (the fast retransmit), and the unnecessary halving of the
TCP sender's congestion window as a result of the subsequent fast
recovery phase [RFC2581].
The negative impact on TCP performance caused by a spurious timeout
is more severe. First, the timeout event itself causes a single
spurious retransmit, and unnecessarily forces the TCP sender into
slow start [RFC2581]. Then, as the connection progresses, a chain
reaction gets triggered that further decreases TCP's performance.
Since the timeout was spurious, at least some ACKs for original
transmits will typically arrive at the TCP sender before the ACK for
the retransmit arrives. (This is unless severe packet reordering
coincided with the spurious timeout in such a way that the ACK for
the retransmit is the first new ACK to arrive at the TCP sender.)
Those ACKs for original transmits will then trigger an implicit go-
back-N loss recovery at the TCP sender. Assuming that none of the
outstanding segments and none of the corresponding ACKs were lost,
all outstanding segments will get retransmitted unnecessarily.
Moreover, in some TCPs this may then cause a spurious fast
retransmit. This is because the spurious go-back-N retransmits will
arrive as duplicates at the TCP receiver, which in turn triggers a
series of duplicate ACKs. Note that this last spurious fast
retransmit could be avoided with the careful variant of 'bug fix'
[RFC2582].
More detailed explanations including TCP trace plots that visualize
the effects of spurious timeouts and packet reordering can be found
in [LK00].
3. The Eifel Detection Algorithm
3.1 The Idea
The goal of the Eifel detection algorithm is to allow the TCP sender
to detect a posteriori whether it has entered loss recovery
unnecessarily. Furthermore, the TCP sender should be able to make
this decision upon the first new ACK that arrives after the timeout-
based or fast retransmit has been sent. This in turn requires extra
information in every ACK by which the TCP sender can unambiguously
distinguish whether that first new ACK was sent in response to the
original transmit or the retransmit. Such extra information is
provided by the TCP Timestamps option [RFC1323]. Generally speaking,
Ludwig & Meyer [Page 4]
INTERNET-DRAFT TCP - Eifel Detection February, 2002
timestamps are monotonously increasing "serial numbers" added into
every segment that are then echoed within the corresponding ACKs.
This is exploited by the Eifel detection algorithm in the following
way.
Given that timestamps are enabled for a connection the TCP sender
always stores the timestamp of the retransmit sent in the beginning
of loss recovery, i.e., the timestamp of the timeout-based or the
fast retransmit. If the timestamp of the first new ACK, arriving
after the mentioned retransmit was sent, is smaller then the stored
timestamp of that retransmit, then that ACK must have been sent in
response to an original transmit. Hence, the TCP sender must have
entered loss recovery unnecessarily.
The fact that the Eifel detection algorithm decides upon this first
new ACK is crucial to allow future response algorithms to avoid the
spurious go-back-N retransmits that typically occur after a spurious
timeout. Also, if loss recovery was entered unnecessarily, a window
worth of ACKs are outstanding that all carry a timestamp that is
smaller than the stored timestamp of the retransmit. The arrival of
any one of those ACKs suffices the Eifel detection algorithm to work.
Hence, the solution is fairly robust against ACK losses. Even the ACK
sent in response to the retransmit, i.e., the one that carries the
stored timestamp, may get lost.
Note: Also the DSACK option [RFC2883] can be used to detect a
posteriori whether the TCP sender has entered loss recovery
unnecessarily. However, the first ACK carrying a DSACK option usually
arrives at the TCP sender only after loss recovery has already
terminated. Thus, the DSACK option cannot be used to eliminate the
retransmission ambiguity. Consequently, it cannot be used to avoid
the mentioned spurious go-back-N retransmits. Moreover, a DSACK-based
detection algorithm is less robust against ACK losses.
3.2 The Algorithm
Given that the TCP Timestamps option [RFC1323] is enabled for a
connection, a TCP sender MAY use the Eifel detection algorithm as
defined in this subsection.
If the Eifel detection algorithm is used, the following steps MUST be
taken by the TCP sender only upon initiation of loss recovery, i.e.,
when either the timeout-based retransmit or fast retransmit is sent.
Note: The Eifel detection algorithm should not be (re-)initiated
after loss recovery has already started. In particular, it may not be
(re-)initiated upon subsequent timeouts for the same segment, and not
upon retransmitting segments other than the oldest outstanding
segment.
Ludwig & Meyer [Page 5]
INTERNET-DRAFT TCP - Eifel Detection February, 2002
(1) Set a "SpuriousRecovery" variable to FALSE.
(2) Set a "RetransmitTS" variable to the value of the
Timestamp Value field of the Timestamps option included
in the retransmit. A TCP sender must ensure that
RetransmitTS does not get overwritten as loss recovery
progresses, e.g., in case of a second timeout and
subsequent second retransmit of the same octet.
(3) Wait for the arrival of a valid ACK. If a valid ACK
arrives, then proceed to step (4).
(4) If a first or second duplicate ACK has arrived, then
return to step (3), else proceed to step (5).
(5) If a third duplicate ACK has arrived and the loss
recovery had been initiated with a timeout-based
retransmit, then set SpuriousRecovery to SPUR_TO_DUPACK
and proceed to step (DONE), else proceed to step (6).
(6) If a new ACK has arrived and the Timestamp Echo Reply
field of the ACK's Timestamps option is smaller than
RetransmitTS, then proceed to step (7), else proceed to
step (DONE).
(7) If the loss recovery had been initiated with a timeout-
based retransmit then set SpuriousRecovery to
SPUR_TO_NEWACK, else set SpuriousRecovery to SPUR_FR.
Proceed to step (DONE).
(DONE) No further processing.
The comparison operator "smaller than" in step (6) is conservative.
In theory, when the "timestamp clock" is slow or the network is fast,
RetransmitTS could at most be equal to the timestamp echoed by an ACK
sent in response to an original transmit. In that case, it is assumed
that the loss recovery was not falsely triggered.
3.3 Non-RFC1323-compliant TCP Receivers Mask Certain Spurious Timeouts
Not all TCP implementations strictly follow RFC1323. There are
differences concerning which timestamp a TCP receiver echoes when a
duplicate segment arrives. The relevant case to consider in this
context is a spurious timeout that occurred because an entire flight
of ACKs got lost. (Recall the definition of a spurious timeout in
Section 2.) The question is which timestamp the TCP receiver echoes
in response to the spurious retransmit that typically arrives as a
duplicate segment. RFC1323-compliant TCP receivers (e.g., LINUX
kernel 2.4) will echo the timestamp of the last segment that arrived
Ludwig & Meyer [Page 6]
INTERNET-DRAFT TCP - Eifel Detection February, 2002
in-sequence, while some non-RFC1323-compliant TCP receivers will echo
the timestamp of the retransmit.
The Eifel detection algorithm will only detect such spurious timeouts
in case the TCP receiver is RFC1323-compliant in this respect.
3.4 Protecting Against Misbehaving TCP Receivers
A TCP receiver can easily make a genuine retransmit appear to the TCP
sender as a spurious retransmit by forging echoed timestamps. This
may pose a security concern.
Fortunately, there is a way to modify the Eifel detection algorithm
in a way that makes it robust against lying TCP receivers. The idea
is to use timestamps as a "segment's secret" that the TCP receiver
only gets to know if it receives the segment. Conversely, a TCP
receiver will not know the timestamp of a segment that was lost.
Hence, to "prove" that it received the original transmit of a segment
that the TCP sender retransmitted, the TCP receiver would need to
return the timestamp of that original transmit. The Eifel detection
algorithm could then be modified to only decide that loss recovery
had been unnecessarily entered if the first new ACK echoes the
timestamp of the original transmit.
Hence, implementers may choose to implement the algorithm with the
following modifications.
Step (2) is replaced with step (2'):
(2') Set a "RetransmitTS" variable to the value of the
Timestamp Value field of the Timestamps option that was
included in the original transmit corresponding to the
retransmit. Note: This step requires that the TCP sender
stores the timestamps of all outstanding original
transmits.
Step (4) is replaced with step (4'):
(4') If a first or second duplicate ACK has arrived, then
proceed to step (DONE), else proceed to step (6).
Step (6) is replaced with step (6'):
(6') If a new ACK has arrived and the Timestamp Echo Reply
field of the ACK's Timestamps option is equal to
RetransmitTS, then proceed to step (7), else proceed to
step (DONE).
These modifications come at a cost. First, spurious timeouts of type
SPUR_TO_DUPACK cannot be detected any longer. Also, the modified
Ludwig & Meyer [Page 7]
INTERNET-DRAFT TCP - Eifel Detection February, 2002
algorithm is fairly sensitive against ACK losses since it relies on
the arrival of the new ACK that corresponds to the original transmit.
Note: The first new ACK that arrives after loss recovery had been
unnecessarily entered, typically echoes the timestamp of the original
transmit. This assumes that the ACK corresponding to the original
transmit was not lost, and the TCP receiver does not forge timestamps
but complies with RFC1323. In case of a spurious fast retransmit,
this is implied by the rules for generating ACKs for data segments
that fill in all or part of a gap in the sequence space (see section
4.2 of [RFC2581]) and by the rules for echoing timestamps in that
case (see rule (C) in section 3.4 of [RFC1323]). In case of a
spurious timeout, it is likely that the delay that has caused the
spurious timeout has also caused the TCP receiver's delayed ACK timer
[RFC1122] to expire before the original transmit arrives. Also, in
this case the rules for generating ACKs and the rules for echoing
timestamps (see rule (A) in section 3.4 of [RFC1323]) ensure that the
original transmit's timestamp is echoed.
A remaining problem is that a TCP receiver might guess a lost
segment's timestamp from observing the timestamps of segments that
arrived earlier. This could be avoided by having the TCP sender add a
"random counter" to the timestamp of every segment it sends, and then
increment that counter by a random value, e.g., between 1 and 100.
This ensures that timestamps remain monotonously increasing while
making it difficult for a TCP receiver to guess the timestamp of a
lost segment.
4. Security Considerations
There do not seem to be any security considerations associated with
the Eifel detection algorithm. This is because the Eifel detection
algorithm does not alter existing protocol state at the TCP sender
nor at the receiver. That is, no value of a state variable other than
one introduced by the algorithm itself (SpuriousRecovery, and
RetransmitTS) is changed.
Moreover, a variant of the Eifel detection algorithm has been
proposed in Section 3.4 that makes it robust against lying TCP
receivers.
Acknowledgments
Many thanks to Keith Sklower, Randy Katz, Stephan Baucke, Sally
Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Andrei Gurtov, Pasi
Sarolahti, and Alexey Kuznetsov for useful discussions that
contributed to this work.
Ludwig & Meyer [Page 8]
INTERNET-DRAFT TCP - Eifel Detection February, 2002
References
[RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control,
RFC 2581, April 1999.
[RFC2119] S. Bradner, Key words for use in RFCs to Indicate
Requirement Levels, RFC 2119, March 1997.
[RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High
Performance, RFC 1323, May 1992.
[RFC2582] S. Floyd, T. Henderson, The NewReno Modification to TCP's
Fast Recovery Algorithm, RFC 2582, April 1999.
[RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow,
An Extension to the Selective Acknowledgement (SACK) Option
for TCP, RFC 2883, July 2000.
[Gu01] A. Gurtov, Effect of Delays on TCP Performance, In
Proceedings of IFIP Personal Wireless Communications,
August 2001.
[KP87] P. Karn, C. Partridge, Improving Round-Trip Time Estimates
in Reliable Transport Protocols, In Proceedings of ACM
SIGCOMM 87.
[LK00] R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP
Robust Against Spurious Retransmissions, ACM Computer
Communication Review, Vol. 30, No. 1, January 2000,
available at http://www.acm.org/sigcomm/ccr/archive/2000/
jan00/ccr-200001-ludwig.html (easier studied when
viewed/printed in color).
[RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer,
RFC 2988, November 2000.
[RFC791] J. Postel, Internet Protocol, RFC 791, September 1981.
[RFC793] J. Postel, Transmission Control Protocol, RFC793, September
1981.
[WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2
(The Implementation), Addison Wesley, January 1995.
Ludwig & Meyer [Page 9]
INTERNET-DRAFT TCP - Eifel Detection February, 2002
Author's Address
Reiner Ludwig
Ericsson Research (EED)
Ericsson Allee 1
52134 Herzogenrath, Germany
Phone: +49 2407 575 719
Fax: +49 2407 575 400
Email: Reiner.Ludwig@eed.ericsson.se
Michael Meyer
Ericsson Research (EED)
Ericsson Allee 1
52134 Herzogenrath, Germany
Phone: +49 2407 575 654
Fax: +49 2407 575 400
Email: Michael.Meyer@eed.ericsson.se
This Internet-Draft expires in August 2002.
Ludwig & Meyer [Page 10]