TSVWG S. Dawkins
Internet-Draft C. Williams
Expires: April 23, 2004 MCSR Labs
October 24, 2003
End-to-end, Implicit "Link-Up" Notification
draft-dawkins-trigtran-linkup-01.txt
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 23, 2004.
Copyright Notice
Copyright (C) The Internet Society (2003). All Rights Reserved.
Abstract
The Performance Implications of Link Characteristics [PILC] working
group is recommending an end-to-end implicit notification when an
access link outage ends. This document codifies the "Link Up
Notification" for TCP.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
Dawkins & Williams Expires April 23, 2004 [Page 1]
Internet-Draft "Link-Up" Notifications October 2003
1. Introduction
The Transmission Control Protocol (TCP) [RFC793] uses a
retransmission timer to ensure data delivery in the absence of any
feedback from a remote data receiver, and prescribes an "exponential
backoff" for this timer in cases where retransmissions are also
unacknowledged. This timer can grow to a very large value (the
retransmission timer in deployed implementations is often capped at
64 seconds, and even this limit isn't required by standards-track
specifications).
This exponential backoff is necessary to prevent sustained congestion
(if loss occurs due to congestion), but may provide an unnecessarily
unpleasant user experience (if the loss occurs due to link outages in
a wireless environment).
The Performance Implications of Link Characteristics [PILC] working
group is recommending an end-to-end implicit notification when an
access link outage ends [LINK, section 8.2]. The goal is to allow
sending transports to retransmit in a timely fashion without
modifying the exponential backoff mechanism. This notification was
well-supported in the IETF 56 TRIGTRAN BoF [TRIGTRAN56].
PILC is not chartered to propose protocol changes, so this proposal
is targeted for the Transport Area Working Group (TSVWG).
This note describes a method of "short-circuiting" a "backed-off"
retransmission timer in a case where a TCP detects that a local
interface has become operational, so that a sender is notified that
another retransmission attempt may be appropriate. The TCP using the
interface sends a "Link Up Notification" (or "LUN") to its peer.
2. Problem Statement
The Transmission Control Protocol (TCP) [RFC793] uses a
retransmission timer to ensure data delivery in the absence of any
feedback from a remote data receiver. This timer, called the
retransmission timeout (RTO), is calculated using an algorithm
specified in [RFC2988].
When an RTO occurs, the sender retransmits an unacknowledged segment.
If this retransmitted segment is also unacknowledged, the sender
waits twice as long before attempting an additional retransmission,
and doubles the wait again after each successive retransmission that
does not result in an acknowledgement from the receiver.
The initial value of RTO is 3 seconds, and subsequent values during
normal operation approach a smoothed average of the RTT (plus a
factor based on the variance in RTT), with a lower bound of 1 second.
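
The RTO calculation summarized above can be sketched as follows. This
is a minimal illustration of the [RFC2988] update rules, not a
description of any deployed TCP. The class name, method name, and the
clock granularity value are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Optional

ALPHA = 1 / 8       # gain applied to the smoothed RTT (SRTT)
BETA = 1 / 4        # gain applied to the RTT variance (RTTVAR)
K = 4               # multiplier applied to RTTVAR
INITIAL_RTO = 3.0   # seconds, used before any RTT sample exists
MIN_RTO = 1.0       # seconds, lower bound on a computed RTO


@dataclass
class RtoEstimator:
    """Hypothetical helper that tracks SRTT, RTTVAR, and RTO."""
    clock_granularity: float = 0.1  # assumed value, for illustration only
    srtt: Optional[float] = None
    rttvar: Optional[float] = None
    rto: float = INITIAL_RTO

    def on_rtt_sample(self, r: float) -> float:
        """Fold one RTT measurement (in seconds) into the estimate."""
        if self.srtt is None:
            # First measurement: seed both estimators from the sample.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        candidate = self.srtt + max(self.clock_granularity, K * self.rttvar)
        self.rto = max(MIN_RTO, candidate)
        return self.rto
```

On a short path the computed value falls below one second, so the
lower bound is what a sender would actually use.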
When a segment is lost, and cannot be recovered by other means (Fast
Retransmit), the RTO used to trigger the first retransmission attempt
will be as short as is "reasonable" - the RTO is calculated based on
the measured RTT, so the RTO will happen with a reasonable
expectation that no acknowledgement for data sent before RTO will be
received after RTO. This might be characterized as "as soon as
possible, but no sooner".
All well and good, if the retransmitted segment is acknowledged. If
it is not acknowledged, the TCP will wait twice as long before
retransmitting again, and will continue to double the RTO interval
each time its attempt to retransmit fails.
This behavior is conservative, ensuring that sending TCPs "back off"
in the presence of path congestion. This desirable property comes at
a price - current RTO values quickly increase into the tens of seconds
between retransmission attempts, a painfully slow interval if a human
being is "in the loop". BSD-based TCPs finally "cap" the maximum RTO
value at 64 seconds, but this "cap" is not required [RFC2988] -
conformant TCPs are allowed to continue to increase RTO into multiple
minutes between retransmission attempts.
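
To make the growth concrete, the doubling can be sketched as below.
This is an illustration only. The function name and the choice of a
one-second starting RTO are assumptions, and the optional 64-second
cap mirrors the BSD behavior mentioned above rather than any
requirement.

```python
from typing import List, Optional


def backoff_schedule(initial_rto: float, attempts: int,
                     cap: Optional[float] = 64.0) -> List[float]:
    """Return the wait (in seconds) preceding each retransmission attempt.

    The wait doubles after every unacknowledged retransmission. If
    `cap` is None the doubling is unbounded, which conformant TCPs
    are allowed to do.
    """
    waits = []
    rto = initial_rto
    for _ in range(attempts):
        waits.append(rto)
        rto = rto * 2
        if cap is not None:
            rto = min(rto, cap)
    return waits


if __name__ == "__main__":
    capped = backoff_schedule(1.0, 12)
    print(capped)       # per-attempt waits, in seconds
    print(sum(capped))  # total time spent waiting across all attempts
```

Starting from the one-second minimum, the wait reaches the 64-second
cap by the seventh attempt, and every later attempt costs a further
64 seconds of silence.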
If an RTO has happened because of path congestion, high and rising
RTO-based periods of "silence" are necessary to ensure that path
congestion does not remain, or even increase, at a time when the
sending TCP is not receiving any feedback from the receiver.
If an RTO has happened because of an access link failure, an
all-too-common situation when the access link is a wireless link, and
the access link becomes available again, the unexpired portion of the
full RTO period is not required to prevent sustained congestion,
because no congestion was occurring. However, today's sending TCPs
cannot know this is the case, have no indication that the RTO is
caused by an access link failure, and must make the conservative
assumption that lost packets are being lost due to congestion.
It is near-axiomatic that a "human in the loop" will abandon any
operation leading to minutes of inactivity and "try again" - for
instance, pressing the "stop" and "reload" buttons on an HTTP
browser. These operations often reset or abandon existing TCP
connections, causing TCPs to discard learned path characteristics,
and add additional packets (SYN/SYN-ACK on new connections, etc.) to
the connection path. If it's possible to prevent this, it's desirable
to do so.
2.1 A Historical Note: "Kicking" TCP
The IETF PILC Working Group is recommending retransmission of packets
on an interface that has returned to operational status, in [LINK].
[LINK] documents informal practice, but additional details are
required for standards-track TCPs.
"Kicking TCP" takes its name from Phil Karn's posting to the PILC
mailing list, proposing that routers driving subnetworks subject to
lengthy outages "try to hold onto the last IP packet of each flow
when a link goes down and forward it to its destination when the link
comes back up" [LINKNOTE].
This document takes "Kicking TCP" as a starting point. It extends
"Kicking TCP" by adding sender-side behavior for
apparently-duplicated packets received on an RTOed TCP connection.
2.2 Transport and Deployability Considerations
Ideally, a "Link Up Notification" (or "LUN") would be accomplished
using an ICMP message, but in today's Internet, an end-to-end TCP
packet for an existing connection is more likely to "arrive" at its
destination across border gateways, firewalls, and NATs. "Kicking
TCP" takes advantage of this - the LUN is exactly a packet that has
already been transmitted on an existing connection path.
2.3 Applicability Statement
Hosts supporting TCP-based applications over subnetwork interfaces
subject to multi-second outages MAY perform the actions described in
Section 3. These actions are more attractive for TCP implementations
used with "human-in-the-loop" applications, but are safe for any
TCP-based implementation.
All hosts supporting TCP-based applications SHOULD perform the
actions described in Section 4.
3. When a Local Interface Returns to "UP"
If a host contains a local interface that is subject to frequent and
lengthy outages, the host subnetwork implementation MAY retain a copy
of "the last" packet transmitted on each TCP connection.
When the subnetwork implementation detects that a local interface has
returned to "UP" status, the subnetwork implementation MAY retransmit
the last packet stored for each TCP connection.
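
A minimal sketch of this bookkeeping follows, assuming the
implementation can identify connections by address and port. The
class, its method names, and the `send` callback are hypothetical and
stand in for whatever transmit hook a real subnetwork driver or stack
provides.

```python
from typing import Callable, Dict, Tuple

# (source address, source port, destination address, destination port)
ConnKey = Tuple[str, int, str, int]


class LastPacketCache:
    """Hypothetical store of the last packet sent on each TCP connection."""

    def __init__(self) -> None:
        self._last: Dict[ConnKey, bytes] = {}

    def on_transmit(self, key: ConnKey, packet: bytes) -> None:
        # Overwrite, so only the most recent packet per connection is kept.
        self._last[key] = packet

    def on_connection_closed(self, key: ConnKey) -> None:
        # Forget connections that no longer exist.
        self._last.pop(key, None)

    def on_link_up(self, send: Callable[[bytes], None]) -> int:
        """Retransmit each stored packet once; return how many were sent."""
        packets = list(self._last.values())
        for packet in packets:
            send(packet)
        return len(packets)
```

Each retransmitted packet is the LUN for its connection. No new packet
format is involved.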
3.1 Layering Violation Tradeoffs
This proposal casually assumes that subnetwork implementations can track
TCP connections between two end hosts. This is a layering violation.
If an implementation finds it more convenient to provide "local link
up" indications to its own TCP, LUN functionality can be implemented
in the TCP/IP stack.
Not all subnetwork implementations are able to distinguish between
TCP connections. In this case, the subnetwork may choose to store one
packet per destination host.
TCP source and destination port numbers will be hidden when the host
is using IPsec Encapsulating Security Payload [ESP], because this
confidentiality mechanism encrypts the TCP header that carries these
fields. In these cases, the subnetwork may also choose to store one
packet per destination host.
If a host is storing one packet per destination host, it should be
the most recently transmitted packet, to maximize the probability
that a LUN will restart an active TCP connection.
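
One way to picture the fallback is as a choice of storage key. The
sketch below is illustrative only, and the function name and its
parameters are assumptions. When port numbers are visible the key
identifies a connection. When they are not, the key identifies only
the destination host, so a later packet to the same host replaces an
earlier one.

```python
from typing import Dict, Optional, Tuple, Union

Ports = Tuple[int, int]  # (source port, destination port)
Key = Union[Tuple[str, int, str, int], Tuple[str]]


def cache_key(src: str, dst: str, ports: Optional[Ports]) -> Key:
    """Pick a per-connection key if ports are visible, else per-destination."""
    if ports is None:
        # Ports hidden (for example by ESP) or not tracked by the subnetwork.
        return (dst,)
    sport, dport = ports
    return (src, sport, dst, dport)


def remember(store: Dict[Key, bytes], src: str, dst: str,
             ports: Optional[Ports], packet: bytes) -> None:
    """Keep only the most recently transmitted packet for each key."""
    store[cache_key(src, dst, ports)] = packet
```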
3.2 Stopping the Babbling
LUNs are intended as an end-to-end implicit notification to a peer
TCP, not a reliable signal. If a LUN is also lost due to a new link
outage, no additional LUNs will take place unless the local interface
"cycles" again.
Some subnetwork technologies can cycle between operational and
non-operational status very rapidly. The authors have been informed
of a scenario with more than 10 802.11 "link up" transitions per
second in a private conversation [BAPC]. To prevent "LUN storms",
hosts MUST wait at least one second (the minimum RTO value) after an
interface becomes operational before sending a LUN.
Modified hosts MUST NOT send LUNs more frequently than once every
three seconds. This restriction matches the RTO period for a new TCP
connection, so is assumed to be "safe enough".
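
The two timing rules in this section can be sketched together as
below. This is one possible reading, not a definitive implementation.
The class and method names are hypothetical, timestamps are passed in
by the caller in seconds, and the sketch assumes at most one LUN per
"link up" transition.

```python
from typing import Optional

SETTLE_DELAY = 1.0      # seconds an interface must stay up before a LUN
MIN_LUN_INTERVAL = 3.0  # minimum seconds between successive LUNs


class LunPacer:
    """Hypothetical gate deciding whether a LUN may be sent right now."""

    def __init__(self) -> None:
        self._up_since: Optional[float] = None
        self._last_lun: Optional[float] = None
        self._sent_this_cycle = False

    def on_link_up(self, now: float) -> None:
        # Every "up" transition restarts the settle delay, so a rapidly
        # flapping link never becomes eligible to send.
        self._up_since = now
        self._sent_this_cycle = False

    def on_link_down(self) -> None:
        self._up_since = None

    def try_send(self, now: float) -> bool:
        """Return True, and record the send, if a LUN is permitted at `now`."""
        if self._up_since is None or self._sent_this_cycle:
            return False
        if now - self._up_since < SETTLE_DELAY:
            return False
        if (self._last_lun is not None
                and now - self._last_lun < MIN_LUN_INTERVAL):
            return False
        self._last_lun = now
        self._sent_this_cycle = True
        return True
```

With ten transitions per second, `on_link_up` is called again before
the one-second settle delay can elapse, so no LUN is emitted until the
link stays up.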
4. When an RTOed TCP Sender Receives a LUN
The LUN described in Section 3 will contain an acknowledgement
sequence number, if the TCP connection has advanced to the
ESTABLISHED state. There are several possibilities (using
[RFC793]-style notation):
1. SND.NXT < SEG.ACK - in this case, the receiver has retransmitted
an acknowledgement for a segment that hasn't been sent yet.
2. SND.UNA < SEG.ACK <= SND.NXT - in this case, the receiver has
retransmitted a "new" ACK that the sender has not seen. The TCP
would process this segment normally - it would remove the
acknowledged segments from the retransmission queue and perform
slow start (since the connection is already in RTO).
3. SEG.ACK <= SND.UNA - in this case, the receiver has retransmitted
a "duplicate" ACK that the sender has seen previously. In today's
standard-conformant TCPs, this segment would be ignored (the TCP
receiving it would assume the ACK has been duplicated or reordered by
the IP network). This memo adds the following TCP mechanism: for
a connection in RETRANSMISSION-WAIT, the sending TCP SHOULD
perform slow start.
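
The three cases above can be sketched as a single classification
step. This is an illustration under stated assumptions, not the
processing rules of any particular stack. The enum, the function
names, and the `in_rto_backoff` flag are hypothetical, and comparisons
are done modulo 2**32 as [RFC793] sequence arithmetic requires.

```python
from enum import Enum

MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


def seq_lt(a: int, b: int) -> bool:
    """True if sequence number a precedes b, allowing for wraparound."""
    return a != b and ((b - a) % MOD) < (MOD // 2)


class AckAction(Enum):
    UNSENT_DATA = 1        # case 1: acknowledges data not yet sent
    NEW_ACK = 2            # case 2: process normally
    DUPLICATE_IGNORED = 3  # case 3, connection not in RTO backoff
    LUN_RESTART = 4        # case 3, connection in RTO backoff


def classify_ack(snd_una: int, snd_nxt: int, seg_ack: int,
                 in_rto_backoff: bool) -> AckAction:
    """Map an arriving ACK onto the cases listed in this section."""
    if seq_lt(snd_nxt, seg_ack):
        return AckAction.UNSENT_DATA
    if seq_lt(snd_una, seg_ack):
        # Here SND.UNA < SEG.ACK <= SND.NXT.
        return AckAction.NEW_ACK
    # Here SEG.ACK <= SND.UNA: a duplicate. It is treated as a LUN only
    # while the sender is waiting on a backed-off retransmission timer.
    return AckAction.LUN_RESTART if in_rto_backoff else AckAction.DUPLICATE_IGNORED
```

A caller would retransmit and perform slow start on `LUN_RESTART`, and
leave all other outcomes to existing TCP processing.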
OPEN ISSUE: should we tighten the criteria for a LUN, so that we only
respond to a LUN that duplicates the "most recent" ACK received? Our
sense is that if we got an ACK before the link went inactive, we
should expect to get that ACK again as a LUN when the link becomes
active again, and not some earlier ACK (yes, IP networks can reorder
packets, but during RTO, the sender sends only one packet into the
network, and older packets shouldn't still be active in the network).
But responding to earlier ACKs as LUNs wouldn't be much of a risk,
because LUN has no effect except during RTO anyway.
5. Security Considerations
This memo describes a (small) change in TCP behavior - the most
widely used transport protocol on the Internet today.
The procedures defined in this memo will cause sending hosts to
retransmit one packet per RTOed connection before RTO timers would
have expired (when the sending host would have retransmitted one
packet per connection anyway).
The procedures defined in this memo may cause a TCP to "give up" on
an RTOed connection more rapidly than it would have previously (for
instance, modified BSD-derived sending TCPs may still abandon a TCP
connection after 12 attempted retransmissions, but the 12
retransmissions may take place over a shorter time interval if LUNs
cause retransmissions to take place before the sender's RTO timer
expires).
It is possible to spoof LUNs. For this to work, an attacker would
identify a TCP connection that has experienced RTO, and send a forged
packet with appropriate addresses and port numbers, and reasonable
sequence numbers, to the TCP sender. This seems like a lot of work to
generate a single TCP segment retransmission followed by Slow Start
(the effect of a LUN) - an attacker with this capability could simply
start sending an ACK stream today, and cause more packets to enter
the network.
The authors assume that fully-backed-off TCP connections for
interactive applications will often be abandoned anyway, resulting in
additional traffic (SYN/SYN-ACKs, etc.), so the tiny increase in
traffic from a single LUN would be outweighed by traffic avoidance in
these situations.
6. IANA Considerations
There are no IANA considerations for this document.
7. Acknowledgements
We want to clearly acknowledge Phil Karn as the person who brought
"Kicking TCP" to the PILC working group.
We want to thank Mark Allman and Bernard Aboba for a number of
helpful comments on previous variants of this discussion.
Authors' Addresses
Spencer Dawkins
MCSR Labs
1547 Rivercrest Blvd.
Allen, TX 75002
US
Phone: +1-972-727-9834
EMail: spencer@mcsr-labs.org
Carl Williams
MCSR Labs
3790 El Camino Real
Palo Alto, CA 94306
US
Phone: +1-650-279-5903
EMail: carlw@mcsr-labs.org
Appendix A. References
[BAPC]: Bernard Aboba, private conversation at IETF 57
[LINK]: "Advice for Internet Subnetwork Designers", Phil Karn
(editor), February 2003 [draft-ietf-pilc-link-design-13.txt, work
in progress]
[LINKNOTE]: "Kicking TCP", posting on PILC mailing list by Phil Karn,
March 7, 2000 [http://pilc.grc.nasa.gov/list/archive/0691.html]
[PILC]: "Performance Implications of Link Characteristics", IETF
Working Group
[http://www.ietf.org/html.charters/pilc-charter.html]
[RFC793]: "Transmission Control Protocol", J. Postel, September 1981
[ftp://ftp.rfc-editor.org/in-notes/rfc793.txt]
[RFC2119]: "Key words for use in RFCs to Indicate Requirement
Levels", S. Bradner, March 1997
[ftp://ftp.rfc-editor.org/in-notes/rfc2119.txt]
[RFC2988]: "Computing TCP's Retransmission Timer", V. Paxson, M.
Allman, November 2000
[ftp://ftp.rfc-editor.org/in-notes/rfc2988.txt]
[TRIGTRAN56]: "Triggers for Transport (TRIGTRAN) BoF minutes", March,
2003 [http://www.ietf.org/proceedings/03mar/minutes/trigtran.htm]
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
intellectual property or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; neither does it represent that it
has made any effort to identify any such rights. Information on the
IETF's procedures with respect to rights in standards-track and
standards-related documentation can be found in BCP-11. Copies of
claims of rights made available for publication and any assurances of
licenses to be made available, or the result of an attempt made to
obtain a general license or permission for the use of such
proprietary rights by implementors or users of this specification can
be obtained from the IETF Secretariat.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights which may cover technology that may be required to practice
this standard. Please address the information to the IETF Executive
Director.
Full Copyright Statement
Copyright (C) The Internet Society (2003). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assignees.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.