TSVWG                                                         S. Dawkins
Internet-Draft                                               C. Williams
Expires: April 23, 2004                                        MCSR Labs
                                                        October 24, 2003


              End-to-end, Implicit "Link-Up" Notification
                  draft-dawkins-trigtran-linkup-01.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 23, 2004.

Copyright Notice

   Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

   The Performance Implications of Link Characteristics [PILC] working
   group is recommending an end-to-end implicit notification when an
   access link outage ends. This document codifies the "Link Up
   Notification" for TCP.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].







Dawkins & Williams       Expires April 23, 2004                 [Page 1]


Internet-Draft          "Link-Up" Notifications             October 2003


1. Introduction

   The Transmission Control Protocol (TCP) [RFC793] uses a
   retransmission timer to ensure data delivery in the absence of any
   feedback from a remote data receiver, and prescribes an "exponential
   backoff" for this timer in cases where retransmissions are also
   unacknowledged. This timer can grow to a very large value (the
   retransmission timer in deployed implementations is often capped at
   64 seconds, and even this limit isn't required by standards-track
   specifications).

   This exponential backoff is necessary to prevent sustained congestion
   (if loss occurs due to congestion), but may provide an unnecessarily
   unpleasant user experience (if the loss occurs due to link outages in
   a wireless environment).

   The Performance Implications of Link Characteristics [PILC] working
   group is recommending an end-to-end implicit notification when an
   access link outage ends [LINK, section 8.2]. The goal is to allow
   sending transports to retransmit in a timely fashion without
   modifying the exponential backoff mechanism. This notification was
   well-supported in the IETF 56 TRIGTRAN BoF [TRIGTRAN56].

   PILC is not chartered to propose protocol changes, so this proposal
   is targeted for the Transport Area Working Group (TSVWG).

   This note describes a method of "short-circuiting" a "backed-off"
   retransmission timer in a case where a TCP detects that a local
   interface has become operational, so that a sender is notified that
   another retransmission attempt may be appropriate. The TCP using the
   interface sends a "Link Up Notification" (or "LUN") to its peer.


2. Problem Statement

   The Transmission Control Protocol (TCP) [RFC793] uses a
   retransmission timer to ensure data delivery in the absence of any
   feedback from a remote data receiver. This timer, called the
   retransmission timeout (RTO), is calculated using an algorithm
   specified in [RFC2988].

   When an RTO occurs, the sender retransmits an unacknowledged segment.
   If this retransmitted segment is also unacknowledged, the sender
   waits twice as long before attempting an additional retransmission,
   and this delay is cumulative for each successive retransmission that
   does not result in an acknowledgement from the receiver.

   The initial value of RTO is 3 seconds, and subsequent values during
   normal operation approach a smoothed average of the RTT (plus a
   factor based on the variance in RTT), with a lower bound of 1 second.
   When a segment is lost, and cannot be recovered by other means (Fast
   Retransmit), the RTO used to trigger the first retransmission attempt
   will be as short as is "reasonable" - the RTO is calculated based on
   the measured RTT, so the RTO will happen with a reasonable
   expectation that no acknowledgement for data sent before RTO will be
   received after RTO. This might be characterized as "as soon as
   possible, but no sooner".
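
   As a rough illustration of the calculation summarized above, the
   following Python sketch follows the update rules of [RFC2988]. The
   class name, method name, and the 100 ms clock granularity are
   assumptions made for this example only; they do not come from any
   particular TCP implementation.

```python
# Sketch of the RTO computation from [RFC2988]. Names are illustrative.
ALPHA = 1 / 8      # smoothing gain for SRTT
BETA = 1 / 4       # smoothing gain for RTTVAR
K = 4              # variance multiplier
INITIAL_RTO = 3.0  # seconds, used before any RTT sample exists
MIN_RTO = 1.0      # seconds, lower bound on the computed RTO


class RtoEstimator:
    def __init__(self, granularity=0.1):
        self.g = granularity   # assumed clock granularity G, in seconds
        self.srtt = None       # smoothed RTT
        self.rttvar = None     # RTT variation
        self.rto = INITIAL_RTO

    def on_rtt_sample(self, r):
        """Update SRTT, RTTVAR and RTO from one RTT measurement r (seconds)."""
        if self.srtt is None:
            # First measurement on the connection.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated with the old SRTT, then SRTT is updated.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        # Apply the one-second lower bound.
        self.rto = max(MIN_RTO, self.srtt + max(self.g, K * self.rttvar))
        return self.rto
```

   On a path with a 200 ms RTT, the computed value is clamped to the
   one-second lower bound, which is why that bound is the RTO most
   short-RTT connections start from when a loss occurs.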

   All well and good, if the retransmitted segment is acknowledged. If
   it is not acknowledged, the TCP will wait twice as long before
   retransmitting again, and will continue to double the RTO interval
   each time its attempt to retransmit fails.

   This behavior is conservative, ensuring that sending TCPs "back off"
   in the presence of path congestion. This desirable property comes at
   a price - current RTO values quickly increase into the tens of seconds
   between retransmission attempts, a painfully slow interval if a human
   being is "in the loop". BSD-based TCPs finally "cap" the maximum RTO
   value at 64 seconds, but this "cap" is not required [RFC2988] -
   conformant TCPs are allowed to continue to increase RTO into multiple
   minutes between retransmission attempts.
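
   The growth described above can be tabulated with a few lines of
   Python. This is only a sketch: the function name is hypothetical,
   and the starting RTO of one second is an assumption (the minimum
   RTO), chosen to show the best case.

```python
def backoff_schedule(first_rto, attempts, cap=None):
    """Return (rto_values, total_wait) for successive unanswered retransmissions.

    first_rto: RTO in seconds that triggers the first retransmission.
    attempts:  number of retransmission timer expirations to model.
    cap:       optional upper bound on RTO (for example 64 seconds).
    """
    rtos = []
    rto = first_rto
    for _ in range(attempts):
        rtos.append(rto)
        rto *= 2                     # exponential backoff
        if cap is not None:
            rto = min(rto, cap)      # optional implementation cap
    return rtos, sum(rtos)


capped, capped_total = backoff_schedule(1, 8, cap=64)
uncapped, uncapped_total = backoff_schedule(1, 8)
print(capped, capped_total)
print(uncapped, uncapped_total)
```

   Even with a 64 second cap, eight unanswered retransmissions span
   more than three minutes; without the cap, the eighth interval alone
   exceeds two minutes.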

   If an RTO has happened because of path congestion, high and rising
   RTO-based periods of "silence" are necessary to ensure that path
   congestion does not remain, or even increase, at a time when the
   sending TCP is not receiving any feedback from the receiver.

   If an RTO has happened because of an access link failure, an
   all-too-common situation when the access link is a wireless link, and
   the access link becomes available again, the unexpired portion of the
   full RTO period is not required to prevent sustained congestion,
   because no congestion was occurring. However, today's sending TCPs
   cannot know this is the case, have no indication that the RTO is
   caused by an access link failure, and must make the conservative
   assumption that lost packets are being lost due to congestion.
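
   To give a feel for the size of the "unexpired portion", the sketch
   below computes how long a sender stays silent after the link
   recovers. It assumes, for illustration only, that the outage begins
   with the loss of the original transmission at time zero, that the
   first RTO is one second, and that RTO is capped at 64 seconds. The
   function name is hypothetical.

```python
def idle_after_link_up(link_up_at, first_rto=1.0, cap=64.0):
    """Seconds between link recovery and the next timer-driven retransmission.

    link_up_at is the time (seconds after the original, lost transmission)
    at which the access link becomes usable again.
    """
    t = 0.0          # time of the original transmission
    rto = first_rto
    while True:
        t += rto     # timer expires; the segment is retransmitted at time t
        if t >= link_up_at:
            return t - link_up_at
        rto = min(rto * 2, cap)   # retransmission was lost too; back off


print(idle_after_link_up(35.0))
```

   In this model retransmissions occur at 1, 3, 7, 15, 31, and 63
   seconds, so a link that returns after 35 seconds is followed by 28
   seconds of avoidable silence.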

   It is near-axiomatic that a "human in the loop" will abandon any
   operation leading to minutes of inactivity and "try again" - for
   instance, pressing the "stop" and "reload" buttons on an HTTP
   browser. These operations often reset or abandon existing TCP
   connections, causing TCPs to discard learned path characteristics,
   and add additional packets (SYN/SYN-ACK on new connections, etc.) to
   the connection path. If it's possible to prevent this, it's desirable
   to do so.

2.1 A Historical Note: "Kicking" TCP

   The IETF PILC Working Group is recommending, in [LINK],
   retransmission of packets on an interface that has returned to
   operational status. [LINK] documents informal practice, but
   additional details are required for standards-track TCPs.

   "Kicking TCP" takes its name from Phil Karn's posting to the PILC
   mailing list, proposing that routers driving subnetworks subject to
   lengthy outages "try to hold onto the last IP packet of each flow
   when a link goes down and forward it to its destination when the link
   comes back up" [LINKNOTE].

   This document takes "Kicking TCP" as a starting point. It extends
   "Kicking TCP" by adding sender-side behavior for
   apparently-duplicated packets received on an RTOed TCP connection.

2.2 Transport and Deployability Considerations

   Ideally, a "Link Up Notification" (or "LUN") would be accomplished
   using an ICMP message, but in today's Internet, an end-to-end TCP
   packet for an existing connection is more likely to "arrive" at its
   destination across border gateways, firewalls, and NATs. "Kicking
   TCP" takes advantage of this - the LUN is exactly a packet that has
   already been transmitted on an existing connection path.

2.3 Applicability Statement

   Hosts supporting TCP-based applications over subnetwork interfaces
   subject to multi-second outages MAY perform the actions described in
   Section 3. These actions are more attractive for TCP implementations
   used with "human-in-the-loop" applications, but are safe for any
   TCP-based implementation.

   All hosts supporting TCP-based applications SHOULD perform the
   actions described in Section 4.


3. When a Local Interface Returns to "UP"

   If a host contains a local interface that is subject to frequent and
   lengthy outages, the host subnetwork implementation MAY retain a copy
   of "the last" packet transmitted on each TCP connection.

   When the subnetwork implementation detects that a local interface has
   returned to "UP" status, the subnetwork implementation MAY retransmit
   the last packet stored for each TCP connection.
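
   One way to picture this behavior is a small cache keyed by the TCP
   connection 4-tuple. The sketch below is an illustration under
   assumptions, not a description of any existing driver: the class
   name, the method names, and the "send" callback are all
   hypothetical.

```python
class LastPacketCache:
    """Remembers the most recent outbound packet for each TCP connection."""

    def __init__(self):
        self._last = {}   # (src, sport, dst, dport) -> raw packet bytes

    def on_transmit(self, src, sport, dst, dport, packet):
        # Overwrite whatever was stored earlier for this connection.
        self._last[(src, sport, dst, dport)] = packet

    def on_close(self, src, sport, dst, dport):
        # Forget connections that no longer exist.
        self._last.pop((src, sport, dst, dport), None)

    def on_link_up(self, send):
        """Retransmit one stored packet per connection; return the count sent."""
        for packet in self._last.values():
            send(packet)
        return len(self._last)
```

   A caller would invoke on_transmit() for every outbound TCP packet
   and on_link_up() when the interface returns to "UP", passing the
   routine that hands a packet to the link.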

3.1 Layering Violation Tradeoffs

   This proposal assumes that subnetwork implementations can track
   TCP connections between two end hosts. This is a layering violation.

   If an implementation finds it more convenient to provide "local link
   up" indications to its own TCP, LUN functionality can be implemented
   in the TCP/IP stack.

   Not all subnetwork implementations are able to distinguish between
   TCP connections. In this case, the subnetwork may choose to store one
   packet per destination host.

   TCP source and destination port numbers will be masked when the host
   is using IPsec Encapsulating Security Payload [ESP], because this
   cryptographic privacy mechanism encrypts the TCP header that carries
   these fields. In these cases, the subnetwork may also choose to
   store one packet per destination host.

   If a host is storing one packet per destination host, it should be
   the most recently transmitted packet, to maximize the probability
   that a LUN will restart an active TCP connection.
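
   The fallback described in this section amounts to choosing a
   coarser cache key when port numbers are not visible. A minimal
   sketch, with hypothetical function names and a plain dictionary
   standing in for the subnetwork's packet store:

```python
def cache_key(src, dst, ports=None):
    """Per-connection key when ports are visible, per-destination otherwise."""
    if ports is None:
        # Ports hidden (for example by ESP) or not parsed by the subnetwork.
        return ("host", dst)
    sport, dport = ports
    return ("conn", src, sport, dst, dport)


def remember(cache, src, dst, packet, ports=None):
    """Store packet as the most recently transmitted one for its key."""
    cache[cache_key(src, dst, ports)] = packet


store = {}
remember(store, "10.0.0.1", "10.0.0.2", b"first")
remember(store, "10.0.0.1", "10.0.0.2", b"second")   # replaces b"first"
```

   Because each store overwrites the previous entry, the per-host slot
   always holds the most recently transmitted packet, as recommended
   above.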

3.2 Stopping the Babbling

   LUNs are intended as an end-to-end implicit notification to a peer
   TCP, not a reliable signal. If a LUN is also lost due to a new link
   outage, no additional LUNs will take place unless the local interface
   "cycles" again.

   Some subnetwork technologies can cycle between operational and
   non-operational status very rapidly. The authors have been informed
   of a scenario with more than 10 802.11 "link up" transitions per
   second in a private conversation [BAPC]. To prevent "LUN storms",
   hosts MUST wait at least one second (the minimum RTO value) after an
   interface becomes operational before sending a LUN.

   Modified hosts MUST NOT send LUNs more frequently than once every
   three seconds. This restriction matches the RTO period for a new TCP
   connection, so is assumed to be "safe enough".
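
   The two limits in this section (a one-second hold-down after the
   interface comes up, and at most one LUN every three seconds) can be
   combined in a small gate. The sketch below is one possible reading
   of those rules; the class and method names are hypothetical, and
   the caller is assumed to supply timestamps in seconds from a
   monotonic clock.

```python
HOLD_DOWN = 1.0      # seconds the interface must stay up before a LUN
MIN_INTERVAL = 3.0   # minimum seconds between two LUNs


class LunGate:
    def __init__(self):
        self.up_since = None   # time of the latest "up" transition
        self.pending = False   # True while a LUN is owed for that transition
        self.last_lun = None   # time the previous LUN was sent

    def link_up(self, now):
        self.up_since = now
        self.pending = True

    def link_down(self, now):
        # The interface cycled again; any LUN not yet sent is cancelled.
        self.up_since = None
        self.pending = False

    def may_send(self, now):
        """Return True, and record the send, if a LUN may go out at time now."""
        if not self.pending:
            return False
        if now - self.up_since < HOLD_DOWN:
            return False
        if self.last_lun is not None and now - self.last_lun < MIN_INTERVAL:
            return False
        self.pending = False
        self.last_lun = now
        return True
```

   With this gate, an interface that flaps ten times per second never
   satisfies the hold-down, so no LUN is sent until the link has been
   stable for a full second.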


4. When an RTOed TCP Sender Receives a LUN

   The LUN described in Section 3 will contain an acknowledgement
   sequence number, if the TCP connection has advanced to the
   ESTABLISHED state. There are several possibilities (using
   [RFC793]-style notation):

   1.  SND.NXT < SEG.ACK - in this case, the receiver has retransmitted
       an acknowledgement for a segment that hasn't been sent yet.

   2.  SND.UNA < SEG.ACK <= SND.NXT - in this case, the receiver has
       retransmitted a "new" ACK that the sender has not seen. The TCP
       would process this segment normally - it would remove the
       acknowledged segments from the retransmission queue and perform
       slow start (since the connection is already in RTO).

   3.  SEG.ACK <= SND.UNA - in this case, the receiver has retransmitted
       a "duplicate" ACK that the sender has seen previously. In today's
       standard-conformant TCPs, this segment would be ignored (the
       sender would assume the ACK has been duplicated or reordered by
       the IP network). This memo adds the following TCP mechanism: for
       a connection in RETRANSMISSION-WAIT, the sending TCP SHOULD
       perform slow start.
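
   The three cases above can be expressed as a small classifier. This
   is a sketch under simplifying assumptions: sequence numbers are
   treated as plain integers (real TCPs compare modulo 2^32), the
   function names and return strings are invented for this example,
   and "in_rto_backoff" stands for the condition this memo calls
   RETRANSMISSION-WAIT.

```python
def classify_ack(snd_una, snd_nxt, seg_ack):
    """Map an incoming acknowledgement number to one of the three cases."""
    if seg_ack > snd_nxt:
        return "unsent"      # case 1: acknowledges data not yet sent
    if seg_ack > snd_una:
        return "new"         # case 2: SND.UNA < SEG.ACK <= SND.NXT
    return "duplicate"       # case 3: SEG.ACK <= SND.UNA


def is_lun(snd_una, snd_nxt, seg_ack, in_rto_backoff):
    """True when the segment should be treated as a Link Up Notification."""
    return in_rto_backoff and classify_ack(snd_una, snd_nxt, seg_ack) == "duplicate"
```

   A sender for which is_lun() returns True would apply the mechanism
   added by this memo; a "new" ACK needs no special handling because
   normal processing already restarts transmission.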

   OPEN ISSUE: should we tighten the criteria for a LUN, so that we only
   respond to a LUN that duplicates the "most recent" ACK received? Our
   sense is that if we got an ACK before the link went inactive, we
   should expect to get that ACK again as a LUN when the link becomes
   active again, and not some earlier ACK (yes, IP networks can reorder
   packets, but during RTO, the sender sends only one packet into the
   network, and older packets shouldn't still be active in the network).
   But responding to earlier ACKs as LUNs wouldn't be much of a risk,
   because LUN has no effect except during RTO anyway.


5. Security Considerations

   This memo describes a (small) change in the behavior of TCP, the most
   widely used transport protocol on the Internet today.

   The procedures defined in this memo will cause sending hosts to
   retransmit one packet per RTOed connection before RTO timers would
   have expired (when the sending host would have retransmitted one
   packet per connection anyway).

   The procedures defined in this memo may cause a TCP to "give up" on
   an RTOed connection more rapidly than it would have previously (for
   instance, modified BSD-derived sending TCPs may still abandon a TCP
   connection after 12 attempted retransmissions, but the 12
   retransmissions may take place over a shorter time interval if LUNs
   cause retransmissions to take place before the sender's RTO timer
   expires).

   It is possible to spoof LUNs. For this to work, an attacker would
   identify a TCP connection that has experienced RTO, and send a forged
   packet with appropriate addresses and port numbers, and reasonable
   sequence numbers, to the TCP sender. This seems like a lot of work to
   generate a single TCP segment retransmission followed by Slow Start
   (the effect of a LUN) - an attacker with this capability could simply
   start sending an ACK stream today, and cause more packets to enter
   the network.

   The authors assume that fully-backed-off TCP connections for
   interactive applications will often be abandoned anyway, resulting in
   additional traffic (SYN/SYN-ACKs, etc.), so the tiny increase in
   traffic from a single LUN would be outweighed by the traffic avoided
   in these situations.


6. IANA Considerations

   There are no IANA considerations for this document.


7. Acknowledgements

   We want to clearly acknowledge Phil Karn as the person who brought
   "Kicking TCP" to the PILC working group.

   We want to thank Mark Allman and Bernard Aboba for a number of
   helpful comments on previous variants of this discussion.


Authors' Addresses

   Spencer Dawkins
   MCSR Labs
   1547 Rivercrest Blvd.
   Allen, TX  75002
   US

   Phone: +1-972-727-9834
   EMail: spencer@mcsr-labs.org


   Carl Williams
   MCSR Labs
   3790 El Camino Real
   Palo Alto, CA  94306
   US

   Phone: +1-650-279-5903
   EMail: carlw@mcsr-labs.org


Appendix A. References

   [BAPC]: Bernard Aboba, private conversation at IETF 57

   [LINK]: "Advice for Internet Subnetwork Designers", Phil Karn
      (editor), February 2003 [draft-ietf-pilc-link-design-13.txt, work
      in progress]

   [LINKNOTE]: "Kicking TCP", posting on PILC mailing list by Phil Karn,
      March 7, 2000 [http://pilc.grc.nasa.gov/list/archive/0691.html]

   [PILC]: "Performance Implications of Link Characteristics", IETF
      Working group [http://www.ietf.org/html.charters/
      pilc-charter.html]

   [RFC793]: "Transmission Control Protocol", J. Postel, September, 1981
      [ftp://ftp.rfc-editor.org/in-notes/rfc793.txt]

   [RFC2119]: "Key words for use in RFCs to Indicate Requirement
      Levels", S. Bradner, March 1997 [ftp://ftp.rfc-editor.org/
      in-notes/rfc2119.txt]

   [RFC2988]: "Computing TCP's Retransmission Timer", V. Paxson, M.
      Allman, November, 2000 [ftp://ftp.rfc-editor.org/in-notes/
      rfc2988.txt]

   [TRIGTRAN56]: "Triggers for Transport (TRIGTRAN) BoF minutes", March,
      2003 [http://www.ietf.org/proceedings/03mar/minutes/trigtran.htm]


Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11. Copies of
   claims of rights made available for publication and any assurances of
   licenses to be made available, or the result of an attempt made to
   obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification can
   be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive
   Director.


Full Copyright Statement

   Copyright (C) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assignees.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.