TCPM Working Group                                            S. Schuetz
Internet-Draft                                                       NEC
Intended status: Standards Track                               L. Eggert
Expires: September 6, 2007                                         Nokia
                                                                 W. Eddy
                                                                 Verizon
                                                                Y. Swami
                                                                   K. Le
                                                                   Nokia
                                                           March 5, 2007
TCP Response to Lower-Layer Connectivity-Change Indications
draft-schuetz-tcpm-tcp-rlci-01
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
This document may not be modified, and derivative works of it may not
be created, except to publish it as an RFC and to translate it into
languages other than English.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on September 6, 2007.
Copyright Notice
Copyright (C) The IETF Trust (2007).
Schuetz, et al. Expires September 6, 2007 [Page 1]
Internet-Draft TCP Response to Connectivity Indications March 2007
Abstract
When the path characteristics between two hosts change abruptly, TCP
can experience significant delays before it resumes efficient
transmission, or it can behave unfairly toward competing traffic.
This document describes TCP extensions that improve transmission
behavior in response to advisory, lower-layer connectivity-change
indications. The proposed TCP extensions modify the local behavior
of TCP and introduce a new TCP option to signal locally received
connectivity-change indications to remote peers. Performance gains
result from more efficient transmission behavior, and the extended
TCP is no more aggressive than a freshly started connection.
Table of Contents
1. Introduction
2. Motivation and Overview
3. Connectivity-Change Indications
4. TCP Response to Connectivity-Change Indications
4.1. Connectivity-Change Indication TCP Option
4.2. Generation and Processing of Connectivity-Change Indication TCP Options
4.3. Re-Probing Path Characteristics
4.4. Speculative Retransmission
5. Security Considerations
6. IANA Considerations
7. Acknowledgments
8. References
8.1. Normative References
8.2. Informative References
Editorial Comments
Appendix A. Background: Classification of Connectivity Disruptions
A.1. Short Connectivity Disruptions
A.2. Long Connectivity Disruptions
Appendix B. Document Revision History
Authors' Addresses
Intellectual Property and Copyright Statements
1. Introduction
The Transmission Control Protocol (TCP) [RFC0793] generally assumes
that the end-to-end path between two hosts has characteristics that
are relatively stable over the lifetime of a connection. Although
TCP's congestion control algorithms [RFC2581] can adapt to changes in
the path characteristics after several round-trip times, they fail to
support efficient operation in the few round-trip times immediately
after a significant path change. This is due to the granularity of
TCP's sampling mechanisms. Significant changes to path connectivity
include loss or reestablishment of connectivity, and drastic, abrupt
changes in round-trip time (RTT) or available bandwidth.
Connectivity changes that occur on such short time-scales are
becoming more common, due to host mobility or intermittent network
attachment.
This document describes a set of complementary TCP extensions that
improve behavior when transmitting over paths whose characteristics
can change on short time-scales. TCP implementations that support
these extensions respond to generic, link-technology-independent,
per-connection "path characteristics have changed" (or, for short,
"connectivity-change") indications from lower layers. A
connectivity-change indication signals that the characteristics of
the end-to-end path between the local node and its peer have changed
in some undefined way. The response mechanisms proposed for TCP act
on this information in a conservative fashion. The specific response
depends on the state of a connection.
It is important to note that adding response mechanisms to
lower-layer information follows an established precedent. TCP
and other transport protocols already react to information and
signals from lower layers; the proposed connectivity-change
indications thus extend an established interface between layers in
the protocol stack. TCP measures the end-to-end path to implicitly
derive network-layer information. TCP also directly reacts to
network-layer signals delivered via ICMP, for example, "Port
Unreachable" or the now-deprecated "Source Quench" [RFC1122].
Explicit Congestion Notification (ECN) [RFC3168] and Quick-Start
[I-D.ietf-tsvwg-quickstart] are other sources of network-layer
information for which response mechanisms for TCP have been defined.
Connectivity-change indications are yet another source of lower-layer
information that TCP can use to improve its operation.
A second important point to note is that the TCP response mechanisms
to connectivity-change indications are purely optional efficiency
improvements. In the absence of connectivity-change indications, a
TCP that implements these changes behaves identically to an
unmodified TCP. When lower layers provide connectivity-change
indications that trigger the response mechanisms, they enhance TCP
operation based on the explicit lower-layer information that is
signaled. These response mechanisms do not increase the
aggressiveness of TCP.
Note that the IAB has recently described architectural issues of
"link indications" [I-D.iab-link-indications]. The authors feel that
this term is not quite accurate in this environment, because
transport mechanisms should remain link-technology-agnostic.
However, transport protocols have always acted on network-layer
information and signals, such as measured path characteristics or
ICMP-signaled conditions. Because of the growing proliferation of
shim layers between the traditional network and transport layers,
this document uses the term "lower-layer indication" to remain
independent of specific network or shim layers.
Note that it is currently an open question whether additional
lower-layer indications can provide further information to transport
protocols. Also, this document only describes response mechanisms
for TCP, although other transport protocols may benefit from similar
response mechanisms to react to connectivity-change indications.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
2. Motivation and Overview
Several proposed network-layer extensions support host mobility,
including Mobile IPv4 [RFC3344], Mobile IPv6 [RFC3775] and HIP
[I-D.ietf-hip-mm]. Typically, they shield transport-layer protocols
from mobility events and enable them to sustain established
connections across mobility events. However, the path
characteristics that established connections experience after a
mobility event may have changed drastically and on short time-scales.
Congestion control, RTT and path-MTU state gathered over an old path
before the move generally have no meaning for the new path. Because
TCP uses stale information when resuming transmission over the new
path, it can be either too aggressive or highly inefficient. Similar
conditions can arise when multihomed hosts fail over through the
shim6 protocol. Appendix A provides background on the types of
scenarios for which the mechanisms described in this document are
designed.
TCP already forces a slow-start restart in some cases where the
network state becomes unknown, such as after an idle period or heavy
losses. A first part of the response specified in this document
involves a similar return to initial slow-start state in response to
connectivity-change indications that are received while a connection
is transmitting in steady-state. Note that this behavior is more
conservative than the standard TCP response or lack of response.
Some performance gains with the proposed mechanisms come either from
avoiding overloading the new path, which typically incurs an RTO, or
from using slow-start to quickly discover new capacity far above the
previous steady-state operating point.
A second response component improves TCP operation in the presence of
temporary connectivity disruptions. These disruptions can occur
independently of mobility events and, for example, may be due to
insufficient wireless access coverage or nomadic computer use.
Connectivity disruptions can severely decrease TCP performance. The
main reason for this decrease is TCP's retransmission behavior after
a connectivity disruption [SCHUETZ]. TCP makes periodic
retransmission attempts at exponentially increasing intervals, which
can unnecessarily delay retransmissions after connectivity returns.
In the extreme case, TCP connections can even abort if the
disruption is longer than the TCP "user timeout." (Connection aborts
are out of scope for this document but can be prevented by the TCP
User Timeout Option [I-D.ietf-tcpm-tcp-uto].)
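To make the cost of exponential back-off concrete, the following
Python sketch computes when successive RTO-driven retransmissions
fire and how long a connection stays silent after connectivity
returns. It assumes, purely for illustration, an RTO of 3 seconds
when the disruption begins (the initial value of [RFC2988]) and a
hypothetical 60-second cap on the backed-off RTO; real stacks use
their own measured RTO and cap.

```python
def retransmission_times(rto, max_rto, horizon):
    """Times (seconds after the original transmission) at which
    RTO-driven retransmissions fire, doubling the RTO after each
    expiry (capped at max_rto), up to and including horizon."""
    times = []
    t = 0.0
    while t + rto <= horizon:
        t += rto
        times.append(t)
        rto = min(2 * rto, max_rto)  # exponential back-off with a cap
    return times


def idle_after_reconnect(times, reconnect_at):
    """Seconds between connectivity returning at reconnect_at and the
    next scheduled retransmission, or None if none is scheduled."""
    for t in times:
        if t >= reconnect_at:
            return t - reconnect_at
    return None


# Assumed values: 3 s RTO at disruption start, hypothetical 60 s cap.
schedule = retransmission_times(rto=3.0, max_rto=60.0, horizon=300.0)
print(schedule)  # [3.0, 9.0, 21.0, 45.0, 93.0, 153.0, 213.0, 273.0]
# Connectivity returns 1 s after the fourth retransmission (t = 45 s):
print(idle_after_reconnect(schedule, reconnect_at=46.0))  # 47.0
```

Even though the path works again at t = 46 s, this schedule leaves
the connection idle for another 47 s before it tries again.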
This second response component executes when a connection receives a
connectivity-change indication while it is stalled in exponential
back-off. It improves TCP retransmission behavior after connectivity is
restored through an immediate speculative retransmission attempt
[footnote-1]. Similar to the first response component, the second
one also increases TCP performance through a more intelligent
transmission behavior that uses periods of connectivity more
efficiently. In comparison to startup of a new connection, it does
not cause significant amounts of additional traffic and it does not
change TCP's congestion control algorithms.
Finally, this draft specifies a third response component, which is a
new TCP option that notifies the connection's remote peer of a
connectivity-change event detected locally. This is useful because
connectivity-change indications typically require appropriate
responses at both ends of a connection, but may only be received or
detected by one end. The other parts of the response to a
connectivity-change indication are independent of the indication's
source (locally notified or remotely signaled) and depend only on the
specific indication and the state of the connection for which it was
received.
3. Connectivity-Change Indications
The focus of this document is on specifying TCP response mechanisms
to lower-layer "path characteristics have changed" indications. This
section briefly describes how different network- and shim-layer
mechanisms underneath the transport layer may provide these
"connectivity-change" indications to TCP. This section is included
for clarification only; details on connectivity indication sources
are out of the scope of this document.
When lower layers detect a connectivity-change event, they generate
corresponding connectivity-change indications. Lower-layer events
that could trigger such an indication include (but are not limited
to):
o the IP address of the local outbound interface used for a given
connection has changed, e.g., due to DHCP [RFC2131] or IPv6 router
advertisements [RFC2460]
o link-layer connectivity of the local outbound interface used for a
given connection has changed, e.g., link-layer "link up" event
[I-D.ietf-dna-link-information]
o the local outbound interface used for a given connection has
changed, due to routing changes or link-layer connectivity changes
at other interfaces (including tunnel establishment or teardown,
e.g., in response to IKE events [RFC4306])
o a Mobile IP binding update has completed [RFC3775]
o a HIP readdressing update has completed [I-D.ietf-hip-mm]
o a path-change signal from the network has arrived (possible in
theory, depends on network capabilities)
o other notifications as defined by the IETF's Detecting Network
Attachment (DNA) working group have occurred
[I-D.ietf-dna-link-information]
Note that the list above only describes some potential sources for
connectivity-change events. Other sources exist, but the details on
when to generate such events are out of the scope of this document,
which focuses on the TCP response mechanisms when such events are
received.
4. TCP Response to Connectivity-Change Indications
A TCP connection can receive a connectivity-change indication (CCI)
either from its local stack ("local CCI") or through a new
"connectivity-change indication TCP option" from its peer ("remote
CCI"). Section 4.1 specifies this new TCP option. In either case,
upon reception of a CCI, the TCP response mechanisms defined in this
document re-probe path characteristics or perform a speculative
retransmission, depending on whether the connection is currently
stalled in exponential back-off or transmitting in steady-state. A
connection is "stalled in exponential back-off" if at least one
segment was retransmitted due to an RTO expiration but has not been
ACK'ed yet.
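The dispatch rule above can be sketched in a few lines of Python.
The `SentSegment` record and its fields are hypothetical bookkeeping
introduced only for illustration; a real stack would derive the same
predicate from its own retransmission queue.

```python
from dataclasses import dataclass


@dataclass
class SentSegment:
    """Hypothetical per-segment bookkeeping (not defined by this document)."""
    seq: int
    rto_retransmitted: bool = False  # retransmitted because the RTO expired
    acked: bool = False


def stalled_in_backoff(queue):
    """True if at least one segment was retransmitted due to an RTO
    expiration but has not been ACK'ed yet."""
    return any(s.rto_retransmitted and not s.acked for s in queue)


def response_to_cci(queue):
    """Select the response mechanism for a local or remote CCI."""
    if stalled_in_backoff(queue):
        return "speculative retransmission"  # Section 4.4
    return "re-probe path characteristics"  # Section 4.3


print(response_to_cci([SentSegment(1000, rto_retransmitted=True)]))
# speculative retransmission
```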
The remainder of this section first defines the format of the new CCI
option in Section 4.1, then specifies how CCI options are generated
and processed in Section 4.2, and finally describes the two TCP
response mechanisms triggered by receiving CCIs: re-probing path
characteristics in Section 4.3 and speculative retransmission in
Section 4.4.
To implement the RLCI mechanism defined in this document, TCP
implementations MUST maintain five new state variables per TCP
connection [footnote-2]:
LOCAL_CCI_COUNT
Counts (modulo 8) the number of local CCIs received for a
connection. Starting from value 7, it is decremented on each
local CCI and wraps around from 0 to 7.
REMOTE_CCI_COUNT
Holds a copy of the last CCI counter value advertised by the peer
through a CCI TCP option. This is initialized to 7, and is
updated in response to remote CCIs according to the rules defined
in Section 4.2.
LOCAL_CCI_ACTIVE
Boolean flag, true if the local TCP stack is currently executing a
response mechanism after having received a local CCI, and false
otherwise.
REMOTE_CCI_ACTIVE
Boolean flag, true if the local TCP stack is currently executing a
response mechanism after having received a remote CCI, false
otherwise.
REMOTE_CCI_SNDNXT
Retains a copy of SND.NXT [RFC0793] at the time the most recent
remote CCI was received.
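The five per-connection variables, with the initial values given in
Section 4.1, could be held in a structure like the following Python
sketch. The class, field and method names are hypothetical; the
modulo-8 countdown implements the LOCAL_CCI_COUNT rule above.

```python
from dataclasses import dataclass


@dataclass
class RlciState:
    """Per-connection RLCI state (illustrative names)."""
    local_cci_count: int = 7         # LOCAL_CCI_COUNT, counts down modulo 8
    remote_cci_count: int = 7        # REMOTE_CCI_COUNT, last CNT from peer
    local_cci_active: bool = False   # LOCAL_CCI_ACTIVE
    remote_cci_active: bool = False  # REMOTE_CCI_ACTIVE
    remote_cci_sndnxt: int = 0       # REMOTE_CCI_SNDNXT

    def count_local_cci(self):
        """Decrement LOCAL_CCI_COUNT; after 0 it wraps around to 7."""
        self.local_cci_count = (self.local_cci_count - 1) % 8


state = RlciState()
seen = []
for _ in range(9):
    state.count_local_cci()
    seen.append(state.local_cci_count)
print(seen)  # [6, 5, 4, 3, 2, 1, 0, 7, 6]
```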
4.1. Connectivity-Change Indication TCP Option
Connectivity-change indications (CCIs) are generally asymmetric,
i.e., they may occur at, or be detected by, one end but not the other.
The basic idea behind the CCI TCP option is to signal the occurrence
of local CCIs to the other end, in order to allow it to respond
appropriately.
                       1                   2
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
  +---------------+---------------+---+-----+-----+
  |   Kind = X    |  Length = 3   |RES| CNT | ECNT|
  +---------------+---------------+---+-----+-----+
Figure 1: Format of the connectivity-change indication TCP option.
Figure 1 shows the format of the CCI TCP option. It contains these
fields:
Kind (8 bits)
The TCP option number X [RFC0793] to be allocated by IANA upon
publication of this document (see Section 6).
Length (8 bits)
Length of the TCP option in octets [RFC0793]; its value MUST be 3.
RES (2 bits)
Reserved bits. The sender SHOULD set these to zero and the
receiver MUST ignore them.
CNT (3 bits)
Current value of LOCAL_CCI_COUNT of the local end sending the
option.
ECNT (3 bits)
Echoed value of CNT, i.e., the value of CNT in the last CCI option
received from the other end.
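The option fits in three octets, with RES, CNT and ECNT packed into
the last one. The Python sketch below encodes and decodes it,
assuming the usual convention that bit 0 in Figure 1 is the most
significant bit of the first octet. Because the kind value X has not
been assigned, `CCI_KIND` is a hypothetical placeholder.

```python
CCI_KIND = 253  # hypothetical placeholder for the unassigned kind "X"
CCI_LEN = 3     # the Length field MUST be 3


def encode_cci(cnt, ecnt, kind=CCI_KIND):
    """Build the 3-octet CCI option; the RES bits are sent as zero."""
    if not (0 <= cnt <= 7 and 0 <= ecnt <= 7):
        raise ValueError("CNT and ECNT are 3-bit fields")
    return bytes([kind, CCI_LEN, (cnt << 3) | ecnt])


def decode_cci(option, kind=CCI_KIND):
    """Return (CNT, ECNT) from a CCI option, ignoring the RES bits."""
    if len(option) != CCI_LEN or option[0] != kind or option[1] != CCI_LEN:
        raise ValueError("malformed CCI option")
    last = option[2]
    return (last >> 3) & 0x7, last & 0x7


print(encode_cci(6, 7).hex())             # fd0337
print(decode_cci(bytes([253, 3, 0xF7])))  # (6, 7): RES bits ignored
```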
The CCI TCP option carries a counter (CNT) that encodes, modulo 8,
the number of local connectivity-change indications that the sending
side has received. At the beginning of a connection,
LOCAL_CCI_ACTIVE and REMOTE_CCI_ACTIVE MUST be set to false.
LOCAL_CCI_COUNT and REMOTE_CCI_COUNT MUST be set to 7.
REMOTE_CCI_SNDNXT MUST be set to 0.
A host opening a connection includes a CCI option in its SYN segment
with the initial LOCAL_CCI_COUNT of 7 to advertise support for the
option. A host receiving a SYN MUST NOT include a CCI option in its
SYN-ACK unless it has received a CCI option in the corresponding SYN.
A host MUST NOT process any following CCI options unless one was
included in both the SYN and SYN-ACK.
After the SYN exchange, a host SHOULD send a CCI option only after
receiving a new local connectivity-change indication, or in response
to receiving a new CCI option from the other end. Section 4.3 and
Section 4.4 describe the processing rules in detail.
A host MUST include a CCI option in all outgoing segments whenever
LOCAL_CCI_ACTIVE is true or REMOTE_CCI_ACTIVE is true (or both). A
host MUST NOT include a CCI option in any segments whenever
LOCAL_CCI_ACTIVE is false and REMOTE_CCI_ACTIVE is false, i.e., the
host is not processing any connectivity-change indications. When
sending any CCI option, CNT MUST be set to the current
LOCAL_CCI_COUNT and ECNT MUST be set to the current REMOTE_CCI_COUNT.
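The negotiation and inclusion rules reduce to a few small predicates,
sketched below in Python with hypothetical function names. The first
two capture the SYN exchange; the last returns the (CNT, ECNT) pair
for an outgoing segment after the SYN exchange, or None when the
segment must not carry the option.

```python
def cci_negotiated(syn_had_cci, synack_had_cci):
    """CCI options are processed only if both SYN and SYN-ACK carried one."""
    return syn_had_cci and synack_had_cci


def synack_may_carry_cci(syn_had_cci):
    """A SYN-ACK may carry a CCI option only if the SYN did."""
    return syn_had_cci


def outgoing_cci_fields(local_active, remote_active, local_count, remote_count):
    """(CNT, ECNT) for an outgoing segment, or None.

    The option MUST be present while either flag is true and MUST be
    absent while both flags are false.
    """
    if local_active or remote_active:
        return local_count, remote_count
    return None


print(outgoing_cci_fields(True, False, 6, 7))   # (6, 7)
print(outgoing_cci_fields(False, False, 6, 7))  # None
```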
4.2. Generation and Processing of Connectivity-Change Indication TCP
Options
Processing of a connectivity-change indication can be separated into
two parts:
1. Processing in "initiator" mode, i.e., when a host receives a
local CCI and forwards it to the other end through a CCI TCP
option.
2. Processing in "responder" mode, i.e., when a host receives a
remote CCI in a CCI TCP option from the other end.
Section 4.2.1 and Section 4.2.2 describe the state machines at an
initiator and a responder, respectively. Note that a single host can
be both initiator and responder at the same time, if a local CCI and
a remote CCI happen to occur at the same time.
The following events, conditions and actions are used in the
definition of the two state machines:
Events:
E_LOCAL_CCI
Local end received a local CCI.
E_REMOTE_CCI
Local end received information about a remote CCI, i.e., received
a TCP segment that includes a CCI TCP option.
E_NONE
Local end received a TCP segment that does not include a CCI TCP
option.
Conditions:
C_NEW_REMOTE_CCI
Received CCI option signals a new remote CCI, i.e., CNT !=
REMOTE_CCI_COUNT.
C_ECHOED_LOCAL_CCI
Received CCI option echoes the local CCI counter, i.e., ECNT ==
LOCAL_CCI_COUNT.
C_LOCAL_PROGRESS
Local end made progress since receiving the last remote CCI, i.e.,
ACK > REMOTE_CCI_SNDNXT.
Actions:
A_DECREMENT_LOCAL
Decrement LOCAL_CCI_COUNT, i.e., LOCAL_CCI_COUNT = LOCAL_CCI_COUNT - 1,
where LOCAL_CCI_COUNT wraps from 0 to 7.
A_FORCE_SEND
Force transmission of a segment that MUST include a CCI option.
The segment can either be an outstanding retransmission, a new
data segment or a pure ACK.
A_UPDATE_REMOTE_COUNT
Update the remote CCI counter according to the received CCI option,
i.e., set REMOTE_CCI_COUNT = CNT.
A_UPDATE_SNDNXT
Store the sequence number of the next data segment, i.e., set
REMOTE_CCI_SNDNXT = SND.NXT.
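The three conditions can be written as predicates. This document
writes C_LOCAL_PROGRESS as ACK > REMOTE_CCI_SNDNXT; since TCP sequence
numbers wrap modulo 2^32 [RFC0793], this Python sketch assumes the
comparison is meant in modular sequence-number arithmetic. The
function names are illustrative.

```python
SEQ_MOD = 2 ** 32


def seq_gt(a, b):
    """a > b in 32-bit TCP sequence-number space (handles wrap-around)."""
    diff = (a - b) % SEQ_MOD
    return 0 < diff < SEQ_MOD // 2


def c_new_remote_cci(cnt, remote_cci_count):
    """Received CNT differs from the last value seen from the peer."""
    return cnt != remote_cci_count


def c_echoed_local_cci(ecnt, local_cci_count):
    """Peer echoes the current local counter."""
    return ecnt == local_cci_count


def c_local_progress(ack, remote_cci_sndnxt):
    """Data sent after the last remote CCI has been ACK'ed."""
    return seq_gt(ack, remote_cci_sndnxt)


print(c_local_progress(ack=100, remote_cci_sndnxt=SEQ_MOD - 50))  # True
print(c_local_progress(ack=SEQ_MOD - 50, remote_cci_sndnxt=100))  # False
```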
4.2.1. Initiator Mode Processing
This section describes the initiator mode processing of a TCP host
implementing RLCI. In initiator mode, a host needs to signal the
last received local CCI to its peer, until the peer echoes reception
of that CCI. Figure 2 shows the corresponding state machine.
At the beginning of a connection, i.e., before the first local CCI is
received, LOCAL_CCI_ACTIVE is false. This remains the case until the
local end receives a local CCI (E_LOCAL_CCI). When that happens, it
decrements LOCAL_CCI_COUNT (A_DECREMENT_LOCAL), forces a segment to
be sent to the peer (A_FORCE_SEND) and LOCAL_CCI_ACTIVE becomes true.
Note that this also implies that all subsequent outgoing segments
MUST contain a CCI TCP option until LOCAL_CCI_ACTIVE is false (and
possibly until REMOTE_CCI_ACTIVE is false, in case it became true
during the local CCI processing).
             E_LOCAL_CCI =>
             A_DECREMENT_LOCAL
             A_FORCE_SEND
        +-------------------------+       +---------+
        |                         |       |         |
        |                         V       V         |
+------------------+        +------------------+    |
| LOCAL_CCI_ACTIVE |        | LOCAL_CCI_ACTIVE |    |
|     == false     |        |     == true      |    |
+------------------+        +------------------+    |
    ^       ^                 |       |     |       |
    |       |                 |       |     |       |
    |       +-----------------+       |     +-------+
    |             E_NONE              |     E_LOCAL_CCI =>
    |                                 |     A_DECREMENT_LOCAL
    +---------------------------------+     A_FORCE_SEND
         E_REMOTE_CCI &&
         C_ECHOED_LOCAL_CCI
Figure 2: State machine for initiator processing.
When receiving a local CCI (E_LOCAL_CCI) while LOCAL_CCI_ACTIVE is
true, a host remains in this state but needs to perform the actions
A_DECREMENT_LOCAL and A_FORCE_SEND. LOCAL_CCI_ACTIVE remains true
until a host receives a segment carrying the CCI TCP option
(E_REMOTE_CCI) that echoes the current LOCAL_CCI_COUNT in the ECNT
field of the option (C_ECHOED_LOCAL_CCI). In this case,
LOCAL_CCI_ACTIVE becomes false.
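A compact Python rendering of Figure 2 follows. It is a sketch:
A_FORCE_SEND is modeled by a hypothetical callback, segments are
reported as None (E_NONE) or a (CNT, ECNT) tuple (E_REMOTE_CCI), and,
as drawn in the figure, both E_NONE and an echoing E_REMOTE_CCI
return LOCAL_CCI_ACTIVE to false.

```python
class InitiatorMachine:
    """Initiator-mode processing (Figure 2), illustrative only."""

    def __init__(self, force_send):
        self.local_cci_active = False
        self.local_cci_count = 7
        self.force_send = force_send  # hypothetical hook: send a segment

    def on_local_cci(self):
        """E_LOCAL_CCI, handled identically in both states."""
        self.local_cci_count = (self.local_cci_count - 1) % 8  # A_DECREMENT_LOCAL
        # Set the flag first so the forced segment carries a CCI option.
        self.local_cci_active = True
        self.force_send()  # A_FORCE_SEND

    def on_segment(self, cci):
        """cci is None (E_NONE) or a (CNT, ECNT) tuple (E_REMOTE_CCI)."""
        if not self.local_cci_active:
            return
        if cci is None:
            self.local_cci_active = False  # E_NONE
        elif cci[1] == self.local_cci_count:
            self.local_cci_active = False  # E_REMOTE_CCI && C_ECHOED_LOCAL_CCI


sent = []
m = InitiatorMachine(force_send=lambda: sent.append("segment+CCI"))
m.on_local_cci()      # LOCAL_CCI_COUNT 7 -> 6, now active
m.on_segment((7, 7))  # ECNT 7 does not echo 6: stay active
m.on_segment((7, 6))  # echo received: back to inactive
print(m.local_cci_active, m.local_cci_count, len(sent))  # False 6 1
```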
4.2.2. Responder Mode Processing
This section describes the responder mode processing of CCIs for a
TCP host implementing the CCI TCP option. In responder mode, a host
echoes the last received remote CCI to its peer, until it can be sure
that the peer correctly received the echo. Figure 3 shows the
corresponding state machine.
At the beginning of a connection, REMOTE_CCI_ACTIVE is false, i.e.,
the local host is not processing any remote CCIs. When it receives a
TCP segment with a CCI TCP option (E_REMOTE_CCI) signaling a new
remote CCI (C_NEW_REMOTE_CCI), it updates REMOTE_CCI_COUNT with the
value of the CNT field in the received option
(A_UPDATE_REMOTE_COUNT), stores the sequence number of the next data
segment in REMOTE_CCI_SNDNXT (A_UPDATE_SNDNXT) and sets
REMOTE_CCI_ACTIVE to true. Note that this also implies that all
subsequent outgoing segments MUST contain a CCI TCP option until
REMOTE_CCI_ACTIVE is false (and possibly until LOCAL_CCI_ACTIVE is
false, in case it became true during the remote CCI processing).
            E_REMOTE_CCI &&
            C_NEW_REMOTE_CCI == true =>
            A_UPDATE_REMOTE_COUNT
            A_UPDATE_SNDNXT
        +--------------------------+       +----------+
        |                          |       |          |
        |                          V       V          |
+-------------------+        +-------------------+    |
| REMOTE_CCI_ACTIVE |        | REMOTE_CCI_ACTIVE |    |
|      == false     |        |      == true      |    |
+-------------------+        +-------------------+    |
    ^       ^                  |       |     |        |
    |       |                  |       |     |        |
    |       +------------------+       |     +--------+
    |             E_NONE               |     E_REMOTE_CCI &&
    |                                  |     C_NEW_REMOTE_CCI == true =>
    |                                  |     A_UPDATE_REMOTE_COUNT
    +----------------------------------+     A_UPDATE_SNDNXT
         E_REMOTE_CCI &&
         C_NEW_REMOTE_CCI == false &&
         C_LOCAL_PROGRESS
Figure 3: State machine for responder processing.
When a host where REMOTE_CCI_ACTIVE is true receives a remote CCI TCP
option (E_REMOTE_CCI) that signals a new remote CCI
(C_NEW_REMOTE_CCI), it updates REMOTE_CCI_COUNT with the value of the
CNT field in the received option (A_UPDATE_REMOTE_COUNT), stores the
sequence number of the next data segment in REMOTE_CCI_SNDNXT
(A_UPDATE_SNDNXT) and leaves REMOTE_CCI_ACTIVE set to true.
A host sets REMOTE_CCI_ACTIVE to false only in one of the following
two cases. First, if it receives a TCP segment that does not include
a CCI TCP option (E_NONE), because this signals that
LOCAL_CCI_ACTIVE is false at the other end, from which it can conclude
that the other end has completed processing of the CCI. Second, if
it receives a CCI TCP option (E_REMOTE_CCI) that does not signal a
new remote CCI (C_NEW_REMOTE_CCI == false) and the connection has
made progress since the last remote CCI (C_LOCAL_PROGRESS). In this
case, data segments sent after the last remote CCI have already been
ACK'ed, i.e., the peer must have received the echoed ECNT value in at
least one of the segments sent since the last remote CCI, because a
full round trip of the CCI option exchange has completed. Therefore,
the local host can terminate responder mode processing.
Note: The second transition is required for the case when both hosts
are in responder mode at the same time. Without it, neither host would
stop including CCI TCP options in its segments, because
REMOTE_CCI_ACTIVE is true on both sides and E_NONE therefore never
occurs. This can happen, e.g., when both hosts receive local CCIs at
(nearly) the same time and signal them to each other using CCI TCP
options.
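Figure 3 can be sketched the same way. In this hypothetical Python
version, each received segment is reported with its optional
(CNT, ECNT) pair, its ACK number and the local SND.NXT at that
moment; sequence numbers are compared modulo 2^32, an assumption
about how C_LOCAL_PROGRESS is meant to be evaluated.

```python
SEQ_MOD = 2 ** 32


def seq_gt(a, b):
    """a > b in 32-bit TCP sequence-number space."""
    diff = (a - b) % SEQ_MOD
    return 0 < diff < SEQ_MOD // 2


class ResponderMachine:
    """Responder-mode processing (Figure 3), illustrative only."""

    def __init__(self):
        self.remote_cci_active = False
        self.remote_cci_count = 7
        self.remote_cci_sndnxt = 0

    def on_segment(self, cci, ack, snd_nxt):
        if cci is not None and cci[0] != self.remote_cci_count:
            # E_REMOTE_CCI && C_NEW_REMOTE_CCI, handled the same in both states
            self.remote_cci_count = cci[0]    # A_UPDATE_REMOTE_COUNT
            self.remote_cci_sndnxt = snd_nxt  # A_UPDATE_SNDNXT
            self.remote_cci_active = True
        elif self.remote_cci_active:
            if cci is None:
                self.remote_cci_active = False  # E_NONE
            elif seq_gt(ack, self.remote_cci_sndnxt):
                # E_REMOTE_CCI && !C_NEW_REMOTE_CCI && C_LOCAL_PROGRESS
                self.remote_cci_active = False


r = ResponderMachine()
r.on_segment((6, 7), ack=500, snd_nxt=1000)   # new remote CCI
r.on_segment((6, 7), ack=900, snd_nxt=1400)   # no progress past 1000 yet
print(r.remote_cci_active)                    # True
r.on_segment((6, 7), ack=1200, snd_nxt=1400)  # post-CCI data ACK'ed
print(r.remote_cci_active)                    # False
```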
4.3. Re-Probing Path Characteristics
When a TCP connection receives a connectivity-change indication and
is not currently stalled in exponential back-off, it MUST re-probe
the path characteristics to avoid causing congestion by
transmitting based on stale path state. In principle, this is
similar to the initial slow-start: The sender MUST NOT transmit more
than the default initial window (INIT_WINDOW) of data after a CCI is
received and MUST reset the congestion control state (CWND and
SS_THRESH), round-trip time measurement (RTTM) state, and RTO timer
as if this were a new connection [RFC2581][RFC2988]. If Path MTU
Discovery (PMTUD) is active, PMTUD state MUST also be reset
[RFC1191][RFC1981][I-D.ietf-pmtud-method].
One difference from an initial slow-start is that after a CCI, the
connection may have segments in flight towards the destination along
a previous path. Therefore, after a CCI, congestion control MUST
ignore any stale ACKs received and MUST update the congestion window
solely based on ACKs for data that was sent after the CCI was
received. A stale ACK is an ACK for data that was sent while both
LOCAL_CCI_ACTIVE and REMOTE_CCI_ACTIVE were false. In practice, a
decent heuristic to disambiguate stale and fresh ACKs is to consider
all ACKs received while either LOCAL_CCI_ACTIVE or REMOTE_CCI_ACTIVE
is true as stale, i.e., each ACK that is received while the host is
processing any CCI SHOULD be treated as a stale ACK. This heuristic
works provided there is little large-scale reordering, because the
packet that moves the local state machine back into an inactive
state will generally be received after all stale packets. In some
scenarios this assumption may not hold, but it seems reasonable for
the vast majority of scenarios, where the stale path is cleared of
packets in less time than one or two RTTs on the new path.
For each stale ACK received, a host MUST NOT adjust the congestion
window and MUST NOT send any new data into the network. This SHOULD
continue until both LOCAL_CCI_ACTIVE and REMOTE_CCI_ACTIVE are false
or there is a timeout. When that occurs, the sender should consider
any un-ACK'ed segments below the highest received ACK as lost and
discount them from the segments in flight. The sender MUST use slow-
start based loss recovery for these segments.
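The following Python sketch shows the re-probe reset and the
stale-ACK rule. It assumes an INIT_WINDOW of 2*SMSS (the upper bound
in [RFC2581]), an "arbitrarily high" initial ssthresh as permitted by
[RFC2581], the 3-second initial RTO of [RFC2988], and the heuristic
that every ACK arriving while either CCI flag is true is stale. PMTUD
reset and the post-CCI loss recovery are omitted, and all names are
illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathState:
    """Illustrative congestion-control and RTT state (bytes, seconds)."""
    smss: int
    cwnd: int
    ssthresh: float
    srtt: Optional[float]
    rttvar: Optional[float]
    rto: float

    def reprobe(self):
        """Reset path state as if this were a new connection."""
        self.cwnd = 2 * self.smss     # assumed INIT_WINDOW
        self.ssthresh = float("inf")  # "arbitrarily high" initial ssthresh
        self.srtt = None              # no RTT samples on the new path yet
        self.rttvar = None
        self.rto = 3.0                # initial RTO per RFC 2988

    def on_ack(self, acked_bytes, cci_processing):
        """Handle an ACK of new data; return True if new data may be sent."""
        if cci_processing:
            return False  # stale ACK: keep cwnd, send no new data
        if self.cwnd < self.ssthresh:
            self.cwnd += min(acked_bytes, self.smss)  # slow start
        else:
            self.cwnd += max(1, self.smss * self.smss // self.cwnd)  # cong. avoidance
        return True


p = PathState(smss=1460, cwnd=64 * 1460, ssthresh=40 * 1460,
              srtt=0.2, rttvar=0.05, rto=1.0)
p.reprobe()
print(p.cwnd, p.rto)  # 2920 3.0
stale = p.on_ack(1460, cci_processing=True)
print(stale, p.cwnd)  # False 2920
fresh = p.on_ack(1460, cci_processing=False)
print(fresh, p.cwnd)  # True 4380
```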
4.4. Speculative Retransmission
The basic idea behind the speculative retransmission is to allow TCP
to resume stalled connections as soon as it receives an indication
that connectivity to previously unreachable peers may have returned.
When a TCP connection receives a connectivity-change indication,
either from the local stack or in a CCI TCP option from the peer, and
is currently stalled in exponential back-off, it MUST immediately
initiate the standard retransmission procedure, just as if the RTO
for the connection had expired.
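A minimal Python sketch of speculative retransmission. It assumes the
timeout procedure of [RFC2988] (retransmit the earliest unacknowledged
segment, double the RTO, restart the timer) and a hypothetical
60-second RTO cap; a CCI received while stalled simply invokes that
same procedure early. The class and its fields are illustrative, and
RTO recomputation after new RTT samples is omitted.

```python
class RetransmissionTimer:
    """Illustrative RTO handling for one connection."""

    MAX_RTO = 60.0  # hypothetical cap (RFC 2988 allows a cap of >= 60 s)

    def __init__(self, rto):
        self.rto = rto
        self.stalled = False  # an RTO-retransmitted segment is un-ACK'ed
        self.deadline = None  # absolute expiry time of the timer
        self.log = []         # why each retransmission happened

    def on_timeout(self, now, reason="rto"):
        self.log.append(reason)                     # retransmit earliest un-ACK'ed segment
        self.rto = min(self.rto * 2, self.MAX_RTO)  # back off the timer
        self.deadline = now + self.rto              # restart the timer
        self.stalled = True

    def on_ack_of_retransmission(self):
        self.stalled = False  # no longer stalled in exponential back-off

    def on_cci(self, now):
        """Local or remote CCI: if stalled, act as if the RTO had expired."""
        if self.stalled:
            self.on_timeout(now, reason="cci")


t = RetransmissionTimer(rto=3.0)
t.on_cci(now=1.0)                # not stalled: no speculative retransmission
t.on_timeout(now=3.0)            # first RTO expiry, RTO becomes 6 s
t.on_cci(now=4.5)                # connectivity returns: retransmit now
print(t.log, t.rto, t.deadline)  # ['rto', 'cci'] 12.0 16.5
```

Following the text literally, the sketch also backs off the timer on
a speculative retransmission, exactly as a real expiry would.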
5. Security Considerations
The only foreseen security considerations with the techniques
presented in this document result from either an attacker's ability
to spoof valid TCP segments with options that seemingly indicate
connectivity changes, or an attacker's ability to generate bogus
connectivity change indications locally. An attacker might produce a
stream of such false indicators that could keep a connection in slow-
start at the initial window. One possible defense against this type
of attack is to rate-limit the response to connectivity indicators
(whether local or remote). This is also probably less serious than
other attacks such an empowered adversary could perform, like
resetting the connection or injecting data. A similar effect could
be achieved without the new option by forging duplicate ACKs that
would keep a sender in loss recovery. If both sets of IP addresses,
port numbers, and sequence numbers are guessable for a connection,
then the connection should use an approved means (such as IPsec)
[I-D.ietf-tcpm-tcp-antispoof] for protection against spoofed
segments.
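One way to rate-limit responses, sketched in Python, is to act on at
most one CCI per minimum interval. The interval value and the class
are hypothetical, since this document does not specify a limit; in
this sketch the check gates only whether the re-probe or speculative
retransmission response is triggered.

```python
class CciRateLimiter:
    """Allow at most one CCI-triggered response per min_interval seconds."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.last_response = None

    def allow(self, now):
        if (self.last_response is not None
                and now - self.last_response < self.min_interval):
            return False  # too soon: do not trigger a response
        self.last_response = now
        return True


limiter = CciRateLimiter(min_interval=1.0)  # hypothetical 1-second limit
print([limiter.allow(t) for t in (0.0, 0.3, 0.9, 1.5, 1.7, 3.0)])
# [True, False, False, True, False, True]
```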
6. IANA Considerations
This section is to be interpreted according to [RFC2434].
This document does not define any new namespaces. It uses an 8-bit
TCP option number maintained by IANA at
http://www.iana.org/assignments/tcp-parameters. IANA is requested to
assign a new TCP option number upon publication of this document.
7. Acknowledgments
This draft combines and obsoletes [I-D.swami-tcp-lmdr] and
[I-D.eggert-tcpm-tcp-retransmit-now]. The authors would like to
thank Mark Allman, Marcus Brunner, Shashikant Maheshwari, Kacheong
Poon, Juergen Quittek, Stefan Schmid and Joe Touch for their comments
and suggestions on the two previous drafts.
Simon Schuetz is partly funded by Ambient Networks, a research
project supported by the European Commission under its Sixth
Framework Program.
Wesley Eddy's work on this document was performed at NASA's Glenn
Research Center, while in support of the NASA Space Communications
Architecture Working Group (SCAWG), and the FAA/Eurocontrol Future
Communications Study (FCS).
8. References
8.1. Normative References
[I-D.ietf-pmtud-method]
Mathis, M. and J. Heffner, "Packetization Layer Path MTU
Discovery", draft-ietf-pmtud-method-11 (work in progress),
December 2006.
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, September 1981.
[RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
November 1990.
[RFC1981] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
for IP version 6", RFC 1981, August 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing an
IANA Considerations Section in RFCs", BCP 26, RFC 2434,
October 1998.
[RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
Control", RFC 2581, April 1999.
[RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission
Timer", RFC 2988, November 2000.
8.2. Informative References
[DUKE] Duke, M., Henderson, T., and J. Meegan, "Experience with
'Link-UP Notification' Over a Mobile Satellite Link",
ACM Computer Communication Review, Vol. 34, No. 3,
July 2004.
[EDDY] Eddy, W. and Y. Swami, "Adapting End-host Congestion
Control for Mobility", NASA Glenn Research Center
Technical Report, CR-2005-213838, July 2005.
[I-D.dawkins-trigtran-linkup]
Dawkins, S., "End-to-end, Implicit 'Link-Up'
Notification", draft-dawkins-trigtran-linkup-01 (work in
progress), October 2003.
[I-D.eggert-tcpm-tcp-retransmit-now]
Eggert, L., "TCP Extensions for Immediate
Retransmissions", draft-eggert-tcpm-tcp-retransmit-now-02
(work in progress), June 2005.
[I-D.iab-link-indications]
Aboba, B., "Architectural Implications of Link
Indications", draft-iab-link-indications-10 (work in
progress), March 2007.
[I-D.ietf-dna-link-information]
Yegin, A., "Link-layer Event Notifications for Detecting
Network Attachments", draft-ietf-dna-link-information-06
(work in progress), February 2007.
[I-D.ietf-hip-mm]
Nikander, P., "End-Host Mobility and Multihoming with the
Host Identity Protocol", draft-ietf-hip-mm-04 (work in
progress), June 2006.
[I-D.ietf-tcpimpl-restart]
Hughes, A., Touch, J., and J. Heidemann, "Issues in TCP
Slow-Start Restart After Idle",
draft-ietf-tcpimpl-restart-00 (work in progress),
March 1998.
[I-D.ietf-tcpm-tcp-antispoof]
Touch, J., "Defending TCP Against Spoofing Attacks",
draft-ietf-tcpm-tcp-antispoof-06 (work in progress),
February 2007.
[I-D.ietf-tcpm-tcp-uto]
Eggert, L. and F. Gont, "TCP User Timeout Option",
draft-ietf-tcpm-tcp-uto-04 (work in progress),
October 2006.
[I-D.ietf-tsvwg-quickstart]
Floyd, S., "Quick-Start for TCP and IP",
draft-ietf-tsvwg-quickstart-07 (work in progress),
October 2006.
[I-D.swami-tcp-lmdr]
Swami, Y., "Lightweight Mobility Detection and Response
(LMDR) Algorithm for TCP", draft-swami-tcp-lmdr-07 (work
in progress), March 2006.
[KOODLI] Koodli, R. and C. Perkins, "Fast Handovers and Context
Transfers in Mobile Networks", ACM Computer Communication
Review, Vol. 31, No. 5, October 2001.
[OTT] Ott, J. and D. Kutscher, "OTT Internet: IEEE 802.11b for
Automobile Users", Proc. Infocom 2004, March 2004.
[RFC1122] Braden, R., "Requirements for Internet Hosts -
Communication Layers", STD 3, RFC 1122, October 1989.
[RFC2131] Droms, R., "Dynamic Host Configuration Protocol",
RFC 2131, March 1997.
[RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6
(IPv6) Specification", RFC 2460, December 1998.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP",
RFC 3168, September 2001.
[RFC3344] Perkins, C., "IP Mobility Support for IPv4", RFC 3344,
August 2002.
[RFC3775] Johnson, D., Perkins, C., and J. Arkko, "Mobility Support
in IPv6", RFC 3775, June 2004.
[RFC3819] Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
Wood, "Advice for Internet Subnetwork Designers", BCP 89,
RFC 3819, July 2004.
[RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
RFC 4306, December 2005.
[SCHUETZ] Schuetz, S., Eggert, L., Schmid, S., and M. Brunner,
"Protocol Enhancements for Intermittently Connected
Hosts", ACM Computer Communication Review, Vol. 35, No. 3,
July 2005.
[SCOTT] Scott, J. and G. Mapp, "Link layer-based TCP optimisation
for disconnecting networks", ACM Computer Communication
Review, Vol. 33, No. 5, October 2003.
Editorial Comments
[] The authors have heard the idea of triggering
retransmissions based on connectivity events on
directly-connected links attributed to Phil Karn (the
"kick" operation in the KA9Q TCP stack). A thread from
the PILC mailing list in 2000 discusses this idea
(http://www.isi.edu/pilc/list/archive/0691.html).
[] Although this specification introduces five new per-
connection state variables, a preliminary
implementation of an earlier revision of this mechanism
[I-D.swami-tcp-lmdr] only required around a hundred
lines of kernel code.
Appendix A. Background: Classification of Connectivity Disruptions
Connectivity disruptions can occur in many different situations.
They can be due to wireless interference, movement out of a wireless
coverage area, switching between access networks, or simply due to
unplugging an Ethernet cable. Depending on the situation in which
they occur, the implications of connectivity disruptions are
different and must be handled appropriately. This section attempts
to classify different types of connectivity disruptions and discusses
their implications and impact on TCP.
Two main properties of connectivity disruptions affect how TCP reacts
to them: their duration and whether the path characteristics have
significantly changed after they end. This document distinguishes
between "short" and "long" disruptions and "changed" and "unchanged"
path characteristics. These two classifications are orthogonal, so
four types of connectivity disruptions exist.
Connectivity disruptions are "short" for a given TCP connection if
connectivity returns before the RTO fires for the first time, i.e.,
while TCP is still in steady state. In this case, standard TCP
recovers lost data segments through Fast Retransmit and lost ACKs
through successfully delivered later ACKs. Appendix A.1 briefly
describes this case.
Connectivity disruptions are "long" for a given TCP connection if
the RTO fires at least once before connectivity returns, i.e., when
TCP is in exponential back-off. In this case, TCP's retransmission
scheme can be inefficient, as described in Appendix A.2.
Whether or not path characteristics change when connectivity returns
is a second important factor for TCP's retransmission scheme.
Standard TCP implicitly assumes that path characteristics remain
unchanged across short disruptions by performing Fast Retransmit
using the path parameters collected before the disruption. For long
disruptions, standard TCP is more conservative and performs
slow-start, re-probing the path characteristics from scratch.
However, this slow-start can be inefficient because of when it is
initiated. These implicit assumptions can cause standard TCP to
misbehave or perform inefficiently in some scenarios. Figure 4
illustrates the standard TCP behavior.
+-----------------------+-----------------------+
Short | Fast Retransmit using | Fast Retransmit using |
Duration | currently collected | currently collected |
< RTO | path characteristics | path characteristics |
+-----------------------+-----------------------+
Long | | |
Duration | Slow-start | Slow-start |
>= RTO | | |
+-----------------------+-----------------------+
Unchanged Path Changed Path
Characteristics Characteristics
Figure 4: Standard TCP behavior.
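As a rough illustration of Figure 4, the following Python sketch maps
a disruption to the behavior that standard TCP exhibits. It is a
simplification and not part of this specification: it compares the
outage duration directly against the RTO in effect when the
disruption began (ignoring when the retransmission timer was last
restarted), and the function and type names are hypothetical. The
path_changed argument is accepted but deliberately unused, because
standard TCP cannot observe it; this is why both columns of Figure 4
are identical.

```python
from enum import Enum


class Duration(Enum):
    SHORT = "short"  # connectivity returns before the first RTO fires
    LONG = "long"    # the RTO fires at least once before connectivity returns


def classify_duration(outage_s: float, rto_s: float) -> Duration:
    """Classify a disruption relative to the RTO.

    Simplification: the retransmission timer is assumed to start at the
    moment the disruption begins.
    """
    return Duration.SHORT if outage_s < rto_s else Duration.LONG


def standard_tcp_response(outage_s: float, rto_s: float,
                          path_changed: bool) -> str:
    """Return the standard TCP behavior shown in Figure 4.

    path_changed is intentionally unused: standard TCP has no way to learn
    whether the path changed, so it reacts identically in both columns.
    """
    if classify_duration(outage_s, rto_s) is Duration.SHORT:
        return "fast retransmit"
    return "slow-start"
```

For example, with a 1-second RTO, a 0.5-second outage yields "fast
retransmit" and a 5-second outage yields "slow-start", regardless of
path_changed.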
A.1. Short Connectivity Disruptions
One common cause of short connectivity disruptions that result in a
change of the end-to-end path characteristics is transparent network
layer mobility, via protocols such as Mobile IP, NEMO, or HIP. These
protocols generally hide mobility events from the transport layer,
but cannot mask the resulting changes to the end-to-end path that
established TCP connections transmit over.
Consider a Mobile IP scenario as shown in Figure 5. At time T, a
mobile node MN attaches to access network Net-1, which is connected
to the Internet through access router AR-1, and has the care-of
address <Net-1, MN>. It establishes a TCP connection to the
correspondent node CN. While MN is attached to AR-1, packets between
CN and <Net-1, MN> follow PATH-1 (via Cloud-1 and AR-1). Assume that
at some time T+1, MN moves and attaches to Net-2, which is reachable
through AR-2, with the care-of address <Net-2, MN>. While MN is
attached to AR-2, all packets between CN and <Net-2, MN> follow
PATH-2 (via Cloud-2 and AR-2).
<---------PATH-1---------->
/---------\ +------+
| | | | Net-1
+---+ Cloud-1 +---+ AR-1 +-----> MN (time=T)
| | | | |
| \----+----/ +---+--+ |
| | |
CN <------+ | PATH-3 |
| | |
| /----V----\ +-------+ V
| | | | |
+---+ Cloud-2 +---+ AR-2 +-----> MN (time=T+1)
| | | | Net-2
\---------/ +-------+
<--------PATH-2----------->
Figure 5: Mobility example.
During a transient disconnected period, MN may have disconnected from
Net-1 and not yet attached to Net-2. Consequently, AR-1 may not be
able to deliver packets to MN. This could result in a burst of
packet losses. Several approaches for "fast" or "seamless" handovers
exist that involve adding machinery to the ARs to buffer and redirect
packets originally sent to Net-1 towards Net-2, rather than dropping
them (e.g., [KOODLI]).
As long as MN remains in Net-1, standard congestion control
algorithms [RFC2581] are sufficient. However, once MN moves from
Net-1 to Net-2, two different scenarios are possible depending on
network topology:
o In the first scenario, with standard Mobile IPv4, all packets
destined to <Net-1, MN> are dropped by AR-1 once MN has moved.
Since the latency involved in establishing a new tunnel to the HA
is on the order of the RTT (2*RTT in the case of Mobile IPv6), roughly
an entire window's worth of data and ACKs will be dropped by AR-1.
Because of this burst loss, CN and MN are likely to incur
expensive retransmission timeouts.
o In the second scenario, with a fast handover mechanism in place,
losses are masked through buffering and tunneling between routers
AR-1 and AR-2. The exact sequence of buffering and forwarding
between the ARs is not guaranteed to occur in a manner consistent
with the available bandwidth of PATH-3 or conformant to TCP's
clocking expectations. This can cause TCP's behavior over PATH-2
to be based on the unrelated properties of PATH-1 and PATH-3.
After attaching to Net-2, reception of stale ACKs (for data sent on
PATH-1) will cause MN to incorrectly inflate its congestion window.
These stale ACKs do not provide any indication of the congestion
along PATH-2. CN's congestion window becomes similarly inflated by
ACKs that MN sends for data segments redirected over PATH-3. If the
congestion windows from PATH-1 are already too big for PATH-2, this
can overload Net-2 or PATH-2, causing packet loss and timeouts.
On the other hand, if the available bandwidth along PATH-2 is greater
than along PATH-1, and if the sender is in congestion avoidance, it
may need many RTTs before it fully utilizes the available path
capacity. This is because a stale SS_THRESH keeps the sender in
congestion avoidance, where the congestion window grows relatively
slowly. (See [EDDY] for details.)
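To give a feel for "many RTTs", the following Python sketch counts
the RTTs needed for the congestion window to grow to a target size
under an idealized model of [RFC2581]: the window doubles each RTT
while below SS_THRESH (capped at SS_THRESH, a simplification) and
grows by one segment per RTT in congestion avoidance, with no losses,
delayed ACKs, or byte counting. The function name and the window
sizes in the example are hypothetical and are not taken from [EDDY].

```python
def rtts_to_reach(cwnd: float, target: float, ssthresh: float) -> int:
    """Count RTTs until cwnd (in segments) reaches target.

    Idealized RFC 2581 growth: slow-start doubles cwnd per RTT, capped at
    ssthresh; congestion avoidance adds one segment per RTT.
    """
    if cwnd <= 0:
        raise ValueError("cwnd must be positive")
    rtts = 0
    while cwnd < target:
        if cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)  # slow-start phase
        else:
            cwnd += 1                       # congestion avoidance phase
        rtts += 1
    return rtts
```

For example, a sender that starts on PATH-2 with a congestion window
and stale SS_THRESH of 20 segments needs rtts_to_reach(20, 200, 20),
i.e., 180 RTTs, to reach a 200-segment window, whereas slow-start
without the stale threshold (ssthresh=float("inf")) would need 4.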
A.2. Long Connectivity Disruptions
For long disruptions, standard TCP performs slow-start after
connectivity returns, because the retransmission timeout (RTO) has
expired. This conservative strategy avoids overloading the new path.
However, TCP's general exponential back-off retransmission strategy
can schedule these slow-starts at times that reduce performance.
When a long connectivity disruption occurs along the path between a
host and its peer while the host is transmitting data, the host stops
receiving ACKs. After the RTO expires, the host attempts to
retransmit the first unacknowledged segment. TCP implementations
that follow the recommended RTO management in [RFC2988] double the
RTO after each retransmission attempt until it reaches 60 seconds.
Consequently, a host attempts to retransmit across established
connections roughly once a minute. Attempts are more frequent during
the first minute or two of the connectivity disruption, while the RTO
is still being backed off.
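The back-off schedule described above can be computed with the
following Python sketch. It assumes, for illustration only, that the
first unacknowledged segment was sent at time 0, that every
retransmission is lost for the whole disruption, and that the RTO is
clamped at max_rto (60 seconds by default). The function name,
parameters, and default horizon are hypothetical.

```python
def retransmission_times(initial_rto: float, max_rto: float = 60.0,
                         horizon: float = 300.0) -> list[float]:
    """Return the times (seconds after the original transmission) at which
    RTO-driven retransmissions fire during a disruption lasting at least
    `horizon` seconds.

    Each timeout doubles the RTO (exponential back-off), clamped at max_rto.
    """
    times: list[float] = []
    t = 0.0
    rto = min(initial_rto, max_rto)
    while t + rto <= horizon:
        t += rto                     # the timer fires; retransmit
        times.append(t)
        rto = min(2 * rto, max_rto)  # back off for the next attempt
    return times
```

With a 1-second initial RTO, this yields attempts at 1, 3, 7, 15, 31,
and 63 seconds, after which attempts are spaced 60 seconds apart.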
When the long connectivity disruption ends, standard TCP
implementations still wait until the RTO expires before attempting
retransmission. Figure 6 illustrates this behavior. Depending on
when connectivity becomes available again, this can waste up to a
minute of connectivity for TCPs that implement the recommended RTO
management described in [RFC2988]. For TCP implementations that do
not implement [RFC2988], even longer connectivity periods may be
wasted. For example, Linux uses 120 seconds as the maximum RTO by
default.
Sequence
number X = Successfully transmitted segment
^ O = Lost segment
| : : : X
| : : :X
| OO O O O O : X
| X: : :
| X : :<------------>:
| X : : Wasted :
| X : : connection :
|X : : time :
+-----:---------------------:--------------:-------->
: : : Time
Connectivity Connectivity TCP
gone back retransmit
Figure 6: Standard TCP behavior in the presence of disrupted
connectivity.
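The wasted connection time in Figure 6 is the gap between the moment
connectivity returns and the next scheduled retransmission. The
following Python sketch computes it from a sorted list of scheduled
retransmission times, such as the back-off schedule described above;
the function name and the example times are illustrative assumptions.

```python
import bisect


def wasted_connectivity(retx_times: list[float], reconnect_at: float) -> float:
    """Seconds between the return of connectivity and the next scheduled
    retransmission, during which a standard TCP sender stays idle.

    retx_times must be sorted in ascending order.
    """
    # Index of the first retransmission at or after reconnection.
    i = bisect.bisect_left(retx_times, reconnect_at)
    if i == len(retx_times):
        raise ValueError("no retransmission scheduled after reconnection")
    return retx_times[i] - reconnect_at
```

For example, with attempts at 63 and 123 seconds (a 1-second initial
RTO clamped at 60 seconds), connectivity that returns at 64 seconds
goes unused for 59 seconds. With a 120-second maximum RTO, the same
gap can approach two minutes.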
This retransmission behavior is not efficient, especially in
scenarios where connectivity periods are short and connectivity
disruptions are frequent [OTT]. Experiments show that TCP
performance across a path with frequent disruptions is significantly
worse than across a similar path without disruptions [SCHUETZ].
In the ideal case, TCP would attempt a retransmission as soon as
connectivity to its peer was re-established. Figure 7 illustrates
the ideal behavior.
Sequence
number X = Successfully transmitted segment
^ O = Lost segment
| : : X :
| : :X :
| OO O O O O X :
| X: : :
| X : :<------------>:
| X : : Efficiency :
| X : : improvement :
|X : : :
+-----:---------------------:--------------:-------->
: : : Time
Connectivity Connectivity Next
gone back = immediate scheduled
TCP retransmit retransmit
Figure 7: Ideal TCP behavior in the presence of disrupted
connectivity.
The ideal behavior is difficult to achieve for arbitrary connectivity
disruptions. One obviously problematic approach would use higher-
frequency retransmission attempts to enable earlier detection of
whether connectivity has returned. This can generate significant
amounts of extra traffic. Other proposals attempt to trigger faster
retransmissions by retransmitting buffered or newly-crafted segments
from inside the network
[SCOTT][I-D.dawkins-trigtran-linkup][DUKE][RFC3819].
Note that scenarios exist where path characteristics remain unchanged
after long connectivity disruptions. In this case, even an
intelligently scheduled slow-start is inefficient, because TCP could
safely resume transmitting at the old rate instead of slow-starting.
Although originally developed to avoid line-rate bursts, techniques
for the well-known "slow-start after idle" case
[I-D.ietf-tcpimpl-restart] may be useful to further improve
performance after a disruption ends in such a scenario. This
document does not currently describe this additional optimization,
and it remains an open question how an end host could validate that
path characteristics are unchanged after a long connectivity
disruption.
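The cost of slow-starting when the path is in fact unchanged can be
estimated with a short Python sketch. It assumes idealized slow-start
(the window doubles every RTT from an initial window, with no losses)
and counts the segments the sender forgoes compared with resuming
immediately at its old window. It does not address the validation
question noted above, and the function name is hypothetical.

```python
def slow_start_cost(old_cwnd: int, initial_window: int = 1) -> tuple[int, int]:
    """Return (rtts, foregone_segments) for slow-starting back to old_cwnd
    instead of resuming transmission at old_cwnd right away."""
    if initial_window <= 0:
        raise ValueError("initial_window must be positive")
    rtts = 0
    sent = 0
    cwnd = initial_window
    while cwnd < old_cwnd:
        sent += cwnd  # segments sent during this slow-start RTT
        cwnd *= 2     # idealized doubling per RTT
        rtts += 1
    # Shortfall compared with sending old_cwnd segments in each of those RTTs.
    foregone = rtts * old_cwnd - sent
    return rtts, foregone
```

For an old window of 64 segments and a one-segment initial window,
slow-start takes 6 RTTs and forgoes 321 segments; with a four-segment
initial window, it takes 4 RTTs and forgoes 196 segments.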
Appendix B. Document Revision History
+----------+--------------------------------------------------------+
| Revision | Comments |
+----------+--------------------------------------------------------+
| 00 | Initial version. This document is a merge of and |
| | obsoletes [I-D.eggert-tcpm-tcp-retransmit-now] and |
| | [I-D.swami-tcp-lmdr]. |
| 01 | Major revision of the description of the |
| | connectivity-change indication TCP option and its |
| | processing in Section 4. Other formatting changes to |
| | the document include moving some background material |
| | to the appendix. |
+----------+--------------------------------------------------------+
Authors' Addresses
Simon Schuetz
NEC Network Laboratories
Kurfuerstenanlage 36
Heidelberg 69115
Germany
Phone: +49 6221 4342 165
Fax: +49 6221 4342 155
Email: simon.schuetz@netlab.nec.de
URI: http://www.netlab.nec.de/
Lars Eggert
Nokia Research Center
P.O. Box 407
Nokia Group 00045
Finland
Phone: +358 50 48 24461
Email: lars.eggert@nokia.com
URI: http://research.nokia.com/people/lars_eggert/
Wesley M. Eddy
Verizon Federal Network Systems
NASA Glenn Research Center
21000 Brookpark Road, MS 54-5
Cleveland, OH 44135
USA
Email: weddy@grc.nasa.gov
Yogesh Prem Swami
Nokia Research Center, Dallas
955 Page Mill Road
Palo Alto, California 94304
USA
Phone: +1 972 374 0669
Email: yogesh.swami@nokia.com
Khiem Le
Nokia Research Center, Dallas
6000 Connection Drive
Irving, TX 75603
USA
Phone: +1 972 342 3502
Email: khiem.le@nokia.com
Full Copyright Statement
Copyright (C) The IETF Trust (2007).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Acknowledgment
Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA).