Network Working Group S. Schuetz
Internet-Draft L. Eggert
Expires: December 2, 2006 NEC
W. Eddy
Verizon
Y. Swami
K. Le
Nokia
May 31, 2006
TCP Response to Lower-Layer Connectivity-Change Indications
draft-schuetz-tcpm-tcp-rlci-00
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
This document may not be modified, and derivative works of it may not
be created, except to publish it as an RFC and to translate it into
languages other than English.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 2, 2006.
Copyright Notice
Copyright (C) The Internet Society (2006).
Abstract
Schuetz, et al. Expires December 2, 2006 [Page 1]
Internet-Draft TCP Response to Connectivity Indications May 2006
When connectivity characteristics between two hosts change abruptly,
TCP can experience significant delays before resuming transmission in
an efficient manner or TCP can behave unfairly to competing traffic.
This document describes TCP extensions that improve transmission
behavior in response to advisory, lower-layer connectivity-change
indications. The proposed TCP extensions modify the local behavior
of TCP and introduce a new TCP option to signal local connectivity-
change indications to remote peers. Performance gains result from a
more efficient transmission behavior and are not due to an increased
aggressiveness.
Table of Contents
1. Introduction
2. Motivation and Overview
3. Background: Classification of Connectivity Disruptions
3.1. Short Connectivity Disruptions
3.2. Long Connectivity Disruptions
4. Connectivity-Change Indications
5. TCP Response to Connectivity-Change Indications
5.1. Connectivity-Change Indication TCP Option
5.2. Re-Probing Path Characteristics
5.3. Speculative Retransmission
6. Security Considerations
7. Conclusion
8. IANA Considerations
9. Acknowledgments
10. References
10.1. Normative References
10.2. Informative References
Editorial Comments
Appendix A. Document Revision History
Authors' Addresses
Intellectual Property and Copyright Statements
1. Introduction
Several current components of the Transmission Control Protocol (TCP)
[RFC0793] assume that end-to-end paths between hosts are relatively
stable over the lifetime of a connection. Although the TCP
congestion control algorithms [RFC2581] adapt to changes in path
connectivity characteristics between two hosts over time, they cannot
adapt well if significant changes occur on time-scales of a few
round-trip times or less. This is due to the granularity of TCP's
sampling mechanisms. Significant changes to path connectivity
include loss or reestablishment of connectivity, and drastic, abrupt
changes to the round-trip time (RTT) or available bandwidth.
Connectivity changes that occur on short time-scales are becoming
more common, due to host mobility or intermittent network attachment.
This document describes a set of complementary TCP extensions that
improve behavior when path characteristics change on short time-
scales. TCP implementations that support the proposed extensions
respond to receiving generic, technology-independent, per-connection
"path characteristics have changed" (or short: "connectivity-change")
indications from lower layers. A connectivity-change indication
signals that the connectivity characteristics of the end-to-end path
between the local node and its peer have changed in an undefined way.
The response mechanisms proposed for TCP act on this information in a
conservative fashion. The specific response depends on the state of
a connection.
It is important to note that TCP and other transport protocols
already react to information and signals from lower layers; the
proposed connectivity-change indications thus extend an established
interface between layers in the protocol stack. TCP measures the
end-to-end path to implicitly derive network-layer information. TCP
also directly reacts to network-layer signals delivered via ICMP, for
example, "Port Unreachable" or the now-deprecated "Source Quench"
[RFC1122]. Explicit Congestion Notification (ECN) [RFC3168] and
Quick-Start [I-D.ietf-tsvwg-quickstart] are other sources of network-
layer information for which response mechanisms for TCP have been
proposed. Connectivity-change indications are yet another source of
lower-layer information that TCP can use to improve its operation.
A second important point to note is that the proposed TCP response
mechanisms to connectivity-change indications are purely optional
efficiency improvements. In the absence of connectivity-change
indications, a TCP that implements the proposed changes behaves
identically to an unmodified TCP. When lower layers provide
connectivity-change indications that trigger the proposed
enhancements, they enhance TCP operation based on the explicit lower-
layer information that is signaled. The proposed response mechanisms
do not increase the aggressiveness of TCP.
Note that the IAB has recently described architectural issues of
"link indications" [I-D.iab-link-indications]. The authors feel that
this term is not quite accurate in this environment, because
transport mechanisms should remain link-technology-agnostic.
However, transport protocols have always acted on network-layer
information and signals, such as measured path characteristics or
ICMP-signaled conditions. Because of the growing proliferation of
shim layers between the traditional network and transport layers,
this document uses the term "lower-layer indication" to remain
independent of specific network or shim layers.
It is currently an open question whether additional
lower-layer indications can provide further information to transport
protocols. Also, this document focuses on response mechanisms for
TCP only, although other transport protocols may benefit from similar
response mechanisms that react to these indications.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
2. Motivation and Overview
Several proposed network layer extensions support host mobility,
including Mobile IPv4 [RFC3344], Mobile IPv6 [RFC3775] and HIP
[I-D.ietf-hip-mm]. Typically, they shield transport-layer protocols
from mobility events and enable them to sustain established
connections across mobility events. However, the path
characteristics that established connections experience after a
mobility event may have changed drastically and on short time-scales.
Congestion control, RTT and path-MTU state gathered over an old path
before the move generally have no meaning when transmitting along a
new path.
TCP already forces a slow-start restart in some cases where the
network state becomes unknown, such as after an idle period or heavy
losses. One mechanism proposed in this document introduces a similar
slow-start restart in response to connectivity-change indications
that are received while a connection is in steady-state. Note that
this behavior is more conservative than the standard TCP response;
any performance gains from the proposed mechanisms come from
avoiding overload of the new path.
A second proposed extension improves TCP operation in the presence of
temporary connectivity disruptions. These disruptions can occur
independently of mobility events and may, for example, be due to
insufficient wireless access coverage or nomadic computer use.
Connectivity disruptions can severely decrease TCP performance. The
main reason for this decrease is TCP's retransmission behavior after
a connectivity disruption [SCHUETZ-CCR], i.e., periodic
retransmission attempts in exponentially increasing intervals, which
can unnecessarily delay retransmissions after connectivity returns.
In the extreme case, TCP connections can even abort, if the
disruption is longer than the TCP "user timeout." (Connection aborts
are out of scope for this document but can be prevented by the TCP
User Timeout Option [I-D.ietf-tcpm-tcp-uto].)
This second response mechanism is also triggered by a
connectivity-change indication, but applies when a connection is
stalled in exponential back-off. It improves TCP retransmission
behavior after connectivity is restored through an immediate
speculative retransmission attempt [anchor3]. Similar to the first
extension, this modification increases TCP performance through a more
intelligent transmission behavior that uses periods of connectivity
more efficiently. It does not cause significant amounts of
additional traffic and does not change TCP's congestion control
algorithms.
Finally, this draft proposes a third mechanism, which is a new TCP
option that signals connectivity-change indications received or
detected by a host to its remote peers in open TCP connections. This
is useful, because connectivity indications typically require
appropriate responses at both peers, but may only be received or
detected by one peer. Response to a connectivity-change indication
is independent of its source (locally notified or remotely signaled)
and depends only on the specific indication and the state of the
connection for which it was received.
3. Background: Classification of Connectivity Disruptions
Connectivity disruptions occur in many different situations. They
can be due to wireless interference, movement out of a wireless
coverage area, switching between access networks, or simply due to
unplugging an Ethernet cable. Depending on the situation in which
they occur, the implications of connectivity disruptions are
different and must be handled appropriately. This section attempts
to classify different types of connectivity disruptions and discusses
their implications and impact on TCP.
Two main properties of connectivity disruptions affect how TCP reacts
to them: their duration and whether the path characteristics have
significantly changed after they end. This document distinguishes
between "short" and "long" disruptions and "changed" and "unchanged"
path characteristics. Note that these two categories are orthogonal
to each other, i.e., all four combinations exist.
Connectivity disruptions are "short" for a given TCP connection, if
connectivity returns before the RTO fires for the first time. In
this case, standard TCP recovers lost data segments through Fast
Retransmit and lost ACKs through successfully delivered later ACKs.
Section 3.1 briefly describes this case.
Connectivity disruptions are "long" for a given TCP connection, if
the RTO fires at least once before connectivity returns. In this
case, TCP can be inefficient in its retransmission scheme, as
described in Section 3.2.
Whether or not path characteristics change when connectivity returns
is a second important factor for TCP's retransmission scheme.
Standard TCP implicitly assumes that path characteristics remain
unchanged for short disruptions by performing Fast Retransmit based
on path parameters collected before the disruption. For long
disruptions, standard TCP is more conservative and performs slow-
start, re-probing the path characteristics from scratch. However,
the standard behavior can be inefficient.
These implicit assumptions can cause standard TCP to misbehave or
perform inefficiently in some scenarios. Figure 1 illustrates the
standard TCP behavior.
         +----------------------+----------------------+
Short    | Fast Retransmit      | Fast Retransmit      |
Duration | using collected path | using collected path |
< RTO    | characteristics      | characteristics      |
         +----------------------+----------------------+
Long     |                      |                      |
Duration | Slow-start           | Slow-start           |
>= RTO   |                      |                      |
         +----------------------+----------------------+
           Unchanged Path         Changed Path
           Characteristics        Characteristics
Figure 1: Standard TCP behavior.
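The classification in Figure 1 can be sketched as a small lookup. This is an illustrative sketch only: the function names and the returned strings are hypothetical and are not defined by this document.

```python
def classify_disruption(duration_s: float, rto_s: float) -> str:
    """Return "short" if connectivity returns before the RTO first fires,
    and "long" if the RTO fires at least once (duration >= RTO)."""
    return "short" if duration_s < rto_s else "long"


def standard_tcp_response(duration_s: float, rto_s: float,
                          path_changed: bool) -> str:
    """Standard TCP behavior per Figure 1.

    path_changed is accepted but deliberately unused: standard TCP has no
    knowledge of whether the path changed, so its response depends on the
    duration of the disruption alone.
    """
    if classify_disruption(duration_s, rto_s) == "short":
        return "fast retransmit using collected path characteristics"
    return "slow-start"
```

The unused path_changed argument makes the limitation visible: both columns of Figure 1 produce the same response.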
3.1. Short Connectivity Disruptions
One common cause of short connectivity disruptions that result in a
change of the end-to-end path characteristics is transparent network
layer mobility, via protocols such as Mobile IP, NEMO, or HIP.
Although changes in the point of network attachment happen
unbeknownst to the transport layer, these events may change many
aspects of the path which established TCP connections base their
behavior upon.
Consider a Mobile IP scenario as shown in Figure 2. At time T, a
mobile node MN is attached to access network Net-1, connected to the
Internet through access router AR-1 and has the care-of address
<Net-1, MN>. A TCP connection is established between MN and a
correspondent node CN. While MN is attached to AR-1, packets between
CN and <Net-1, MN> are routed using PATH-1 (via Cloud-1 and AR-1).
Assume that at some time T+1, MN moves and then attaches to Net-2,
which is reachable through AR-2 with the care-of address <Net-2, MN>.
While MN is attached to AR-2, all packets between CN and <Net-2, MN>
are routed using PATH-2 (through Cloud-2 and AR-2).
<---------PATH-1---------->
/---------\ +------+
| | | | Net-1
+---+ Cloud-1 +---+ AR-1 +-----> MN (time=T)
| | | | |
| \----+----/ +---+--+ |
| | |
CN <------+ | PATH-3 |
| | |
| /----V----\ +-------+ V
| | | | |
+---+ Cloud-2 +---+ AR-2 +-----> MN (time=T+1)
| | | | Net-2
\---------/ +-------+
<--------PATH-2----------->
Figure 2: Mobility example.
During a transitional disconnected period, MN may be disconnected
from Net-1 and not yet attached to Net-2. Consequently, AR-1 may not
be able to deliver packets to MN. This could result in a burst of
packet losses. There are several suggested means of supporting
"fast" or "seamless" handovers, which involve adding machinery to the
ARs to buffer and redirect packets originally sent to Net-1 towards
Net-2, rather than dropping them (e.g., [KOODLI]).
As long as MN remains in Net-1, standard congestion control
algorithms [RFC2581] are sufficient. But once it moves from Net-1 to
Net-2, two different scenarios are possible depending on network
topology:
o In the first scenario, with standard Mobile IPv4, all packets
destined to <Net-1, MN> are dropped by AR-1 once the mobile node
has moved. Since the latency involved in establishing a new
tunnel to the home agent (HA) is on the order of the RTT (2*RTT in
case of Mobile IPv6), roughly an entire window's worth of data and ACKs
will be dropped by AR-1. Because of this burst loss, the CN and
MN are likely to incur expensive retransmission timeouts.
o In the second scenario, with a fast handover mechanism in place,
losses are suppressed through buffering and tunneling between
routers AR-1 and AR-2. The exact means of buffering and
forwarding between the ARs is not guaranteed to occur in a manner
consistent with the available bandwidth of PATH-3, nor to conform to
TCP's clocking expectations. This can cause TCP's behavior over
PATH-2 to be based on the unrelated properties of PATH-1 and
PATH-3.
After attaching to Net-2, reception of stale ACKs (for data sent on
PATH-1) will cause MN to incorrectly inflate its congestion window.
These stale ACKs do not provide any indication of the congestion
along PATH-2 and should consequently be ignored. CN's congestion
window becomes similarly inflated by ACKs that MN sends for data
segments redirected over PATH-3. If the congestion windows from
PATH-1 are already too big for PATH-2, this can overload Net-2 or
PATH-2, causing packet loss and timeouts.
On the other hand, if the available bandwidth along PATH-2 is greater
than along PATH-1, and if the sender is in congestion avoidance, it
will need potentially many RTTs before reaching a reasonable
throughput. This is due to relatively slow bandwidth increase during
congestion avoidance caused by a stale SS_THRESH. (See [ES05] for
details.)
3.2. Long Connectivity Disruptions
For long disruptions, standard TCP performs slow-start after
connectivity returns, because the retransmission timeout (RTO) has
expired. This is a conservative strategy that avoids overloading the
new path. However, TCP's general exponential back-off retransmission
strategy can time these slow-starts such that performance decreases.
When a long connectivity disruption occurs along the path between a
host and its peer while the host is transmitting data, it stops
receiving ACKs. After the RTO expires, the host attempts to
retransmit the first unacknowledged segment. TCP implementations
that follow the recommended RTO management proposed in [RFC2988]
double the RTO after each retransmission attempt until it exceeds 60
seconds. This scheme causes a host to attempt to retransmit across
established connections roughly once a minute. (More frequently
during the first minute or two of the connectivity disruption, while
the RTO is still being backed off.)
When the long connectivity disruption ends, standard TCP
implementations still wait until the RTO expires before attempting
retransmission. Figure 3 illustrates this behavior. Depending on
when connectivity becomes available again, this can waste up to a
minute of connection time for TCPs that implement the recommended RTO
management described in [RFC2988]. For TCP implementations that do
not implement [RFC2988], even longer connection times may be lost.
For example, Linux uses 120 seconds as the maximum RTO by default.
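The back-off schedule and the resulting wasted connection time can be illustrated with a short calculation. This is a sketch under stated assumptions: time zero is the moment the first lost segment was sent, the RTO at that moment is initial_rto_s, and the RTO is capped at max_rto_s. The function names and parameters are hypothetical.

```python
def retransmit_times(initial_rto_s: float, max_rto_s: float,
                     horizon_s: float) -> list:
    """Times (seconds after the first lost segment was sent) at which the
    RTO fires, doubling the RTO after each attempt up to max_rto_s."""
    times, t, rto = [], 0.0, initial_rto_s
    while t + rto <= horizon_s:
        t += rto
        times.append(t)
        rto = min(rto * 2, max_rto_s)
    return times


def wasted_time(initial_rto_s: float, max_rto_s: float,
                connectivity_back_s: float) -> float:
    """Delay between connectivity returning and the next scheduled
    retransmission attempt."""
    t, rto = 0.0, initial_rto_s
    # Skip every retransmission that fires before connectivity returns.
    while t + rto < connectivity_back_s:
        t += rto
        rto = min(rto * 2, max_rto_s)
    return (t + rto) - connectivity_back_s
```

With a 1-second initial RTO and a 60-second cap, retransmit_times(1, 60, 130) yields attempts at 1, 3, 7, 15, 31, 63 and 123 seconds. If connectivity returns at 64 seconds, just after the attempt at 63 seconds, wasted_time(1, 60, 64) is 59 seconds.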
Sequence
number          X = Successfully transmitted segment
^               O = Lost segment
|     :                     :              : X
|     :                     :              :X
|     OO O  O    O       O  :              X
|    X:                     :              :
|   X :                     :<------------>:
|  X  :                     :    Wasted    :
| X   :                     :  connection  :
|X    :                     :     time     :
+-----:---------------------:--------------:-------->
      :                     :              :     Time
Connectivity          Connectivity        TCP
    gone                  back        retransmit
Figure 3: Standard TCP behavior in the presence of disrupted
connectivity.
This retransmission behavior is not efficient, especially in
scenarios where connected periods are short and connectivity
disruptions are frequent [DRIVE-THRU]. Experiments show that TCP
performance across a path with frequent disruptions is significantly
worse, compared to a similar path without disruptions [SCHUETZ-CCR].
In the ideal case, TCP would attempt a retransmission as soon as
connectivity to its peer was re-established. Figure 4 illustrates
the ideal behavior.
Sequence
number          X = Successfully transmitted segment
^               O = Lost segment
|     :                     : X            :
|     :                     :X             :
|     OO O  O    O       O  X              :
|    X:                     :              :
|   X :                     :<------------>:
|  X  :                     :  Efficiency  :
| X   :                     : improvement  :
|X    :                     :              :
+-----:---------------------:--------------:-------->
      :                     :              :     Time
Connectivity          Connectivity        Next
    gone            back = immediate   scheduled
                     TCP retransmit   retransmit
Figure 4: Ideal TCP behavior in the presence of disrupted
connectivity
The ideal behavior is difficult to achieve for arbitrary connectivity
disruptions. One obviously problematic approach would use higher-
frequency retransmission attempts to enable earlier detection of
whether connectivity has returned. This can generate significant
amounts of extra traffic. Other proposals attempt to trigger faster
retransmissions by retransmitting buffered or newly-crafted segments
from inside the network [SCOTT][I-D.dawkins-trigtran-
linkup][DUKEHEND][RFC3819].
Note that scenarios exist where path characteristics remain unchanged
after long connectivity disruptions. In this case, even an
intelligently scheduled slow-start is inefficient, because TCP could
safely resume transmitting at the old rate instead of slow-starting.
Although originally developed to avoid line-rate bursts, techniques
for the well-known "slow-start after idle" case [I-D.ietf-tcpimpl-
restart] may be useful to further improve performance after a
disruption ends. This document does not currently describe this
additional optimization.
4. Connectivity-Change Indications
The focus of this document is on specifying TCP response mechanisms
to lower-layer "path characteristics have changed" indications. This
section briefly describes how different network- and shim-layer
mechanisms underneath the transport layer can provide these
"connectivity-change" indications to TCP. This description is
included for clarification only; the details of providing
connectivity indications are out of scope of this document.
Connectivity-change indications may be generated after lower layers
detect a connectivity-change event, for example, because:
o the IP address of the outbound interface of a connection has
changed, e.g., due to DHCP [RFC2131] or IPv6 router advertisements
[RFC2460]
o link-layer connectivity at the outbound interface of a connection
has changed, e.g., link-layer "link up" event
o the outbound interface of a connection has changed, due to routing
changes or link-layer connectivity changes at other interfaces
(including tunnel establishments or teardowns, e.g., in response
to IKE events [RFC4306])
o a Mobile IP binding update has completed [RFC3775]
o a HIP readdressing update has completed [I-D.ietf-hip-mm]
o a path-change signal from the network has arrived (possible in
theory, depends on network capabilities)
o other notifications as defined by the IETF's Detecting Network
Attachment (DNA) working group [I-D.ietf-dna-link-information]
5. TCP Response to Connectivity-Change Indications
A TCP connection can receive connectivity-change indications either
from its local stack or through a new "connectivity-change TCP
option" from its peer, as described in Section 5.1. In either case,
TCP implementations that implement the proposed changes re-probe path
characteristics or perform a speculative retransmission, depending on
whether the connection is currently stalled in exponential back-off
or not. A connection is "stalled in exponential back-off" if there
is at least one unrecovered RTO, i.e., a segment has already been
retransmitted due to an RTO but has not yet been ACKed.
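The choice between the two responses can be sketched as follows. This is a minimal illustration: the Connection class, its unrecovered_rtos field, and the returned labels are hypothetical names, not part of the proposed extensions.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    # Hypothetical bookkeeping: number of segments that were retransmitted
    # because of an RTO and have not been ACKed since.
    unrecovered_rtos: int = 0


def is_stalled(conn: Connection) -> bool:
    """A connection is stalled in exponential back-off if there is at
    least one unrecovered RTO."""
    return conn.unrecovered_rtos >= 1


def select_response(conn: Connection) -> str:
    """Pick the response to a connectivity-change indication, regardless
    of whether it came from the local stack or from the peer."""
    if is_stalled(conn):
        return "speculative_retransmission"  # Section 5.3
    return "reprobe_path"                    # Section 5.2
```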
TCP implementations that implement the proposed changes MUST maintain
three new variables per connection: MY_CCI_COUNT, REMOTE_CCI_COUNT
and CCI_STATE. The variables MY_CCI_COUNT and REMOTE_CCI_COUNT count
locally and remotely received connectivity-change indications,
respectively. The variable CCI_STATE stores the current state of the
connectivity-change indication processing. CCI_STATE can have one of
the following values:
o CCI_IDLE: The host is currently not processing any connectivity-
change indications.
o CCI_INITIATOR: The host is currently processing a connectivity-
change indication received from the local stack and has propagated
the indication to its peer through a connectivity-change TCP option.
o CCI_RESPONDER: The host is currently processing a connectivity-
change indication received from its peer via a connectivity-change
TCP option.
In the following, this document first introduces the operation of the
new connectivity-change TCP option in Section 5.1, and afterwards
describes the two mechanisms to improve TCP performance in response
to connectivity-change events - namely re-probing path
characteristics and speculative retransmission - in Section 5.2 and
Section 5.3.
5.1. Connectivity-Change Indication TCP Option
Connectivity-change indications are generally asymmetric, i.e., they
may occur on one peer host but not the other. The basic idea behind
the connectivity-change TCP option is to signal connectivity-change
indications that the local stack has received to the peer, in order
to allow it to respond appropriately. Figure 5 shows the option.
However, if there is strong evidence that a connectivity-change
indication received from the local stack is symmetric, i.e., it
occurs on both communicating peers, the host MAY decide not to signal
the connectivity-change indication to the remote peer. In this case,
the signaling overhead can be avoided, because the remote peer will
already react to the connectivity-change indication that it receives
from its local stack. For instance, when a HIP identifier becomes
rebound to a new locator, both local and remote peers can be
simultaneously notified about the connectivity-change by their local
stacks, when the HIP UPDATE procedure completes [I-D.ietf-hip-mm].
                                 1     1      2      2
0                8                6     8      1      4
+----------------+----------------+-----+------+------+
|      KIND      |     LENGTH     | RES | CNTR | ECNT |
+----------------+----------------+-----+------+------+
Figure 5: Format of the connectivity-change indication TCP option.
KIND: (8 Bits) TCP Option Type. Value set to 25 for experimental
purposes.
LENGTH: (8 Bits) TCP Option Length. Value = 3.
RES: (2 Bits) Reserved bits. The sender SHOULD set these bits to
zero. The receiver MUST ignore this field.
CNTR: (3 Bits) The local connectivity-change indication counter
value of the host sending this option. This value is decremented
once for every connectivity-change indication that the local stack
delivers to the connection.
ECNT: (3 Bits) The echoed value of CNTR. On reception of a
connectivity-change indication TCP option, a host copies the
received CNTR value to the ECNT field of its response.
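The option layout can be illustrated with a small encoder and decoder. This sketch assumes that the bit positions in Figure 5 are numbered from the most significant bit, so that RES occupies the two high-order bits of the third octet, followed by CNTR and then ECNT. The function and constant names are hypothetical.

```python
CCI_KIND = 25    # experimental option kind given in this document
CCI_LENGTH = 3   # total option length in octets


def encode_cci_option(cntr: int, ecnt: int) -> bytes:
    """Build the 3-octet option: KIND, LENGTH, then RES|CNTR|ECNT."""
    if not (0 <= cntr <= 7 and 0 <= ecnt <= 7):
        raise ValueError("CNTR and ECNT are 3-bit fields (0..7)")
    # RES (the two high-order bits) is left at zero.
    return bytes([CCI_KIND, CCI_LENGTH, (cntr << 3) | ecnt])


def decode_cci_option(data: bytes) -> tuple:
    """Return (CNTR, ECNT); the reserved bits are ignored."""
    if len(data) != CCI_LENGTH or data[0] != CCI_KIND or data[1] != CCI_LENGTH:
        raise ValueError("not a connectivity-change indication option")
    return (data[2] >> 3) & 0x7, data[2] & 0x7
```

For example, the initial counter value of 7 in both fields encodes to the octets 25, 3, 0x3F.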
The connectivity-change TCP option contains a counter (CNTR) that
represents the number of times each side has received connectivity-
change indications from its local stack. At the beginning of a
connection, both endpoints use this option in the SYN and SYN-ACK
segments, with an initial counter value of 7, to advertise support
for the option. A host MUST NOT place this option in a SYN-ACK
unless it was present on the received SYN. After the SYN exchange,
hosts SHOULD NOT send this option until there is a connectivity-
change indication. After connection setup, the option is only
generated when a connection receives a connectivity-change indication
from its local stack, or in response to a received connectivity-
change TCP option from the peer. A host MUST NOT send the option
during a connection unless it was advertised by both sides during the
SYN handshake.
When a host receives a connectivity-change TCP option, it SHOULD
respond to it as described in Section 5.2 and Section 5.3 only if
CNTR != REMOTE_CCI_COUNT, i.e., the peer signals a new instance of a
connectivity-change that it has not previously signaled. The host
SHOULD NOT respond to the reception of a connectivity-change TCP
option if CNTR = REMOTE_CCI_COUNT, because the option duplicates a
previous connectivity-change indication.
At the beginning of a connection, CCI_STATE MUST be set to CCI_IDLE.
The option SHOULD be included in all outgoing ACKs or segments if
CCI_STATE != CCI_IDLE and SHOULD NOT be included in any outgoing ACK
or segment if CCI_STATE = CCI_IDLE.
When sending the connectivity-change TCP option, CNTR MUST be set to
current MY_CCI_COUNT and ECNT MUST be set to current
REMOTE_CCI_COUNT.
When a connection receives a connectivity-change indication from its
local stack and decides to signal the local indication to the remote
peer, it decrements its MY_CCI_COUNT, sets CCI_STATE to
CCI_INITIATOR and subsequently sends a connectivity-change TCP option
in every ACK or data segment until CCI_STATE = CCI_IDLE.
It resets CCI_STATE from CCI_INITIATOR to CCI_IDLE when it sees its
current MY_CCI_COUNT value echoed back as ECNT in a connectivity-
change TCP option received from its peer.
NOTE: As discussed before, a host may under certain circumstances
decide not to signal a local connectivity-change indication to the
remote peer. In this case, MY_CCI_COUNT and CCI_STATE MUST NOT be
altered.
When a host receives a connectivity-change TCP option from its peer,
it compares the received CNTR and the local REMOTE_CCI_COUNT. If
they match, no further action is required. Otherwise, it MUST update
REMOTE_CCI_COUNT to CNTR. It also MUST update CCI_STATE to
CCI_RESPONDER unless
o CCI_STATE is CCI_INITIATOR and
o it has the higher initial sequence number of the two communicating
hosts.
CCI_STATE is reset from CCI_RESPONDER to CCI_IDLE when a host
receives an ACK or segment from its peer that does not contain the
connectivity-change TCP option.
NOTE: The transition from CCI_INITIATOR to CCI_RESPONDER is only
allowed if the host has the lower initial sequence number. This
prevents an infinite signaling loop in which both hosts are in the
CCI_RESPONDER state. Without this rule, if the two peers
simultaneously receive connectivity-change indications from their
local stacks and send out connectivity-change TCP options, both peers
would set CCI_STATE to CCI_RESPONDER and include the option in all
subsequent ACKs and segments. Neither peer would then ever reset
CCI_STATE from CCI_RESPONDER to CCI_IDLE, as this transition is only
performed when a host receives an ACK or segment that does not
contain the connectivity-change TCP option.
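The state transitions described in this section can be sketched as a small state machine. This is one possible reading, with several assumptions that this document does not state: the counter wraps modulo 8 when decremented below zero, both counters start at the value 7 exchanged in the SYN handshake, and an echoed counter is checked before a new CNTR value is processed. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CCI_IDLE = "CCI_IDLE"
CCI_INITIATOR = "CCI_INITIATOR"
CCI_RESPONDER = "CCI_RESPONDER"


@dataclass
class CciState:
    has_higher_isn: bool        # this host chose the higher initial sequence number
    my_cci_count: int = 7       # MY_CCI_COUNT (assumed to start at 7)
    remote_cci_count: int = 7   # REMOTE_CCI_COUNT (assumed to start at 7)
    cci_state: str = CCI_IDLE

    def on_local_indication(self, signal_to_peer: bool = True) -> None:
        if not signal_to_peer:
            return  # counter and state are left unaltered
        self.my_cci_count = (self.my_cci_count - 1) % 8  # wrap is an assumption
        self.cci_state = CCI_INITIATOR

    def on_segment(self, option: Optional[Tuple[int, int]]) -> bool:
        """Process a received segment; option is (CNTR, ECNT) or None.
        Returns True if the peer signaled a new indication."""
        if option is None:
            if self.cci_state == CCI_RESPONDER:
                self.cci_state = CCI_IDLE
            return False
        cntr, ecnt = option
        if self.cci_state == CCI_INITIATOR and ecnt == self.my_cci_count:
            self.cci_state = CCI_IDLE  # our indication was echoed
        if cntr == self.remote_cci_count:
            return False  # duplicate of an earlier indication
        self.remote_cci_count = cntr
        if not (self.cci_state == CCI_INITIATOR and self.has_higher_isn):
            self.cci_state = CCI_RESPONDER
        return True

    def outgoing_option(self) -> Optional[Tuple[int, int]]:
        """(CNTR, ECNT) to place in outgoing segments, or None when idle."""
        if self.cci_state == CCI_IDLE:
            return None
        return (self.my_cci_count, self.remote_cci_count)
```

In a simple exchange, the initiator decrements its counter to 6 and sends (6, 7); the responder records 6, enters CCI_RESPONDER, and replies with (7, 6); the initiator sees its counter echoed and returns to CCI_IDLE; the responder returns to CCI_IDLE on the first segment that arrives without the option.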
5.2. Re-Probing Path Characteristics
When a TCP connection receives a connectivity-change indication and
is not currently stalled, it MUST re-probe the path characteristics
to prevent causing congestion along the potentially new path and to
quickly probe the path's available capacity. In principle, this
is similar to the initial slow-start: The sender MUST NOT
transmit more than the default initial window of data along the new
path, in order to avoid over-congesting it, and the slow-start
threshold (SS_THRESH) SHOULD be set to the initial value as with a
new connection to allow for rapid probing of available capacity. In
addition, the sender MUST reset round-trip time measurement (RTTM)
state and the RTO timer. If Path MTU Discovery (PMTUD) is enabled,
PMTUD state SHOULD also be reset [RFC1191][RFC1981].
One difference from slow-start is that after a connectivity-change
indication, the connection may have segments in flight towards the
destination along a previous path. Therefore, after a connectivity-
change indication, congestion control MUST ignore any stale ACKs and
MUST update the congestion window solely based on ACKs for data sent
on the new path.
In detail, when a connectivity-change indication is received, the
sender MAY send INIT_WINDOW worth of data along the changed path and
MUST reset
the congestion control state, RTTM state, and RTO timer as if this
were a new connection [RFC2581][RFC2988]. Each ACK that is received
while CCI_STATE is not CCI_IDLE SHOULD be treated as a stale ACK.
For each stale ACK received, a host MUST NOT adjust the congestion
window and MUST NOT send any new data into the network. This
behavior SHOULD continue until CCI_STATE is CCI_IDLE again or there
is a timeout. Once CCI_STATE is set to CCI_IDLE, the sender should
consider any un-ACK'ed segments below the highest received ACK as
lost and discount them from the segments in flight. The sender MUST
use slow-start based loss recovery for these segments.
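The reset and stale-ACK handling described in this section could look roughly like the following sketch. All names and constants are assumptions for illustration: SMSS is an example segment size, INIT_WINDOW uses the two-segment upper bound of [RFC2581], INITIAL_SSTHRESH stands in for an "arbitrarily high" initial threshold, INITIAL_RTO is the 3-second initial value from [RFC2988], and the cci_idle flag is assumed to be maintained by the CCI_STATE logic elsewhere. Congestion avoidance and loss recovery are omitted.

```python
from dataclasses import dataclass
from typing import Optional

SMSS = 1460               # assumed sender maximum segment size, in bytes
INIT_WINDOW = 2 * SMSS    # assumed initial window (RFC 2581 upper bound)
INITIAL_SSTHRESH = 65535  # assumed "arbitrarily high" initial threshold
INITIAL_RTO = 3.0         # seconds, initial RTO as in RFC 2988


@dataclass
class SenderState:
    cwnd: int
    ssthresh: int
    srtt: Optional[float]
    rttvar: Optional[float]
    rto: float
    cci_idle: bool = True  # hypothetical mirror of CCI_STATE == CCI_IDLE


def on_connectivity_change(s: SenderState) -> None:
    """Reset congestion control, RTTM, and RTO as for a new connection."""
    s.cwnd = INIT_WINDOW
    s.ssthresh = INITIAL_SSTHRESH
    s.srtt = None      # discard RTT samples taken on the old path
    s.rttvar = None
    s.rto = INITIAL_RTO


def on_ack(s: SenderState, acked_bytes: int) -> bool:
    """Return True if the ACK was used to update the congestion window."""
    if not s.cci_idle:
        # Stale ACK: do not adjust cwnd and do not release new data.
        return False
    if s.cwnd < s.ssthresh:
        # Slow-start: grow by at most one SMSS per ACK.
        s.cwnd += min(acked_bytes, SMSS)
    return True
```

Keeping the stale-ACK check at the top of on_ack means ACKs for data sent on the old path cannot inflate the window that is being probed on the new path.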
5.3. Speculative Retransmission
The basic idea behind speculative retransmission is to allow TCP
to resume stalled connections as soon as it receives an indication
that connectivity to previously unreachable peers may have returned.
When a TCP connection receives a connectivity-change indication -
either from the local stack or in a connectivity-change TCP option
from the peer - and is currently stalled, it MUST immediately
initiate the standard retransmission procedure, just as if the RTO
for the connection had expired.
In addition, conforming TCP implementations SHOULD send at least one
segment to the peer. This segment MUST contain the connectivity-
change TCP option to notify the peer and may be either a queued data
retransmission or, if the connection has no data awaiting
retransmission, a pure ACK.
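As a rough illustration of the behavior above, the sketch below reacts to an indication on a stalled connection by sending one segment that carries the option. The Connection class, its stalled flag, and the send() recorder are hypothetical stand-ins; a real stack would invoke its normal retransmission procedure at this point.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Connection:
    stalled: bool
    # Hypothetical queue of segments awaiting retransmission.
    retransmit_queue: List[bytes] = field(default_factory=list)
    # Record of (segment kind, carries connectivity-change option).
    sent: List[Tuple[str, bool]] = field(default_factory=list)

    def send(self, kind: str, with_cc_option: bool) -> None:
        self.sent.append((kind, with_cc_option))


def on_connectivity_indication(conn: Connection) -> bool:
    """Handle a local or remote indication; return True if a segment was sent."""
    if not conn.stalled:
        # Connections that are not stalled re-probe the path instead
        # (Section 5.2).
        return False
    if conn.retransmit_queue:
        # Act as if the RTO had expired: retransmit queued data now.
        conn.send("retransmission", True)
    else:
        # Nothing to retransmit: a pure ACK still notifies the peer.
        conn.send("pure-ack", True)
    return True
```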
6. Security Considerations
The only foreseen security considerations with the techniques
presented in this document result from either an attacker's ability
to spoof valid TCP segments with options that seemingly indicate
connectivity changes, or an attacker's ability to generate bogus
connectivity change indications locally. An attacker might produce a
stream of such false indicators that could keep a connection in slow-
start at the initial window. One possible defense against this type
of attack is to rate-limit the response to connectivity indicators
(whether local or remote). This attack is also probably less serious
than other attacks that such an empowered adversary could perform,
such as resetting the connection or injecting data. A similar effect
could be
achieved without the new option by forging duplicate ACKs that would
keep a sender in loss recovery. If both sets of IP addresses, port
numbers, and sequence numbers are guessable for a connection, then
the connection should use an approved means (such as IPsec)
[I-D.ietf-tcpm-tcp-antispoof] for protection against spoofed
segments.
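One way to rate-limit the response to connectivity indicators is a minimum-interval filter, sketched below. This document does not specify a particular mechanism or interval; the class name, the min_interval parameter, and the choice to pass the current time explicitly are assumptions made for illustration.

```python
from typing import Optional


class IndicationRateLimiter:
    """Accept at most one connectivity indication per min_interval seconds."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last_accepted: Optional[float] = None

    def allow(self, now: float) -> bool:
        """Return True if an indication arriving at time `now` may be acted on."""
        if (
            self._last_accepted is not None
            and now - self._last_accepted < self.min_interval
        ):
            return False  # Too soon after the last accepted indication.
        self._last_accepted = now
        return True
```

A connection would consult allow() before resetting its congestion state, so that a stream of forged indications can trigger at most one reset per interval.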
7. Conclusion
When connectivity characteristics between two hosts change abruptly,
TCP can experience significant delays before resuming transmission in
an efficient manner or TCP can behave unfairly to competing traffic.
This document describes TCP extensions that improve transmission
behavior in response to advisory, lower-layer connectivity-change
indications. The proposed TCP extensions modify the local behavior
of TCP and introduce a new TCP option to signal local connectivity-
change indications to remote peers.
8. IANA Considerations
This section is to be interpreted according to [RFC2434].
This document does not define any new namespaces. It uses an 8-bit
TCP option number maintained by IANA at
http://www.iana.org/assignments/tcp-parameters.
9. Acknowledgments
This draft combines and obsoletes [I-D.swami-tcp-lmdr] and
[I-D.eggert-tcpm-tcp-retransmit-now]. The authors would like to
thank Mark Allman, Marcus Brunner, Shashikant Maheshwari, Kacheong
Poon, Juergen Quittek, Stefan Schmid and Joe Touch for their comments
and suggestions on the two previous drafts.
Lars Eggert and Simon Schuetz are partly funded by Ambient Networks,
a research project supported by the European Commission under its
Sixth Framework Program. The views and conclusions contained herein
are those of the authors and should not be interpreted as necessarily
representing the official policies or endorsements, either expressed
or implied, of the Ambient Networks project or the European
Commission.
Wesley Eddy's work on this document was performed at NASA's Glenn
Research Center, while in support of the NASA Space Communications
Architecture Working Group (SCAWG), and the FAA/Eurocontrol Future
Communications Study (FCS).
10. References
10.1. Normative References
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, September 1981.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing an
IANA Considerations Section in RFCs", BCP 26, RFC 2434,
October 1998.
[RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
Control", RFC 2581, April 1999.
[RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission
Timer", RFC 2988, November 2000.
10.2. Informative References
[DRIVE-THRU]
Ott, J. and D. Kutscher, "Drive-Thru Internet: IEEE
802.11b for Automobile Users", Proc. Infocom 2004,
March 2004.
[DUKEHEND]
Duke, M., Henderson, T., and J. Meegan, "Experience with
'Link-UP Notification' Over a Mobile Satellite Link",
ACM Computer Communication Review, Vol. 34, No. 3,
July 2004.
[ES05] Eddy, W. and Y. Swami, "Adapting End-host Congestion
Control for Mobility", NASA Glenn Research Center
Technical Report, CR-2005-213838, July 2005.
[I-D.dawkins-trigtran-linkup]
Dawkins, S., "End-to-end, Implicit 'Link-Up'
Notification", draft-dawkins-trigtran-linkup-01 (work in
progress), October 2003.
[I-D.eggert-tcpm-tcp-retransmit-now]
Eggert, L., "TCP Extensions for Immediate
Retransmissions", draft-eggert-tcpm-tcp-retransmit-now-02
(work in progress), June 2005.
[I-D.iab-link-indications]
Aboba, B., "Architectural Implications of Link
Indications", draft-iab-link-indications-04 (work in
progress), December 2005.
[I-D.ietf-dna-link-information]
Yegin, A., "Link-layer Event Notifications for Detecting
Network Attachments", draft-ietf-dna-link-information-03
(work in progress), October 2005.
[I-D.ietf-hip-mm]
Nikander, P., "End-Host Mobility and Multihoming with the
Host Identity Protocol", draft-ietf-hip-mm-03 (work in
progress), March 2006.
[I-D.ietf-tcpimpl-restart]
Hughes, A., Touch, J., and J. Heidemann, "Issues in TCP
Slow-Start Restart After Idle",
draft-ietf-tcpimpl-restart-00 (work in progress),
March 1998.
[I-D.ietf-tcpm-tcp-antispoof]
Touch, J., "Defending TCP Against Spoofing Attacks",
draft-ietf-tcpm-tcp-antispoof-03 (work in progress),
February 2006.
[I-D.ietf-tcpm-tcp-uto]
Eggert, L. and F. Gont, "TCP User Timeout Option",
draft-ietf-tcpm-tcp-uto-02 (work in progress),
October 2005.
[I-D.ietf-tsvwg-quickstart]
Floyd, S., "Quick-Start for TCP and IP",
draft-ietf-tsvwg-quickstart-02 (work in progress),
March 2006.
[I-D.swami-tcp-lmdr]
Swami, Y., "Lightweight Mobility Detection and Response
(LMDR) Algorithm for TCP", draft-swami-tcp-lmdr-07 (work
in progress), March 2006.
[KOODLI] Koodli, R. and C. Perkins, "Fast Handovers and Context
Transfers in Mobile Networks", ACM Computer Communication
Review, Vol. 31, No. 5, October 2001.
[RFC1122] Braden, R., "Requirements for Internet Hosts -
Communication Layers", STD 3, RFC 1122, October 1989.
[RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
November 1990.
[RFC1981] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
for IP version 6", RFC 1981, August 1996.
[RFC2131] Droms, R., "Dynamic Host Configuration Protocol",
RFC 2131, March 1997.
[RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6
(IPv6) Specification", RFC 2460, December 1998.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP",
RFC 3168, September 2001.
[RFC3344] Perkins, C., "IP Mobility Support for IPv4", RFC 3344,
August 2002.
[RFC3775] Johnson, D., Perkins, C., and J. Arkko, "Mobility Support
in IPv6", RFC 3775, June 2004.
[RFC3819] Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
Wood, "Advice for Internet Subnetwork Designers", BCP 89,
RFC 3819, July 2004.
[RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
RFC 4306, December 2005.
[SCHUETZ-CCR]
Schuetz, S., Eggert, L., Schmid, S., and M. Brunner,
"Protocol Enhancements for Intermittently Connected
Hosts", ACM Computer Communication Review, Vol. 35, No. 3,
July 2005.
[SCOTT] Scott, J. and G. Mapp, "Link layer-based TCP optimisation
for disconnecting networks", ACM Computer Communication
Review, Vol. 33, No. 5, October 2003.
Editorial Comments
[anchor3] LE: The authors have seen the idea of triggering
retransmits based on connectivity events of directly-
connected links attributed to Phil Karn ("kick" operation
in the KA9Q TCP stack). Pointers to a citable reference
are highly appreciated!
Appendix A. Document Revision History
+----------+--------------------------------------------------------+
| Revision | Comments |
+----------+--------------------------------------------------------+
| 00 | Initial version. This document is a merge of and |
| | obsoletes [I-D.eggert-tcpm-tcp-retransmit-now] and |
| | [I-D.swami-tcp-lmdr]. |
+----------+--------------------------------------------------------+
Authors' Addresses
Simon Schuetz
NEC Network Laboratories
Kurfuerstenanlage 36
Heidelberg 69115
Germany
Phone: +49 6221 4342 165
Fax: +49 6221 4342 155
Email: simon.schuetz@netlab.nec.de
URI: http://www.netlab.nec.de/
Lars Eggert
NEC Network Laboratories
Kurfuerstenanlage 36
Heidelberg 69115
Germany
Phone: +49 6221 4342 143
Fax: +49 6221 4342 155
Email: lars.eggert@netlab.nec.de
URI: http://www.netlab.nec.de/
Wesley M. Eddy
Verizon Federal Network Systems
NASA Glenn Research Center
21000 Brookpark Road, MS 54-5
Cleveland, OH 44135
USA
Email: weddy@grc.nasa.gov
Yogesh Prem Swami
Nokia Research Center, Dallas
6000 Connection Drive
Irving, TX 75603
USA
Phone: +1 972 374 0669
Email: yogesh.swami@nokia.com
Khiem Le
Nokia Research Center, Dallas
6000 Connection Drive
Irving, TX 75603
USA
Phone: +1 972 894 4882
Email: khiem.le@nokia.com
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2006). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.