TCPM Working Group S. Schuetz
Internet-Draft NEC
Intended status: Experimental N. Koutsianas
Expires: August 25, 2008 L. Eggert
Nokia
W. Eddy
Verizon
Y. Swami
Nokia
K. Le
NSN
February 22, 2008
TCP Response to Lower-Layer Connectivity-Change Indications
draft-schuetz-tcpm-tcp-rlci-03
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
This document may not be modified, and derivative works of it may not
be created, except to publish it as an RFC and to translate it into
languages other than English.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 25, 2008.
Copyright Notice
Copyright (C) The IETF Trust (2008).
Schuetz, et al. Expires August 25, 2008 [Page 1]
Internet-Draft TCP Response to Connectivity Indications February 2008
Abstract
When the path characteristics between two hosts change abruptly, TCP
can experience significant delays before it resumes efficient
transmission, or it can behave unfairly toward competing traffic.
This document describes TCP extensions that improve transmission
behavior in response to advisory, lower-layer connectivity-change
indications. The proposed TCP extensions modify the local behavior
of TCP and introduce a new TCP option to signal locally received
connectivity-change indications to remote peers. Performance gains
result from a more efficient transmission behavior and there is no
difference in aggressiveness in comparison to a newly-started
connection.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4
3. Motivation and Overview . . . . . . . . . . . . . . . . . . . 4
4. Connectivity-Change Indications . . . . . . . . . . . . . . . 6
5. TCP Response to Connectivity-Change Indications (CCIs) . . . . 7
5.1. Connectivity-Change Indication (CCI) TCP Option . . . . . 9
5.2. Generation and Processing of Connectivity-Change
Indication TCP Options . . . . . . . . . . . . . . . . . . 11
5.3. Re-Probing Path Characteristics . . . . . . . . . . . . . 15
5.4. Speculative Retransmission . . . . . . . . . . . . . . . . 16
6. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 16
6.1. Triggered Segment Transmission during Steady-State . . . . 17
6.2. Impact of Packet Loss . . . . . . . . . . . . . . . . . . 17
6.3. Use of Limited Transmit with RLCI . . . . . . . . . . . . 18
6.4. Simultaneous Processing of Connectivity-Change
Indications . . . . . . . . . . . . . . . . . . . . . . . 19
7. Security Considerations . . . . . . . . . . . . . . . . . . . 19
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 20
9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 20
10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 20
10.1. Normative References . . . . . . . . . . . . . . . . . . . 20
10.2. Informative References . . . . . . . . . . . . . . . . . . 21
Editorial Comments . . . . . . . . . . . . . . . . . . . . . . . .
Appendix A. Background: Classification of Connectivity
Disruptions . . . . . . . . . . . . . . . . . . . . . 23
A.1. Short Connectivity Disruptions . . . . . . . . . . . . . . 25
A.2. Long Connectivity Disruptions . . . . . . . . . . . . . . 27
Appendix B. Document Revision History . . . . . . . . . . . . . . 29
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 30
Intellectual Property and Copyright Statements . . . . . . . . . . 32
1. Introduction
The Transmission Control Protocol (TCP) [RFC0793] generally assumes
that the end-to-end path between two hosts has characteristics that
are relatively stable over the lifetime of a connection. Although
TCP's congestion control algorithms [RFC2581] can adapt to changes to
the path characteristics after several round-trip times, they fail to
support efficient operation in the few round-trip times immediately
after a significant path change. This is due to the granularity of
TCP's sampling mechanisms. Significant changes to path connectivity
include loss or reestablishment of connectivity, and drastic, abrupt
changes in round-trip time (RTT) or available bandwidth.
Connectivity changes that occur on such short time-scales are
becoming more common, due to host mobility or intermittent network
attachment.
This document describes a set of complementary TCP extensions that
improve behavior when transmitting over paths whose characteristics
can change on short time-scales. TCP implementations that support
these extensions respond to receiving generic, link-technology-
independent, per-connection connectivity-change indications from
lower layers. A connectivity-change indication signals that the
characteristics of the end-to-end path between the local node and its
peer have changed in some undefined way. The response mechanisms
proposed for TCP act on this information in a conservative fashion.
The specific response depends on the current state of a connection
when a connectivity-change indication is received.
It is important to note that adding response mechanisms for
lower-layer information follows an established precedent. TCP
and other transport protocols already react to information and
signals from lower layers; the proposed connectivity-change
indications thus extend an established interface between layers in
the protocol stack. TCP measures the end-to-end path to implicitly
derive network-layer information. TCP also directly reacts to
network-layer signals delivered via ICMP, for example, "Port
Unreachable" or the now-deprecated "Source Quench" [RFC1122].
Explicit Congestion Notification (ECN) [RFC3168] and Quick-Start
[RFC4782] are other sources of network-layer information for which
response mechanisms for TCP have been defined. Connectivity-change
indications are yet another source of lower-layer information that
TCP can use to improve its operation.
A second important point to note is that the TCP response mechanisms
to connectivity-change indications are purely optional efficiency
improvements. In the absence of connectivity-change indications, a
TCP that implements these changes behaves identically to an
unmodified TCP. When lower layers provide connectivity-change
indications that trigger the response mechanisms, they enhance TCP
operation based on the explicit lower-layer information that is
signaled. These response mechanisms do not increase the
aggressiveness of TCP.
Note that the IAB has recently described architectural issues of
"link indications" [RFC4907]. The authors feel that this term is not
quite accurate in this environment, because transport mechanisms
should remain link-technology-agnostic. However, transport protocols
have always acted on network-layer information and signals, such as
measured path characteristics or ICMP-signaled conditions. Because
of the growing proliferation of shim layers between the traditional
network and transport layers, this document uses the term "lower-
layer indication" to remain independent of specific network or shim
layers.
Note that it is currently an open question as to whether additional
lower-layer indications can provide further information to transport
protocols. Also, this document only describes response mechanisms
for TCP, although other transport protocols may benefit from similar
response mechanisms to react to connectivity-change indications.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
The following abbreviations are used throughout the document:
+------+---------------------------------------------------------+
| CCI | Connectivity-Change Indication |
| RLCI | Response to Lower-layer Connectivity-change Indications |
+------+---------------------------------------------------------+
Table 1: Abbreviations
3. Motivation and Overview
Several proposed network-layer extensions support host mobility,
including Mobile IPv4 [RFC3344], Mobile IPv6 [RFC3775] and HIP
[I-D.ietf-hip-mm]. Typically, they shield transport-layer protocols
from mobility events and enable them to sustain established
connections across mobility events. However, the path
characteristics that established connections experience after a
mobility event may have changed drastically and on short time-scales.
Congestion control, RTT and path-MTU state gathered over an old path
before the move generally have no meaning for the new path. Because
TCP uses stale information when resuming transmission over the new
path, it can be either too aggressive or highly inefficient. Similar
conditions may be found when fail-overs occur for multihomed hosts
through the shim6 protocol. Appendix A gives background on the
types of scenarios that the mechanisms described in this document are
designed for.
TCP already forces a slow-start restart in some cases where the
network state becomes unknown, such as after an idle period or heavy
losses. A first part of the response specified in this document
involves a similar return to initial slow-start state in response to
connectivity-change indications that are received while a connection
is transmitting in steady-state. Note that this behavior is more
conservative than the standard TCP response or lack of response.
Some performance gains with the proposed mechanisms are due to either
avoiding overloading the new path, which typically incurs an RTO, or
using slow-start to quickly detect new capacity far above the
previous steady-state operating point.
A second response component improves TCP operation in the presence of
temporary connectivity disruptions. These disruptions can occur
independently of mobility events and, for example, may be due to
insufficient wireless access coverage or nomadic computer use.
Connectivity disruptions can severely decrease TCP performance. The
main reason for this decrease is TCP's retransmission behavior after
a connectivity disruption [SCHUETZ]. TCP uses periodic
retransmission attempts in exponentially increasing intervals, which
can unnecessarily delay retransmissions after connectivity returns.
In the extreme case, TCP connections can even abort, if the
disruption is longer than the TCP "user timeout". (Connection aborts
are out of scope for this document but can be prevented by the TCP
User Timeout Option [I-D.ietf-tcpm-tcp-uto].)
This second response action executes when receiving a connectivity-
change indication while a connection is stalled in exponential back-
off. It improves TCP retransmission behavior after connectivity is
restored through an immediate speculative retransmission attempt
[footnote-1]. Similar to the first response component, the second
one also increases TCP performance through a more intelligent
transmission behavior that uses periods of connectivity more
efficiently. In comparison to startup of a new connection, it does
not cause significant amounts of additional traffic and it does not
change TCP's congestion control algorithms.
Finally, this draft specifies a third response component, which is a
new TCP option that notifies the connection's remote peer of a
connectivity-change event detected locally. This is useful because
connectivity-change indications typically require appropriate
responses at both ends of a connection, but may only be received or
detected by one end. The other parts of the response to a
connectivity-change indication are independent of the indication's
source (locally notified or remotely signaled) and depend only on the
specific indication and the state of the connection for which it was
received.
4. Connectivity-Change Indications
The focus of this document is on specifying TCP response mechanisms
to lower-layer connectivity-change indications. This section briefly
describes how different network- and shim-layer mechanisms underneath
the transport layer may provide these connectivity-change indications
to TCP. This section is included for clarification only; details on
connectivity indication sources are out of scope of this document.
When lower layers detect a connectivity-change event, they generate
corresponding connectivity-change indications. Lower-layer events
that could trigger such an indication include (but are not limited
to):
o the IP address of the local outbound interface used for a given
connection has changed, e.g., due to DHCP [RFC2131] or IPv6 router
advertisements [RFC2460];
o link-layer connectivity of the local outbound interface used for a
given connection has changed, e.g., link-layer "link up" event
[RFC4957];
o the local outbound interface used for a given connection has
changed, due to routing changes or link-layer connectivity changes
at other interfaces (including tunnel establishment or teardown,
e.g., in response to IKE events [RFC4306]);
o a Mobile IP binding update has completed [RFC3775];
o a HIP readdressing update has completed [I-D.ietf-hip-mm];
o a path-change signal from the network has arrived (possible in
theory, depends on network capabilities);
o other notifications as defined by the IETF's Detecting Network
Attachment (DNA) working group have occurred [RFC4957].
Note that the list above only describes some potential sources for
connectivity-change events. Other sources exist, but the details on
when to generate such events are out of the scope of this document,
which focuses on the TCP response mechanisms when such events are
received.
5. TCP Response to Connectivity-Change Indications (CCIs)
A TCP connection can receive a connectivity-change indication (CCI)
either from its local stack ("local CCI") or through a new
"connectivity-change indication TCP option" from its peer ("remote
CCI"). Section 5.1 specifies this new TCP option. In either case,
upon reception of a CCI, the TCP RLCI (Response to Lower-layer
Connectivity-change Indications) mechanisms defined in this document
immediately re-probe path characteristics. They do this by either
performing a speculative retransmission or by sending a single
segment of new data or a pure ACK, depending on whether the
connection is currently stalled in exponential back-off or
transmitting in steady-state, respectively. A connection is "stalled
in exponential back-off" if at least one segment was retransmitted
due to an RTO expiration but has not been ACKed yet.
The remainder of this section first defines the format of the new CCI
TCP option in Section 5.1 and its processing in Section 5.2. After
that, the two TCP response mechanisms triggered by receiving CCIs -
re-probing path characteristics and speculative retransmission - are
described in Section 5.3 and Section 5.4.
The TCP RLCI mechanisms defined in this document depend on the TCP
Timestamps option (TSopt) [RFC1323]. Consequently, it is REQUIRED
that an end host that wishes to use the RLCI mechanisms for a TCP
connection negotiate the use of TCP Timestamps options with its peer.
If this negotiation fails, a host MUST NOT use the RLCI mechanisms
for a connection. TCP Timestamps options are needed by the RLCI
mechanisms during the following operations:
o To re-probe the path characteristics after a connectivity-change
indication. A host uses the TS Echo Reply (TSecr) field of a TCP
Timestamps option to distinguish whether incoming ACKs are for
segments that have been transmitted before or after the CCI.
o To identify a new remote CCI. A host uses the TS Value (TSval)
field of an incoming TCP Timestamps option to distinguish a new
remote CCI from the delayed reception of an old one. As a result,
the last remote CCI is defined as the one received with the highest
TS Value.
Section 5.2 and Section 5.3 give more details about how the RLCI
mechanisms use TCP Timestamps options.
An implementation of the RLCI mechanisms defined in this document
maintains nine new state variables per TCP connection. [footnote-2]
LOCAL_CCI
It is a 1-bit counter, having an initial value of 0. It is used
for distinguishing the existence of a new local CCI. It changes
its value every time a new local CCI received from the local stack
starts being processed.
REMOTE_CCI
It holds a copy of the last CCI value advertised by the peer
through a CCI TCP option. This is a 1-bit counter initialized to
0 and gets updated in response to remote CCIs according to the
rules defined in Section 5.2.
LOCAL_CCI_STATUS
It holds the status of the processing of local CCIs. It can have
three possible values: LOCAL_CCI_IDLE (0), LOCAL_CCI_NEW (1),
LOCAL_CCI_ECHO_ACK (2). The initial value is LOCAL_CCI_IDLE.
REMOTE_CCI_STATUS
It holds the status of the processing of the last remote CCI
advertised by the peer through a CCI TCP option. It can have two
possible values: REMOTE_CCI_IDLE (0), REMOTE_CCI_ECHO (1). The
initial value is REMOTE_CCI_IDLE.
LAST_CCI_TIME
It holds the local time when the last CCI (either local or remote)
was received. It is updated every time either LOCAL_CCI or
REMOTE_CCI is modified.
REMOTE_CCI_PEER_TIME
This variable is used in order to distinguish new remote CCIs from
the retransmissions of the past ones. It holds the TS Value
(TSval) of the Timestamps option of the segment advertising the
last remote CCI. It is initialized when receiving the first
segment from the peer and it is updated every time REMOTE_CCI is
modified.
LOCAL_CCI_PEER_ECHO_TIME
This variable is used in order to distinguish the echo of a new
local CCI from delayed retransmissions of echoes of older local
CCIs. It holds the TS Value (TSval) of the Timestamps option of
the segment that echoed the last local CCI. It is initialized
when receiving the first segment from the peer and it is updated
every time LOCAL_CCI_STATUS changes from LOCAL_CCI_NEW to
LOCAL_CCI_ECHO_ACK.
CCI_SNDMAX
Retains the highest sequence number transmitted when the most
recent CCI (either local or remote) was received.
CCI_CONTROLLED_CWND
It is a Boolean variable that sets an additional condition
controlling the increment of TCP's congestion window (CWND).
Having an initial value of false, it is updated according to the
rules defined in Section 5.2.
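As an illustration only, the nine per-connection variables listed above could be grouped as follows. This is a minimal sketch, not a prescribed layout: the class names `RlciState`, `LocalCciStatus` and `RemoteCciStatus` are hypothetical, and the integer representation of the time-valued fields is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class LocalCciStatus(IntEnum):
    LOCAL_CCI_IDLE = 0
    LOCAL_CCI_NEW = 1
    LOCAL_CCI_ECHO_ACK = 2


class RemoteCciStatus(IntEnum):
    REMOTE_CCI_IDLE = 0
    REMOTE_CCI_ECHO = 1


@dataclass
class RlciState:
    """Hypothetical container for the per-connection RLCI variables."""
    local_cci: int = 0                    # 1-bit counter, toggled per local CCI
    remote_cci: int = 0                   # 1-bit counter, copy of the peer's C
    local_cci_status: LocalCciStatus = LocalCciStatus.LOCAL_CCI_IDLE
    remote_cci_status: RemoteCciStatus = RemoteCciStatus.REMOTE_CCI_IDLE
    last_cci_time: int = 0                # local time of the last CCI (assumed integer clock)
    remote_cci_peer_time: int = 0         # TSval; initialized from the first peer segment
    local_cci_peer_echo_time: int = 0     # TSval; initialized from the first peer segment
    cci_sndmax: int = 0                   # highest sequence number sent at the last CCI
    cci_controlled_cwnd: bool = False     # extra condition on CWND increments
```

A freshly created `RlciState()` corresponds to the start of a connection, before any CCI has been processed.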
5.1. Connectivity-Change Indication (CCI) TCP Option
Connectivity-change indications (CCIs) are generally asymmetric,
i.e., they may occur or be detected by one end but not the other.
The basic idea behind the CCI option is to signal the occurrence of
local CCIs to the other end, so that the other end can also respond
appropriately. Note that this assumes that paths will
generally be symmetric, meaning that a CCI received by one end for
its path to the other end will imply that the characteristics of the
reverse path have changed, too.
                     1                   2
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
+---------------+---------------+-----+-+-+---+-+
| | | R | | | |E|
| Kind = X | Length = 3 | E |C|E| C |C|
| | | S | |C| S |S|
+---------------+---------------+-----+-+-+---+-+
Figure 1: Format of the connectivity-change indication TCP option.
Figure 1 shows the format of the CCI option. It contains these
fields:
Kind (8 bits)
The TCP option number X [RFC0793] allocated by IANA upon
publication of this document (see Section 8).
Length (8 bits)
Length of the TCP option in octets [RFC0793]; its value MUST be 3.
RES (3 bits)
Reserved bits. The sender SHOULD set these to zero and the
receiver MUST ignore them.
C (1 bit)
Current value of LOCAL_CCI of the end sending the option.
EC (1 bit)
Echoed value of C, i.e., the current value of REMOTE_CCI of the
end sending the option.
CS (2 bits)
Current value of LOCAL_CCI_STATUS of the end sending the option.
ECS (1 bit)
Current value of REMOTE_CCI_STATUS of the end sending the option.
The CCI option contains two single-bit fields (C and EC) used to
distinguish new CCIs from delayed retransmissions of past ones. It
also contains some flags representing the status of each CCI
processing. These flags are used for a 3-way handshake ensuring that
both parties have been informed of a new CCI. At the beginning of a
connection, LOCAL_CCI and REMOTE_CCI MUST be set to 0.
LOCAL_CCI_STATUS and REMOTE_CCI_STATUS MUST be set to LOCAL_CCI_IDLE
and REMOTE_CCI_IDLE, respectively.
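The option layout in Figure 1 can be sketched as a pair of encode and decode helpers. This is an illustrative sketch under stated assumptions: the function and type names are hypothetical, and because the Kind value X has not been allocated, the option kind is passed in by the caller rather than fixed here. Bits are packed most-significant first, following the figure (RES, C, EC, CS, ECS).

```python
from typing import NamedTuple

CCI_OPTION_LENGTH = 3  # Kind, Length and one octet of flags


class CciOption(NamedTuple):
    c: int    # LOCAL_CCI of the sender (1 bit)
    ec: int   # REMOTE_CCI of the sender (1 bit)
    cs: int   # LOCAL_CCI_STATUS of the sender (2 bits)
    ecs: int  # REMOTE_CCI_STATUS of the sender (1 bit)


def encode_cci_option(kind: int, opt: CciOption) -> bytes:
    """Build the 3-octet option; the three RES bits are sent as zero."""
    flags = (((opt.c & 0x1) << 4) | ((opt.ec & 0x1) << 3)
             | ((opt.cs & 0x3) << 1) | (opt.ecs & 0x1))
    return bytes([kind, CCI_OPTION_LENGTH, flags])


def decode_cci_option(kind: int, data: bytes) -> CciOption:
    """Parse a 3-octet option; the three RES bits are ignored."""
    if (len(data) != CCI_OPTION_LENGTH or data[0] != kind
            or data[1] != CCI_OPTION_LENGTH):
        raise ValueError("not a well-formed CCI option")
    flags = data[2]
    return CciOption(c=(flags >> 4) & 0x1, ec=(flags >> 3) & 0x1,
                     cs=(flags >> 1) & 0x3, ecs=flags & 0x1)
```

For example, with a placeholder kind chosen by the caller, an option carrying C = 1 and CS = LOCAL_CCI_NEW (1) yields a flags octet of 0x12.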
A host actively opening a connection and wishing to use the CCI
option for that connection MUST include a CCI option in its SYN
segment with C := 0, CS := LOCAL_CCI_IDLE, EC := 0 and ECS :=
REMOTE_CCI_IDLE in order to advertise support for the TCP CCI option.
A host receiving a SYN segment MUST NOT include a CCI option in its
SYN-ACK or any subsequent segment, unless it has received a CCI
option in the corresponding SYN. In case a host has received a CCI
option in the SYN segment, it MUST echo that CCI option in its SYN-
ACK segment, i.e., it MUST set C := 0, CS := LOCAL_CCI_IDLE, EC := 0
and ECS := REMOTE_CCI_IDLE. A host MUST NOT process any following
CCI options unless one was included in both the SYN and SYN-ACK and
both peers have enabled TCP Timestamps for the connection.
Section 5.2.1 and Section 5.2.2 describe the processing rules in
detail.
A host MUST send a CCI option in all outgoing segments whenever
LOCAL_CCI_STATUS is not LOCAL_CCI_IDLE or REMOTE_CCI_STATUS is not
REMOTE_CCI_IDLE (or both). A host MUST NOT send a CCI option when
LOCAL_CCI_STATUS is LOCAL_CCI_IDLE and REMOTE_CCI_STATUS is
REMOTE_CCI_IDLE, i.e., when the host is not currently processing any
CCI. The only exceptions to that rule are SYN and SYN-ACK segments.
Whenever sending any CCI option, C MUST be set to the current
LOCAL_CCI, EC MUST be set to the current REMOTE_CCI, CS MUST be set
to LOCAL_CCI_STATUS and ECS MUST be set to REMOTE_CCI_STATUS,
respectively.
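The sending rule above reduces to a small predicate. The sketch below assumes it is only called for connections on which the CCI option is in use (for a SYN-ACK, this means the SYN carried the option); the function name and the `is_handshake_segment` parameter are hypothetical.

```python
LOCAL_CCI_IDLE = 0
REMOTE_CCI_IDLE = 0


def include_cci_option(local_cci_status: int, remote_cci_status: int,
                       is_handshake_segment: bool = False) -> bool:
    """Return True if an outgoing segment has to carry a CCI option."""
    if is_handshake_segment:
        # SYN and SYN-ACK segments are the exception: they carry the option
        # to negotiate it, even though both status variables are idle.
        return True
    # Otherwise the option is sent exactly while a CCI is being processed.
    return (local_cci_status != LOCAL_CCI_IDLE
            or remote_cci_status != REMOTE_CCI_IDLE)
```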
5.2. Generation and Processing of Connectivity-Change Indication TCP
Options
Processing of a connectivity-change indication can be separated into
two parts:
1. Processing in "initiator" mode, i.e., when a host receives a
local CCI and (reliably) forwards it to the other end through a
CCI option.
2. Processing in "responder" mode, i.e., when a host receives a
remote CCI in a CCI option from the other end.
Section 5.2.1 and Section 5.2.2 describe the state machines at an
initiator and a responder, respectively. Note that a single host can
be both initiator and responder at the same time. This can
happen if a local CCI occurs while processing for a remote CCI is
ongoing, or vice versa.
The following events, conditions and actions are used in the
definition of the two state machines:
Events:
E_LOCAL_CCI
Local end received a local CCI.
E_REMOTE_CCI
Local end received information about a remote CCI, i.e., received
a TCP segment that includes a CCI option.
E_SEGMENT_SENT
Local end sent a TCP segment that includes the CCI option.
Conditions:
C_NEW_REMOTE_CCI
A received CCI option signals a new remote CCI, i.e., C !=
REMOTE_CCI, CS == LOCAL_CCI_NEW and the TSval of the Timestamps
option of the received segment is greater than the current
REMOTE_CCI_PEER_TIME (TSval > REMOTE_CCI_PEER_TIME).
C_ECHOED_LOCAL_CCI
A received CCI option echoes the last local CCI, i.e., EC ==
LOCAL_CCI, ECS == REMOTE_CCI_ECHO and the TSval of the Timestamps
option of the received segment is greater than the current
LOCAL_CCI_PEER_ECHO_TIME (TSval > LOCAL_CCI_PEER_ECHO_TIME).
C_ECHOED_REMOTE_CCI
A received CCI option acknowledges that the peer has received the
echo of its last local CCI, i.e., C == REMOTE_CCI, CS ==
LOCAL_CCI_ECHO_ACK and the TSval of the Timestamps option of the
received segment is greater than the current REMOTE_CCI_PEER_TIME
(TSval > REMOTE_CCI_PEER_TIME).
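The three conditions can be written as predicates over the fields of a received CCI option and the local state. This is a sketch only. The wraparound-aware comparison `ts_after` is an assumption added here: the conditions above state a plain "greater than", and how 32-bit timestamp wraparound is handled is left open. All function names are hypothetical.

```python
LOCAL_CCI_NEW = 1
LOCAL_CCI_ECHO_ACK = 2
REMOTE_CCI_ECHO = 1


def ts_after(a: int, b: int) -> bool:
    """True if 32-bit timestamp a is later than b, allowing for wraparound."""
    return a != b and ((a - b) & 0xFFFFFFFF) < 0x80000000


def c_new_remote_cci(c: int, cs: int, tsval: int,
                     remote_cci: int, remote_cci_peer_time: int) -> bool:
    return (c != remote_cci and cs == LOCAL_CCI_NEW
            and ts_after(tsval, remote_cci_peer_time))


def c_echoed_local_cci(ec: int, ecs: int, tsval: int,
                       local_cci: int, local_cci_peer_echo_time: int) -> bool:
    return (ec == local_cci and ecs == REMOTE_CCI_ECHO
            and ts_after(tsval, local_cci_peer_echo_time))


def c_echoed_remote_cci(c: int, cs: int, tsval: int,
                        remote_cci: int, remote_cci_peer_time: int) -> bool:
    return (c == remote_cci and cs == LOCAL_CCI_ECHO_ACK
            and ts_after(tsval, remote_cci_peer_time))
```

The timestamp test is what rejects a delayed copy of an old option: a segment whose TSval is not later than the stored value fails every condition, regardless of its C, EC, CS and ECS fields.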
Actions:
A_TGL_LOCAL_CCI
Toggle LOCAL_CCI.
A_TGL_REMOTE_CCI
Toggle REMOTE_CCI.
A_REPROBE_PATH
TCP discards all congestion control information gathered on the
current path, initializes them to the defaults and re-probes path
characteristics based only on the segments transmitted after this
event, as described in Section 5.3. In other words,
CCI_CONTROLLED_CWND := true, LAST_CCI_TIME := current local time,
CCI_SNDMAX := highest sequence number transmitted so far and the
congestion control state (CWND and SS_THRESH), round-trip time
measurement (RTTM) state and RTO timer are reset to the initial
values for a new connection. Additionally, if the connection is
stalled in exponential back-off, TCP MUST act as if RTO had
expired and start the speculative retransmission procedure
described in Section 5.4.
A_FORCE_SEND
Force transmission of a segment that MUST include a CCI option, in
order to inform the other peer about the local CCI. If the
connection is stalled in exponential back-off, this is taken care
of by the speculative retransmission procedure described in
Section 5.4. If the connection is in steady-state and there is
new data to be sent, TCP MUST immediately send a single segment of
new data including a CCI option. If there is no new data to be
sent, TCP MUST immediately send a pure ACK including a CCI option.
A_UPD_CCI_PEER_TIME
Set REMOTE_CCI_PEER_TIME to the TSval value of the TCP Timestamps
option of the received segment.
A_UPD_CCI_PEER_E_TIME
Set LOCAL_CCI_PEER_ECHO_TIME to the TSval value of the TCP
Timestamps option of the received segment.
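The state changes performed by A_REPROBE_PATH can be sketched as follows. This is an illustration under assumptions: the `ConnState` container and its field names are hypothetical, the initial values are supplied by the caller instead of being fixed here, and clearing the RTT estimators to "no sample yet" is one possible reading of resetting the RTTM state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnState:
    """Hypothetical subset of a TCP control block."""
    cwnd: int
    ssthresh: int
    srtt: Optional[float]          # None means "no RTT sample yet"
    rttvar: Optional[float]
    rto: float
    snd_max: int                   # highest sequence number transmitted so far
    stalled_in_backoff: bool       # an RTO-retransmitted segment is still unACKed
    last_cci_time: int = 0
    cci_sndmax: int = 0
    cci_controlled_cwnd: bool = False


def a_reprobe_path(conn: ConnState, now: int, init_window: int,
                   init_ssthresh: int, init_rto: float) -> bool:
    """Apply A_REPROBE_PATH; return True if a speculative retransmission is due."""
    conn.cci_controlled_cwnd = True
    conn.last_cci_time = now
    conn.cci_sndmax = conn.snd_max
    # Congestion control, RTTM and RTO state restart as for a new connection.
    conn.cwnd = init_window
    conn.ssthresh = init_ssthresh
    conn.srtt = None
    conn.rttvar = None
    conn.rto = init_rto
    # A stalled connection acts as if the RTO had just expired (Section 5.4).
    return conn.stalled_in_backoff
```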
5.2.1. Initiator Mode Processing
This section describes the initiator mode processing of a TCP host
implementing RLCI. In initiator mode, a host signals the occurrence
of a local CCI to its peer, until the peer echoes reception of that
CCI. After receiving the echo, the host needs to acknowledge the
echo reception, resulting in a 3-way handshake. Figure 2 shows the
corresponding state machine.
At the beginning of a connection, i.e., before the first local CCI
occurs, LOCAL_CCI is 0 and LOCAL_CCI_STATUS is LOCAL_CCI_IDLE. This
remains the case until TCP receives a local CCI (E_LOCAL_CCI).
When that happens, TCP toggles LOCAL_CCI (A_TGL_LOCAL_CCI), sets
LOCAL_CCI_STATUS := LOCAL_CCI_NEW, starts re-probing the new path
(A_REPROBE_PATH) and forces a segment to be sent to the peer
(A_FORCE_SEND).
Note that all subsequently transmitted segments MUST contain a CCI
option until LOCAL_CCI_STATUS becomes LOCAL_CCI_IDLE. After the host
receives the echo of the local CCI (C_ECHOED_LOCAL_CCI), it updates
LOCAL_CCI_PEER_ECHO_TIME (A_UPD_CCI_PEER_E_TIME) and sets
LOCAL_CCI_STATUS := LOCAL_CCI_ECHO_ACK. The initiator remains in
this state until it can send a segment with the CCI option
(E_SEGMENT_SENT) that acknowledges reception of the CCI echo. At
that time, it sets LOCAL_CCI_STATUS := LOCAL_CCI_IDLE.
The transition from LOCAL_CCI_IDLE to LOCAL_CCI_ECHO_ACK occurs if a
segment acknowledging the reception of a CCI echo is lost, and the
initiator retransmits the echo acknowledgment.
When a local CCI occurs (E_LOCAL_CCI) while LOCAL_CCI_STATUS !=
LOCAL_CCI_IDLE, the host MUST ignore it and MUST NOT alter LOCAL_CCI,
because it is already processing another local CCI.
E_LOCAL_CCI =>
A_TGL_LOCAL_CCI E_REMOTE_CCI
A_REPROBE_PATH C_ECHOED_LOCAL_CCI=>
A_FORCE_SEND A_UPD_CCI_PEER_E_TIME
+----------------+ +----------------+
| | | |
| | | |
| | | |
| V | V
+----------------+ +----------------+ +----------------+
| | | | | |
|LOCAL_CCI_STATUS| |LOCAL_CCI_STATUS| |LOCAL_CCI_STATUS|
| == | | == | | == |
|LOCAL_CCI_IDLE | |LOCAL_CCI_NEW | |LOCAL_CCI_ECHO_ |
| | | | |ACK |
+----------------+ +----------------+ +----------------+
^ | ^ |
| | | |
| +-----------------------------------+ |
| E_REMOTE_CCI |
| C_ECHOED_LOCAL_CCI |
| |
| |
+-----------------------------------------+
E_SEGMENT_SENT
Figure 2: State machine for initiator processing.
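The initiator state machine of Figure 2 can be sketched in a few lines. This is a simplified illustration, not a reference implementation: the class and method names are hypothetical, A_REPROBE_PATH and A_FORCE_SEND are only recorded in a list instead of being executed, and timestamps are compared as plain integers without wraparound handling.

```python
LOCAL_CCI_IDLE, LOCAL_CCI_NEW, LOCAL_CCI_ECHO_ACK = 0, 1, 2
REMOTE_CCI_ECHO = 1


class Initiator:
    def __init__(self) -> None:
        self.local_cci = 0
        self.status = LOCAL_CCI_IDLE
        self.local_cci_peer_echo_time = 0
        self.actions = []  # names of triggered actions, for illustration only

    def on_local_cci(self) -> None:
        """E_LOCAL_CCI: ignored unless idle."""
        if self.status != LOCAL_CCI_IDLE:
            return
        self.local_cci ^= 1                       # A_TGL_LOCAL_CCI
        self.status = LOCAL_CCI_NEW
        self.actions += ["A_REPROBE_PATH", "A_FORCE_SEND"]

    def on_remote_cci(self, ec: int, ecs: int, tsval: int) -> None:
        """E_REMOTE_CCI: act only if C_ECHOED_LOCAL_CCI holds."""
        echoed = (ec == self.local_cci and ecs == REMOTE_CCI_ECHO
                  and tsval > self.local_cci_peer_echo_time)
        if not echoed:
            return
        if self.status == LOCAL_CCI_NEW:
            self.local_cci_peer_echo_time = tsval  # A_UPD_CCI_PEER_E_TIME
            self.status = LOCAL_CCI_ECHO_ACK
        elif self.status == LOCAL_CCI_IDLE:
            # The echo acknowledgment was lost; acknowledge the echo again.
            self.status = LOCAL_CCI_ECHO_ACK

    def on_segment_sent(self) -> None:
        """E_SEGMENT_SENT: a segment carrying the CCI option went out."""
        if self.status == LOCAL_CCI_ECHO_ACK:
            self.status = LOCAL_CCI_IDLE
```

Following Figure 2, the sketch updates LOCAL_CCI_PEER_ECHO_TIME only on the transition out of LOCAL_CCI_NEW, not on the transition from LOCAL_CCI_IDLE.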
5.2.2. Responder Mode Processing
This section describes the responder mode processing of CCIs for a
TCP host implementing the CCI option. In responder mode, a host
echoes the last received remote CCI to its peer, until it can be sure
that the peer correctly received the echo. Figure 3 shows the
corresponding state machine.
At the beginning of a connection, REMOTE_CCI is 0 and
REMOTE_CCI_STATUS is REMOTE_CCI_IDLE, i.e., the local host is not
processing any remote CCIs.
When TCP receives a segment with a CCI option (E_REMOTE_CCI)
signaling a new remote CCI (C_NEW_REMOTE_CCI), it toggles
REMOTE_CCI (A_TGL_REMOTE_CCI), changes REMOTE_CCI_STATUS to
REMOTE_CCI_ECHO, updates REMOTE_CCI_PEER_TIME according to TSval
(A_UPD_CCI_PEER_TIME), starts re-probing the new path
(A_REPROBE_PATH) and forces a segment to be sent to the peer
(A_FORCE_SEND).
Note that all subsequently transmitted segments MUST contain a CCI
option until REMOTE_CCI_STATUS is again REMOTE_CCI_IDLE. This
transition occurs when the peer acknowledges the reception of the CCI
echo (C_ECHOED_REMOTE_CCI).
E_REMOTE_CCI E_REMOTE_CCI
C_NEW_REMOTE_CCI => C_NEW_REMOTE_CCI =>
A_TGL_REMOTE_CCI A_TGL_REMOTE_CCI
A_UPD_CCI_PEER_TIME A_UPD_CCI_PEER_TIME
A_REPROBE_PATH A_REPROBE_PATH
A_FORCE_SEND A_FORCE_SEND
+-----------------+ +-------------+
| | | |
| V | |
+-----------------+ +-----------------+ |
|REMOTE_CCI_STATUS| |REMOTE_CCI_STATUS| |
| == | | == | |
|REMOTE_CCI_IDLE | |REMOTE_CCI_ECHO | |
+-----------------+ +-----------------+ |
^ | ^ |
| | | |
+-----------------+ +-------------+
E_REMOTE_CCI
C_ECHOED_REMOTE_CCI
Figure 3: State machine for responder processing.
If TCP receives a new remote CCI while REMOTE_CCI_STATUS ==
REMOTE_CCI_ECHO, this indicates that the acknowledgment of a previous
CCI echo may have been lost and that a new CCI has occurred at the peer.
In this case, TCP MUST perform the same actions as if
REMOTE_CCI_STATUS == REMOTE_CCI_IDLE.
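The responder state machine of Figure 3 can be sketched in the same style. As before, this is a simplified illustration: the class and method names are hypothetical, the triggered actions are only recorded, and timestamps are compared as plain integers without wraparound handling.

```python
LOCAL_CCI_NEW, LOCAL_CCI_ECHO_ACK = 1, 2
REMOTE_CCI_IDLE, REMOTE_CCI_ECHO = 0, 1


class Responder:
    def __init__(self) -> None:
        self.remote_cci = 0
        self.status = REMOTE_CCI_IDLE
        self.remote_cci_peer_time = 0
        self.actions = []  # names of triggered actions, for illustration only

    def on_remote_cci(self, c: int, cs: int, tsval: int) -> None:
        """E_REMOTE_CCI: a segment with a CCI option (C, CS) arrived."""
        newer = tsval > self.remote_cci_peer_time
        if c != self.remote_cci and cs == LOCAL_CCI_NEW and newer:
            # C_NEW_REMOTE_CCI: same handling in the idle and echo states.
            self.remote_cci ^= 1                  # A_TGL_REMOTE_CCI
            self.remote_cci_peer_time = tsval     # A_UPD_CCI_PEER_TIME
            self.status = REMOTE_CCI_ECHO
            self.actions += ["A_REPROBE_PATH", "A_FORCE_SEND"]
        elif (self.status == REMOTE_CCI_ECHO and c == self.remote_cci
              and cs == LOCAL_CCI_ECHO_ACK and newer):
            # C_ECHOED_REMOTE_CCI: the peer confirmed reception of the echo.
            self.status = REMOTE_CCI_IDLE
```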
5.3. Re-Probing Path Characteristics
When a TCP connection receives a new CCI, it MUST re-probe path
characteristics in order to prevent causing congestion by
transmitting based on stale path state information. In principle,
this is similar to the initial slow-start: The sender MUST NOT
transmit more than the default initial window (INIT_WINDOW) of data
after a new CCI is received and it MUST reset the congestion control
state (CWND and SS_THRESH), round-trip time measurement (RTTM) state
and RTO timer, as if this were a new connection [RFC2581][RFC2988].
If Path MTU Discovery (PMTUD) is in use, the PMTUD state MUST also be
reset [RFC1191][RFC1981][RFC4821].
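The reset described above can be sketched as follows. This is a
minimal illustration under stated assumptions: the PathState fields,
the SMSS value and the choice of two segments for the initial window
are hypothetical, the 3-second initial RTO follows [RFC2988], and the
"arbitrarily high" initial SS_THRESH follows [RFC2581].

```python
from dataclasses import dataclass
from typing import Optional

SMSS = 1460                  # assumed sender maximum segment size (bytes)
INIT_WINDOW = 2 * SMSS       # assumed initial window of two segments
INITIAL_SSTHRESH = 1 << 30   # "arbitrarily high" initial SS_THRESH
INITIAL_RTO = 3.0            # seconds, initial RTO before any RTT sample


@dataclass
class PathState:
    """Per-connection path state (hypothetical field names)."""
    cwnd: int
    ssthresh: int
    srtt: Optional[float]    # smoothed RTT, None until first measurement
    rttvar: Optional[float]  # RTT variance, None until first measurement
    rto: float
    pmtu: Optional[int]      # cached path MTU, None when unknown


def reprobe_path(state: PathState) -> None:
    """Reset path state as if this were a new connection."""
    state.cwnd = INIT_WINDOW
    state.ssthresh = INITIAL_SSTHRESH
    state.srtt = None
    state.rttvar = None
    state.rto = INITIAL_RTO
    state.pmtu = None        # forget the old path MTU so PMTUD starts over
```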
One difference from an initial slow-start is that, after a CCI, the
connection may have segments in flight towards the destination along
a previous path. Therefore, after a CCI, TCP MUST ignore any ACKs
received for data that was sent before the CCI and it MUST update the
congestion window solely based on ACKs for data that was sent after
the CCI occurred.
The mechanism used for distinguishing ACKs for data sent after a CCI
occurred from ACKs for data sent before a CCI occurred uses TCP
Timestamps options. When a host receives a new CCI (either local or
remote), LAST_CCI_TIME MUST be set to the current local time,
CCI_SNDMAX MUST be set to the highest sequence number transmitted so
far and CCI_CONTROLLED_CWND MUST be set to true.
While CCI_CONTROLLED_CWND == true, TCP MUST update the congestion
window based only on inbound ACKs that contain a TS Echo Reply
(TSecr) value greater than or equal to LAST_CCI_TIME. Any inbound
ACK with a TS Echo Reply (TSecr) value less than LAST_CCI_TIME MUST
NOT cause an update to the congestion window, even if it advances the
window. If CCI_CONTROLLED_CWND is true and the host receives an ACK
that acknowledges a sequence number greater than or equal to
CCI_SNDMAX, CCI_CONTROLLED_CWND MUST be set to false and the
congestion control algorithm MUST begin to process all ACKs normally,
without checking their Timestamps options.
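The ACK filtering rule above can be sketched as follows. This is an
illustration only: the class and method names are hypothetical, and
timestamps and sequence numbers are compared as plain integers, whereas
a real implementation would need wrap-around-safe comparisons. The
sketch also assumes that the ACK which ends the controlled phase is
itself still subject to the timestamp check; the text above does not
state this explicitly.

```python
from dataclasses import dataclass


@dataclass
class CciCwndFilter:
    """Decides whether an inbound ACK may update the congestion window."""
    last_cci_time: int = 0             # LAST_CCI_TIME (timestamp clock)
    cci_sndmax: int = 0                # CCI_SNDMAX
    cci_controlled_cwnd: bool = False  # CCI_CONTROLLED_CWND

    def on_new_cci(self, now: int, snd_max: int) -> None:
        """Record state when a new local or remote CCI is received."""
        self.last_cci_time = now
        self.cci_sndmax = snd_max
        self.cci_controlled_cwnd = True

    def ack_may_update_cwnd(self, tsecr: int, ack_seq: int) -> bool:
        """Return True if this ACK is allowed to update CWND."""
        if not self.cci_controlled_cwnd:
            return True                # normal processing, no TSecr check
        allowed = tsecr >= self.last_cci_time
        if ack_seq >= self.cci_sndmax:
            # All data sent before the CCI is now acknowledged; later
            # ACKs are processed normally.
            self.cci_controlled_cwnd = False
        return allowed
```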
5.4. Speculative Retransmission
The basic idea behind the speculative retransmission is to allow TCP
to resume stalled connections as soon as it receives an indication
that connectivity to previously unreachable peers may have returned.
When a TCP connection receives a new CCI - either from the local
stack or in a CCI TCP option from the peer - and is currently stalled
in exponential back-off, it MUST immediately initiate the standard
retransmission procedure, just as if the RTO for the connection had
expired.
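A minimal sketch of this trigger is shown below. The Connection class,
its counters and the use of a back-off counter to detect a stalled
connection are assumptions made for illustration; the interaction with
the RTO reset of Section 5.3 is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Tracks only what is needed to show the trigger (hypothetical)."""
    backoff_count: int = 0   # consecutive RTO expirations without an ACK
    retransmits: int = 0     # retransmission attempts made so far

    def in_exponential_backoff(self) -> bool:
        return self.backoff_count > 0

    def on_rto_expired(self) -> None:
        """Standard retransmission procedure (greatly simplified)."""
        self.backoff_count += 1
        self.retransmits += 1   # retransmit first unacknowledged segment

    def on_cci(self) -> None:
        """New CCI, either local or received in a CCI option."""
        if self.in_exponential_backoff():
            # Speculative retransmission: act as if the RTO had expired.
            self.on_rto_expired()
```

A connection that is not stalled is unaffected by this particular
rule; it still re-probes the path as described in Section 5.3.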
6. Discussion
This section discusses some design choices of the RLCI mechanisms
that can affect TCP performance under certain circumstances.
6.1. Triggered Segment Transmission during Steady-State
A TCP stack that implements RLCI mechanisms and receives a local CCI
immediately sends a TCP segment (A_FORCE_SEND) in order to inform the
other end of the CCI and resets all path information
(A_REPROBE_PATH). When TCP is stalled in exponential back-off, this
is taken care of by the speculative retransmission procedure that is
triggered by the CCI.
On the other hand, when TCP is in steady-state, it sends a new
segment (A_FORCE_SEND) if there is any new data queued for
transmission. As usual, the number of unacknowledged segments is
limited by CWND. However, CWND has just been reset to its initial
value. This means that there is a possibility that the transmission
sends a segment that is outside the current congestion window.
Although this behavior may appear to be aggressive, it is in fact as
conservative as a newly starting connection, because only a single
unacknowledged segment is sent along the path after CCI.
6.2. Impact of Packet Loss
If a connection is in exponential back-off when a CCI occurs, TCP
considers all unacknowledged segments to be lost and the speculative
retransmission procedure immediately starts.
On the other hand, if the connection is in steady-state when a CCI
occurs, TCP considers all unacknowledged segments to still be in
flight and continues sending new data. Depending on what caused a
CCI, four scenarios are possible that differ in what happens to
segments and ACKs in flight:
1. All (or at least the vast majority of) segments and ACKs in
flight reach their respective destinations, i.e., there are no
losses. In this case, TCP acts as if a new connection had
started and re-probes the new path.
2. Some of the ACKs in flight from the receiver to the sender are
lost. In this case, TCP behaves exactly as above, because a
cumulative ACK for the new segment sent along the path after the
CCI acknowledges all the previous unacknowledged segments.
3. Some of the data segments in flight from the sender to the
receiver are lost. In this case, the new data segment
transmitted after the CCI causes a duplicate ACK. As this
duplicate ACK does not cause TCP to send another data segment,
the connection stalls and an RTO occurs. After the RTO, the standard
retransmission procedure takes place with SS_THRESH equal to
INITIAL_WINDOW/2 (i.e., the minimum allowed). This disables slow
start and severely degrades performance. A possible
solution is to execute the speculative retransmission procedure
after receiving a CCI even if the connection is in steady-state.
4. Some of the data segments and some of the ACKs that are in flight
are lost. This case is similar to the previous one.
In all these cases, it is also possible that the round-trip time
changes significantly after the CCI, reordering data segments and
ACKs that are still in flight with ones sent after the CCI. These
reorderings appear to TCP as losses, and may result in the connection
experiencing one of the above cases even if there was no actual
packet loss.
6.3. Use of Limited Transmit with RLCI
As described in the previous section, when a connection is in steady-
state, a connectivity-change indication (CCI) resets all path
information of TCP and causes one new data segment to be sent. In
case of significant data segment loss before a CCI, the new data
segment transmitted after a CCI causes a duplicate ACK. As this
duplicate ACK does not trigger TCP to send another data segment, the
connection stalls and an RTO occurs.
Limited Transmit [RFC3042] can be used in case of packet loss in
order to cause the transmission of three duplicate ACKs and trigger
the fast retransmission procedure. Because Limited Transmit must not
raise the amount of outstanding data above the congestion window plus
two segments, it cannot always be used after a CCI, when CWND has just
been reset. If the connection has more outstanding data than
INITIAL_WINDOW plus two segments before a CCI, resetting CWND to its
initial value leaves the amount of outstanding data greater than the
new CWND plus two segments, which disables Limited Transmit.
A modified Limited Transmit algorithm can be used in combination with
RLCI:
If CCI_CONTROLLED_CWND is true:
The Limited Transmit Algorithm as described in [RFC3042] should be
followed, but without checking the amount of outstanding data,
i.e., if a TCP sender has previously unsent data queued for
transmission it should transmit new data upon the arrival of the
first two consecutive duplicate ACKs when the receiver's
advertised window allows this transmission.
If CCI_CONTROLLED_CWND is false:
The Limited Transmit Algorithm as described in [RFC3042] should be
followed unmodified.
When the fast retransmission procedure is triggered by the modified
Limited Transmit after a CCI, SS_THRESH is set to INITIAL_WINDOW/2
(i.e., the minimum allowed) as CWND before fast retransmission was
equal to INITIAL_WINDOW. As a result, slow-start is disabled, which
decreases TCP performance.
A minor modification can keep SS_THRESH unmodified in the previous
case, i.e., if CCI_CONTROLLED_CWND == true and CWND ==
INITIAL_WINDOW, keep SS_THRESH unmodified (having its initial value)
upon the reception of the third duplicate ACK that triggers the fast
retransmission procedure.
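The two modifications discussed in this section can be sketched as
follows. The function names and parameters are hypothetical. The
unmodified branch uses the outstanding-data condition of [RFC3042],
and the default SS_THRESH update uses max(FlightSize/2, 2*SMSS) from
[RFC2581].

```python
def limited_transmit_may_send(dupacks: int, has_new_data: bool,
                              rwnd_allows: bool, flight_size: int,
                              cwnd: int, smss: int,
                              cci_controlled_cwnd: bool) -> bool:
    """Return True if one new segment may be sent on this duplicate ACK."""
    if dupacks not in (1, 2) or not has_new_data or not rwnd_allows:
        return False
    if cci_controlled_cwnd:
        # Modified algorithm: skip the outstanding-data check.
        return True
    # Unmodified check: outstanding data after sending must not exceed
    # the congestion window plus two segments.
    return flight_size + smss <= cwnd + 2 * smss


def ssthresh_on_fast_retransmit(ssthresh: int, flight_size: int,
                                cwnd: int, smss: int, initial_window: int,
                                cci_controlled_cwnd: bool) -> int:
    """Return the SS_THRESH to use when the third duplicate ACK arrives."""
    if cci_controlled_cwnd and cwnd == initial_window:
        return ssthresh          # keep the initial value after a CCI
    return max(flight_size // 2, 2 * smss)
```

For example, with 20 segments outstanding and CWND freshly reset to two
segments, the unmodified check refuses to send, while the modified
check permits one new segment per duplicate ACK for the first two
duplicate ACKs.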
6.4. Simultaneous Processing of Connectivity-Change Indications
As mentioned in Section 5.2.1, if a local CCI occurs (E_LOCAL_CCI)
while LOCAL_CCI_STATUS != LOCAL_CCI_IDLE, the host MUST ignore it,
because it is already processing another local CCI. As a result,
only one local CCI at each end can be processed at the same time.
Consequently, as every remote CCI at one end is triggered by a local
CCI at the other end, only one remote CCI at each end can be
processed at the same time.
On the other hand, if both hosts receive connectivity-change
indications from their local stacks (local CCIs) at almost the same
time, there is a possibility of simultaneous processing of local and
remote CCIs at both ends. In that case, path re-probing is triggered
twice at each end within a very short interval, possibly shorter than
one RTT. As this does not improve TCP performance, it can be avoided
by triggering the A_REPROBE_PATH action only if CCI_CONTROLLED_CWND ==
false.
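The guard suggested above can be sketched as follows. The class and
method names are hypothetical, and the sketch counts re-probes instead
of performing them.

```python
class ReprobeGuard:
    """Suppresses a second re-probe while the first is still in effect."""

    def __init__(self) -> None:
        self.cci_controlled_cwnd = False   # CCI_CONTROLLED_CWND
        self.reprobe_count = 0             # times A_REPROBE_PATH ran

    def on_cci(self) -> None:
        """Called for every processed CCI, local or remote."""
        if not self.cci_controlled_cwnd:
            self.reprobe_count += 1        # A_REPROBE_PATH
            self.cci_controlled_cwnd = True

    def on_ack_covering_cci_sndmax(self) -> None:
        """Called when an ACK at or beyond CCI_SNDMAX arrives."""
        self.cci_controlled_cwnd = False
```

One consequence of this design is that any CCI arriving before
CCI_CONTROLLED_CWND returns to false does not trigger a re-probe, not
only the near-simultaneous one.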
7. Security Considerations
The only foreseen security considerations with the techniques
presented in this document result from either an attacker's ability
to spoof valid TCP segments with CCI options that seemingly indicate
connectivity changes, or an attacker's ability to generate bogus CCIs
locally. An attacker might produce a stream of such false indicators
that could keep a connection in slow-start at the initial window.
One possible defense against this type of attack is to rate-limit the
response to CCIs (whether local or remote). This is also probably
less serious than other attacks such an empowered adversary could
perform, like resetting the connection or injecting data. A similar
effect could be achieved without the new CCI option by forging
duplicate ACKs that would keep a sender in loss recovery. If both
sets of IP addresses, port numbers, and sequence numbers are
guessable for a connection, then the connection should employ other
measures [RFC4953] for protection against spoofed segments.
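The rate-limiting defense mentioned above could take many forms. One
minimal sketch, assuming a simple minimum interval between honored
CCIs, is shown below. The class name and the min_interval parameter are
hypothetical; this document does not specify a particular limiter or
interval.

```python
from typing import Optional


class CciRateLimiter:
    """Honors at most one CCI per min_interval seconds."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last_honored: Optional[float] = None

    def allow(self, now: float) -> bool:
        """Return True if a CCI arriving at time 'now' should be honored."""
        if (self._last_honored is not None
                and now - self._last_honored < self.min_interval):
            return False                   # too soon, ignore this CCI
        self._last_honored = now
        return True
```

A caller would pass a monotonic clock reading as 'now' and skip the
response to any CCI, local or remote, for which allow() returns False.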
8. IANA Considerations
This section is to be interpreted according to
[I-D.narten-iana-considerations-rfc2434bis].
This document does not define any new namespaces. It requests that
IANA allocate a new 8-bit TCP option number for the CCI option from
the registry maintained at
http://www.iana.org/assignments/tcp-parameters.
9. Acknowledgments
This draft combines and obsoletes [I-D.swami-tcp-lmdr] and
[I-D.eggert-tcpm-tcp-retransmit-now]. The authors would like to
thank Mark Allman, Marcus Brunner, Alfred Hoenes, Shashikant
Maheshwari, Kacheong Poon, Juergen Quittek, Stefan Schmid and Joe
Touch for their comments and suggestions on this draft as well as the
two original drafts.
Simon Schuetz and Lars Eggert are partly funded by the Trilogy
project, a research project supported by the European Commission
under its Seventh Framework Program.
Wesley Eddy's work on this document was performed at NASA's Glenn
Research Center, while in support of the NASA Space Communications
Architecture Working Group (SCAWG), and the FAA/Eurocontrol Future
Communications Study (FCS).
10. References
10.1. Normative References
[I-D.narten-iana-considerations-rfc2434bis]
Narten, T. and H. Alvestrand, "Guidelines for Writing an
IANA Considerations Section in RFCs",
draft-narten-iana-considerations-rfc2434bis-08 (work in
progress), October 2007.
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, September 1981.
[RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
November 1990.
[RFC1323] Jacobson, V., Braden, B., and D. Borman, "TCP Extensions
for High Performance", RFC 1323, May 1992.
[RFC1981] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
for IP version 6", RFC 1981, August 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
Control", RFC 2581, April 1999.
[RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission
Timer", RFC 2988, November 2000.
[RFC3042] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
TCP's Loss Recovery Using Limited Transmit", RFC 3042,
January 2001.
[RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU
Discovery", RFC 4821, March 2007.
10.2. Informative References
[DUKE] Duke, M., Henderson, T., and J. Meegan, "Experience with
``Link-UP Notification'' Over a Mobile Satellite Link",
ACM Computer Communication Review, Vol. 34, No. 3,
July 2004.
[EDDY] Eddy, W. and Y. Swami, "Adapting End-host Congestion
Control for Mobility", NASA Glenn Research Center
Technical Report, CR-2005-213838, July 2005.
[I-D.dawkins-trigtran-linkup]
Dawkins, S., "End-to-end, Implicit 'Link-Up'
Notification", draft-dawkins-trigtran-linkup-01 (work in
progress), October 2003.
[I-D.eggert-tcpm-tcp-retransmit-now]
Eggert, L., "TCP Extensions for Immediate
Retransmissions", draft-eggert-tcpm-tcp-retransmit-now-02
(work in progress), June 2005.
[I-D.ietf-hip-mm]
Henderson, T., "End-Host Mobility and Multihoming with the
Host Identity Protocol", draft-ietf-hip-mm-05 (work in
progress), March 2007.
[I-D.ietf-tcpimpl-restart]
Hughes, A., Touch, J., and J. Heidemann, "Issues in TCP
Slow-Start Restart After Idle",
draft-ietf-tcpimpl-restart-00 (work in progress),
March 1998.
[I-D.ietf-tcpm-tcp-uto]
Eggert, L. and F. Gont, "TCP User Timeout Option",
draft-ietf-tcpm-tcp-uto-08 (work in progress),
November 2007.
[I-D.swami-tcp-lmdr]
Swami, Y., "Lightweight Mobility Detection and Response
(LMDR) Algorithm for TCP", draft-swami-tcp-lmdr-07 (work
in progress), March 2006.
[KOODLI] Koodli, R. and C. Perkins, "Fast Handovers and Context
Transfers in Mobile Networks", ACM Computer Communication
Review, Vol. 31, No. 5, October 2001.
[OTT] Ott, J. and D. Kutscher, "OTT Internet: IEEE 802.11b for
Automobile Users", Proc. Infocom 2004, March 2004.
[RFC1122] Braden, R., "Requirements for Internet Hosts -
Communication Layers", STD 3, RFC 1122, October 1989.
[RFC2131] Droms, R., "Dynamic Host Configuration Protocol",
RFC 2131, March 1997.
[RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6
(IPv6) Specification", RFC 2460, December 1998.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP",
RFC 3168, September 2001.
[RFC3344] Perkins, C., "IP Mobility Support for IPv4", RFC 3344,
August 2002.
[RFC3775] Johnson, D., Perkins, C., and J. Arkko, "Mobility Support
in IPv6", RFC 3775, June 2004.
[RFC3819] Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
Wood, "Advice for Internet Subnetwork Designers", BCP 89,
RFC 3819, July 2004.
[RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
RFC 4306, December 2005.
[RFC4782] Floyd, S., Allman, M., Jain, A., and P. Sarolahti, "Quick-
Start for TCP and IP", RFC 4782, January 2007.
[RFC4907] Aboba, B., "Architectural Implications of Link
Indications", RFC 4907, June 2007.
[RFC4953] Touch, J., "Defending TCP Against Spoofing Attacks",
RFC 4953, July 2007.
[RFC4957] Krishnan, S., Montavont, N., Njedjou, E., Veerepalli, S.,
and A. Yegin, "Link-Layer Event Notifications for
Detecting Network Attachments", RFC 4957, August 2007.
[SCHUETZ] Schuetz, S., Eggert, L., Schmid, S., and M. Brunner,
"Protocol Enhancements for Intermittently Connected
Hosts", ACM Computer Communication Review, Vol. 35, No. 3,
July 2005.
[SCOTT] Scott, J. and G. Mapp, "Link layer-based TCP optimization
for disconnecting networks", ACM Computer Communication
Review, Vol. 33, No. 5, October 2003.
Editorial Comments
[] The authors have heard the idea of triggering
retransmits based on connectivity events of directly-
connected links being attributed to Phil Karn ("kick"
operation in the KAQ9 TCP stack). A thread from the
PILC mailing list in 2000 discusses some thoughts on
this (http://www.isi.edu/pilc/list/archive/0691.html).
[] Although this specification introduces eight new per-
connection state variables, a preliminary
implementation of an earlier revision of this mechanism
[I-D.swami-tcp-lmdr] only required around a hundred
lines of kernel code.
Appendix A. Background: Classification of Connectivity Disruptions
Connectivity disruptions can occur in many different situations.
They can be due to wireless interference, movement out of a wireless
coverage area, switching between access networks, or simply due to
unplugging an Ethernet cable. Depending on the situation in which
they occur, the implications of connectivity disruptions are
different and must be handled appropriately. This section attempts
to classify different types of connectivity disruptions and discusses
their implications and impact on TCP.
Two main properties of connectivity disruptions affect how TCP reacts
to them: their duration and whether the path characteristics have
significantly changed after they end. This document distinguishes
between "short" and "long" disruptions and "changed" and "unchanged"
path characteristics. Note that these two categories are orthogonal
to each other, i.e., four types of connectivity disruptions exist.
Connectivity disruptions are "short" for a given TCP connection, if
connectivity returns before the RTO fires for the first time, i.e.,
when TCP is still in steady-state. In this case, standard TCP
recovers lost data segments through Fast Retransmit and lost ACKs
through successfully delivered later ACKs. Appendix A.1 briefly
describes this case.
Connectivity disruptions are "long" for a given TCP connection, if
the RTO fires at least once before connectivity returns, i.e., when
TCP is in exponential back-off. In this case, TCP can be inefficient
in its retransmission scheme, as described in Appendix A.2.
Whether or not path characteristics change when connectivity returns
is a second important factor for TCP's retransmission scheme.
Standard TCP implicitly assumes that path characteristics remain
unchanged across short disruptions by performing Fast Retransmit
using the path parameters collected before the disruption. For long
disruptions, standard TCP is more conservative and performs slow-
start, re-probing the path characteristics from scratch. However,
this behavior can be inefficient because of the time at which it is
initiated.
These implicit assumptions can cause standard TCP to misbehave or
perform inefficiently in some scenarios. Figure 4 illustrates the
standard TCP behavior.
         +-----------------------+-----------------------+
Short    | Fast Retransmit using | Fast Retransmit using |
Duration | currently collected   | currently collected   |
< RTO    | path characteristics  | path characteristics  |
         +-----------------------+-----------------------+
Long     |                       |                       |
Duration | Slow-start            | Slow-start            |
>= RTO   |                       |                       |
         +-----------------------+-----------------------+
           Unchanged Path          Changed Path
           Characteristics         Characteristics
Figure 4: Standard TCP behavior.
A.1. Short Connectivity Disruptions
One common cause of short connectivity disruptions that result in a
change of the end-to-end path characteristics is transparent network
layer mobility, via protocols such as Mobile IP, NEMO, or HIP. These
protocols generally hide mobility events from the transport layer,
but cannot mask the resulting changes to the end-to-end path that
established TCP connections transmit over.
Consider a Mobile IP scenario as shown in Figure 5. At time T, a
mobile node MN attaches to access network Net-1, connected to the
Internet through access router AR-1 and has the care-of address
<Net-1, MN>. It establishes a TCP connection to the correspondent
node CN. While MN attaches to AR-1, packets between CN and <Net-1,
MN> follow PATH-1 (via Cloud-1 and AR-1). Assume that at some time
T+1, MN moves and then attaches to Net-2, which is reachable through
AR-2 with the care-of address <Net-2, MN>. While MN attaches to
AR-2, all packets between CN and <Net-2, MN> follow PATH-2 (through
Cloud-2 and AR-2).
          <---------PATH-1---------->
              /---------\   +------+
              |         |   |      |   Net-1
          +---+ Cloud-1 +---+ AR-1 +-----> MN (time=T)
          |   |         |   |      |
          |   \----+----/   +---+--+        |
          |        |                        |
CN <------+        |        PATH-3          |
          |        |                        |
          |   /----V----\   +-------+       V
          |   |         |   |       |
          +---+ Cloud-2 +---+ AR-2  +-----> MN (time=T+1)
              |         |   |       |   Net-2
              \---------/   +-------+
          <--------PATH-2----------->
Figure 5: Mobility example.
During a transient disconnected period, MN may have disconnected from
Net-1 and not yet attached to Net-2. Consequently, AR-1 may not be
able to deliver packets to MN. This could result in a burst of
packet losses. Several approaches for "fast" or "seamless" handovers
exist that involve adding machinery to the ARs to buffer and redirect
packets originally sent to Net-1 towards Net-2, rather than dropping
them (e.g., [KOODLI]).
As long as MN remains in Net-1, standard congestion control
algorithms [RFC2581] are sufficient. However, once MN moves from
Net-1 to Net-2, two different scenarios are possible depending on
network topology:
o In the first scenario, with standard Mobile IPv4, all packets
destined to <Net-1, MN> are dropped by AR-1 once MN has moved.
Since the latency involved in establishing a new tunnel to the HA
is on the order of the RTT (2*RTT in case of Mobile IPv6), roughly
an entire window's worth of data and ACKs will be dropped by AR-1.
Because of this burst loss, CN and MN are likely to incur
expensive retransmission timeouts.
o In the second scenario, with a fast handover mechanism in place,
losses are masked through buffering and tunneling between routers
AR-1 and AR-2. The exact sequence of buffering and forwarding
between the ARs is not guaranteed to occur in a manner consistent
with the available bandwidth of PATH-3 or conformant to TCP's
clocking expectations. This can cause TCP's behavior over PATH-2
to be based on the unrelated properties of PATH-1 and PATH-3.
After attaching to Net-2, reception of stale ACKs (for data sent on
PATH-1) will cause MN to incorrectly inflate its congestion window.
These stale ACKs do not provide any indication of the congestion
along PATH-2. CN's congestion window becomes similarly inflated by
ACKs that MN sends for data segments redirected over PATH-3. If the
congestion windows from PATH-1 are already too big for PATH-2, this
can overload Net-2 or PATH-2, causing packet loss and timeouts.
On the other hand, if the available bandwidth along PATH-2 is greater
than along PATH-1, and if the sender is in congestion avoidance, it
will need potentially many RTTs before utilizing the available path
capacity. This is due to relatively slow bandwidth increase during
congestion avoidance caused by a stale SS_THRESH. (See [EDDY] for
details.)
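To give a rough sense of scale for "potentially many RTTs", the
following sketch compares idealized window growth in congestion
avoidance and in slow-start. It assumes that the window grows by one
segment per RTT in congestion avoidance and doubles per RTT in
slow-start, with no losses and no delayed ACKs. The function names and
the example window sizes are illustrative only.

```python
def rtts_congestion_avoidance(start_segments: int, target_segments: int) -> int:
    """RTTs needed when the window grows by one segment per RTT."""
    return max(0, target_segments - start_segments)


def rtts_slow_start(start_segments: int, target_segments: int) -> int:
    """RTTs needed when the window doubles every RTT."""
    rtts, window = 0, start_segments
    while window < target_segments:
        window *= 2
        rtts += 1
    return rtts
```

Under these assumptions, growing from a window of 10 segments to one of
100 segments takes 90 RTTs in congestion avoidance but only 4 RTTs in
slow-start.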
A.2. Long Connectivity Disruptions
For long disruptions, standard TCP performs slow-start after
connectivity returns, because the retransmission timeout (RTO) has
expired. This conservative strategy avoids overloading the new path.
However, TCP's general exponential back-off retransmission strategy
can time these slow-starts such that performance decreases.
When a long connectivity disruption occurs along the path between a
host and its peer while the host is transmitting data, it stops
receiving ACKs. After the RTO expires, the host attempts to
retransmit the first unacknowledged segment. TCP implementations
that follow the recommended RTO management proposed in [RFC2988]
double the RTO after each retransmission attempt until it exceeds 60
seconds. This scheme causes a host to attempt to retransmit across
established connections roughly once a minute. (More frequently
during the first minute or two of the connectivity disruption, while
the RTO is still being backed off.)
When the long connectivity disruption ends, standard TCP
implementations still wait until the RTO expires before attempting
retransmission. Figure 6 illustrates this behavior. Depending on
when connectivity becomes available again, this can waste up to a
minute of connectivity for TCPs that implement the recommended RTO
management described in [RFC2988]. For TCP implementations that do
not implement [RFC2988], even longer connectivity periods may be
wasted. For example, Linux uses 120 seconds as the maximum RTO by
default.
Sequence
number              X = Successfully transmitted segment
  ^                 O = Lost segment
  |     :                     :              : X
  |     :                     :              :X
  |     OO O  O    O       O  :              X
  |    X:                     :              :
  |   X :                     :<------------>:
  |  X  :                     :    Wasted    :
  | X   :                     :  connection  :
  |X    :                     :     time     :
  +-----:---------------------:--------------:-------->
        :                     :              :    Time
  Connectivity          Connectivity        TCP
      gone                  back        retransmit
Figure 6: Standard TCP behavior in the presence of disrupted
connectivity.
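The timing shown in Figure 6 can be worked through numerically. The
sketch below assumes a simplified back-off model in which the RTO
doubles after every expiry and is capped at a maximum value. The
function names, the 1-second starting RTO used in the example and the
choice of time zero (when the retransmission timer was last started)
are illustrative assumptions.

```python
from typing import List


def retransmit_times(initial_rto: float, max_rto: float,
                     horizon: float) -> List[float]:
    """Times of retransmission attempts up to 'horizon' seconds."""
    times, t, rto = [], 0.0, initial_rto
    while True:
        t += rto                      # timer expires, attempt is made
        if t > horizon:
            return times
        times.append(t)
        rto = min(2 * rto, max_rto)   # exponential back-off with a cap


def wasted_time(initial_rto: float, max_rto: float,
                connectivity_back: float) -> float:
    """Seconds between connectivity returning and the next attempt."""
    t, rto = 0.0, initial_rto
    while True:
        t += rto
        if t >= connectivity_back:
            return t - connectivity_back
        rto = min(2 * rto, max_rto)
```

With a 1-second starting RTO and a 60-second cap, attempts occur at 1,
3, 7, 15, 31, 63 and 123 seconds. If connectivity returns at 64
seconds, 59 seconds of connectivity go unused; with a 120-second cap
and connectivity returning at 128 seconds, 119 seconds go unused.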
This retransmission behavior is not efficient, especially in
scenarios where connectivity periods are short and connectivity
disruptions are frequent [OTT]. Experiments show that TCP
performance across a path with frequent disruptions is significantly
worse, compared to a similar path without disruptions [SCHUETZ].
In the ideal case, TCP would attempt a retransmission as soon as
connectivity to its peer was re-established. Figure 7 illustrates
the ideal behavior.
Sequence
number              X = Successfully transmitted segment
  ^                 O = Lost segment
  |     :                     : X            :
  |     :                     :X             :
  |     OO O  O    O       O  X              :
  |    X:                     :              :
  |   X :                     :<------------>:
  |  X  :                     :  Efficiency  :
  | X   :                     : improvement  :
  |X    :                     :              :
  +-----:---------------------:--------------:-------->
        :                     :              :    Time
  Connectivity          Connectivity        Next
      gone            back := immediate  scheduled
                       TCP retransmit   retransmit
Figure 7: Ideal TCP behavior in the presence of disrupted
connectivity.
The ideal behavior is difficult to achieve for arbitrary connectivity
disruptions. One obviously problematic approach would use higher-
frequency retransmission attempts to enable earlier detection of
whether connectivity has returned. This can generate significant
amounts of extra traffic. Other proposals attempt to trigger faster
retransmissions by retransmitting buffered or newly-crafted segments
from inside the network
[SCOTT][I-D.dawkins-trigtran-linkup][DUKE][RFC3819].
Note that scenarios exist where path characteristics remain unchanged
after long connectivity disruptions. In this case, even an
intelligently scheduled slow-start is inefficient, because TCP could
safely resume transmitting at the old rate instead of slow-starting.
Although originally developed to avoid line-rate bursts, techniques
for the well-known "slow-start after idle" case
[I-D.ietf-tcpimpl-restart] may be useful to further improve
performance after a disruption ends in such a scenario. This
document does not currently describe this additional optimization,
and an open question remains on how unchanged path characteristics
after long connectivity disruptions could be validated by an end
host.
Appendix B. Document Revision History
+----------+--------------------------------------------------------+
| Revision | Comments                                               |
+----------+--------------------------------------------------------+
| 03       | Mainly editorial and textual changes according to      |
|          | feedback received since last version.                  |
| 02       | Major modification to the RLCI mechanism for           |
|          | implementing a 3-way handshake that ensures that both  |
|          | peers are informed about a connectivity-change         |
|          | indication. CCI option format, RLCI variables          |
|          | maintained by the TCP peers and the related state      |
|          | machines are affected by that modification.            |
| 01       | Major revision of the description of the               |
|          | connectivity-change indication TCP option and its      |
|          | processing in Section 5. Other formatting changes to   |
|          | the document include moving some background material   |
|          | to the appendix.                                       |
| 00       | Initial version. This document is a merge of and       |
|          | obsoletes [I-D.eggert-tcpm-tcp-retransmit-now] and     |
|          | [I-D.swami-tcp-lmdr].                                  |
+----------+--------------------------------------------------------+
Authors' Addresses
Simon Schuetz
NEC Laboratories Europe
Kurfuerstenanlage 36
Heidelberg 69115
Germany
Phone: +49 6221 4342 165
Email: simon.schuetz@nw.neclab.eu
URI: http://www.nw.neclab.eu
Nikolaos Koutsianas
Nokia Research Center
Email: nkout@mobile.ntua.gr
Lars Eggert
Nokia Research Center
P.O. Box 407
Nokia Group 00045
Finland
Phone: +358 50 48 24461
Email: lars.eggert@nokia.com
URI: http://research.nokia.com/people/lars_eggert/
Wesley M. Eddy
Verizon Federal Network Systems
NASA Glenn Research Center
21000 Brookpark Road, MS 54-5
Cleveland, OH 44135
USA
Email: weddy@grc.nasa.gov
Yogesh Prem Swami
Nokia Research Center, Dallas
955 Page Mill Road
Palo Alto, California 94304
USA
Phone: +1 972 374 0669
Email: yogesh.swami@nokia.com
Khiem Le
Nokia Siemens Networks
6000 Connection Drive
Irving, TX 75039
USA
Phone: +1 972 342 3502
Email: khiem.le@nsn.com
Full Copyright Statement
Copyright (C) The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Acknowledgment
Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA).