TCPM WG                                                         J. Touch
Internet Draft                                                   USC/ISI
Intended status: Informational                                  M. Welzl
Expires: July 2017                                              S. Islam
                                                      University of Oslo
                                                                  J. You
                                                                  Huawei
                                                        January 12, 2017
TCP Control Block Interdependence
draft-touch-tcpm-2140bis-02.txt
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Touch, et al. Expires July 12, 2017 [Page 1]
Internet-Draft TCP Control Block Interdependence January 2017
This Internet-Draft will expire on July 12, 2017.
Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document.
Abstract
This memo describes interdependent TCP control blocks, where part of
the TCP state is shared among similar concurrent or consecutive
connections. TCP state includes a combination of parameters, such as
connection state, current round-trip time estimates, congestion
control information, and process information. Most of this state is
maintained on a per-connection basis in the TCP Control Block (TCB),
but implementations can (and do) share certain TCB information
across connections to the same host. Such sharing is intended to
improve overall transient transport performance, while maintaining
backward-compatibility with existing implementations. The sharing
described herein is limited to only the TCB initialization and so
has no effect on the long-term behavior of TCP after a connection
has been established.
Table of Contents
1. Introduction
2. Conventions used in this document
3. Terminology
4. The TCP Control Block (TCB)
5. TCB Interdependence
6. An Example of Temporal Sharing
7. An Example of Ensemble Sharing
8. Compatibility Issues
9. Implications
10. Implementation Observations
11. Security Considerations
12. IANA Considerations
13. References
   13.1. Normative References
   13.2. Informative References
14. Acknowledgments
1. Introduction
TCP is a connection-oriented reliable transport protocol layered
over IP [RFC793]. Each TCP connection maintains state, usually in a
data structure called the TCP Control Block (TCB). The TCB contains
information about the connection state, its associated local
process, and feedback parameters about the connection's transmission
properties. As originally specified and usually implemented, most
TCB information is maintained on a per-connection basis. Some
implementations can (and now do) share certain TCB information
across connections to the same host. Such sharing is intended to
lead to better overall transient performance, especially for
numerous short-lived and simultaneous connections, as often used in
the World-Wide Web [Be94],[Br02].
This document discusses TCB state sharing that affects only the TCB
initialization, and so has no effect on the long-term behavior of
TCP after a connection has been established. Path information shared
across SYN destination port numbers assumes that TCP segments having
the same host-pair experience the same path properties, irrespective
of TCP port numbers. The observations about TCB sharing in this
document apply similarly to any protocol with congestion state,
including SCTP [RFC4960] and DCCP [RFC4340], as well as for
individual subflows in Multipath TCP [RFC6824].
2. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
In this document, these words will appear with that interpretation
only when in ALL CAPS. Lower case uses of these words are not to be
interpreted as carrying significance described in RFC 2119.
In this document, the characters ">>" preceding one or more indented
lines indicate a statement using the key words listed above. This
convention aids reviewers in quickly identifying or finding the
portions of this RFC covered by these keywords.
3. Terminology
Host - a source or sink of TCP segments associated with a single IP
address
Host-pair - a pair of hosts and their corresponding IP addresses
Path - an Internet path between the IP addresses of two hosts
4. The TCP Control Block (TCB)
A TCB describes the data associated with each connection, i.e., with
each association of a pair of applications across the network. The
TCB contains at least the following information [RFC793]:
Local process state
   pointers to send and receive buffers
   pointers to retransmission queue and current segment
   pointers to Internet Protocol (IP) PCB
Per-connection shared state
   macro-state
      connection state
      timers
      flags
      local and remote host numbers and ports
      TCP option state
   micro-state
      send and receive window state (size*, current number)
      cong. window size (snd_cwnd)*
      cong. window size threshold (ssthresh)*
      max window size seen*
      sendMSS#
      MMS_S#
      MMS_R#
      PMTU#
      round-trip time and variance#
The per-connection information is shown as split into macro-state
and micro-state, terminology borrowed from [Co91]. Macro-state
describes the finite state machine; we include the endpoint numbers
and components (timers, flags) used to help maintain that state.
Macro-state describes the protocol for establishing and maintaining
shared state about the connection. Micro-state describes the
protocol after a connection has been established, to maintain the
reliability and congestion control of the data transferred in the
connection.
We further distinguish two other classes of shared micro-state that
are associated more with host-pairs than with application pairs. One
class is clearly host-pair dependent (#, e.g., MSS, MMS, PMTU, RTT),
and the other is host-pair dependent in its aggregate (*, e.g.,
congestion window information, current window sizes, etc.).
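As a rough illustration of this classification, the following Python
sketch groups the shareable micro-state by the marker used in the list
above ("#" for clearly host-pair dependent, "*" for host-pair dependent
in its aggregate). The class and field names are hypothetical; they
only mirror the list and are not taken from any implementation.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class HostPairState:
    """Micro-state marked '#': clearly host-pair dependent."""
    send_mss: Optional[int] = None
    mms_s: Optional[int] = None
    mms_r: Optional[int] = None
    pmtu: Optional[int] = None
    rtt: Optional[float] = None
    rtt_var: Optional[float] = None


@dataclass
class AggregateState:
    """Micro-state marked '*': host-pair dependent in its aggregate."""
    window_size: Optional[int] = None
    snd_cwnd: Optional[int] = None
    ssthresh: Optional[int] = None
    max_window_seen: Optional[int] = None


def shareable_field_names():
    """Map each marker from the list to the (hypothetical) field names."""
    return {
        "#": [f.name for f in fields(HostPairState)],
        "*": [f.name for f in fields(AggregateState)],
    }
```

Keeping the two classes separate reflects that "#" values can be copied
between connections, while "*" values describe a total that has to be
divided among them.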
5. TCB Interdependence
There are two cases of TCB interdependence. Temporal sharing occurs
when the TCB of an earlier (now CLOSED) connection to a host is used
to initialize some parameters of a new connection to that same host,
i.e., in sequence. Ensemble sharing occurs when a currently active
connection to a host is used to initialize another (concurrent)
connection to that host.
6. An Example of Temporal Sharing
The TCB data cache is accessed in two ways: it is read to initialize
new TCBs and written when more current per-host state is available.
New TCBs are initialized using context from past connections as
follows:
TEMPORAL SHARING - TCB Initialization
Safe?    Cached TCB      New TCB
----------------------------------------------
yes      old_MMS_S       old_MMS_S or not cached
yes      old_MMS_R       old_MMS_R or not cached
yes      old_sendMSS     old_sendMSS
yes      old_PMTU        old_PMTU
TBD      old_RTT         old_RTT
TBD      old_RTTvar      old_RTTvar
varies   old_option      (option specific)
TBD      old_ssthresh    old_ssthresh
TBD      old_snd_cwnd    old_snd_cwnd
Table entries indicate which are considered to be safe to share
temporally. The other entries are discussed in section 8.
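The initialization step in the table can be sketched as follows. This
is a minimal illustration, assuming the cache is a dictionary keyed by
remote IP address and a TCB is a dictionary of named values; both
representations, and the function name init_tcb_from_cache, are
hypothetical. Entries marked "TBD" are copied only on request, since
their safety is still under discussion.

```python
# Fields the table marks "yes" (considered safe to copy from the cache).
SAFE_FIELDS = ("MMS_S", "MMS_R", "sendMSS", "PMTU")
# Fields the table marks "TBD"; copied only if the caller opts in.
TBD_FIELDS = ("RTT", "RTTvar", "ssthresh", "snd_cwnd")


def init_tcb_from_cache(cache, remote_ip, defaults, include_tbd=False):
    """Return a new TCB dict seeded from the cached entry for remote_ip.

    Values missing from the cache keep their defaults.
    """
    tcb = dict(defaults)
    cached = cache.get(remote_ip, {})
    names = SAFE_FIELDS + (TBD_FIELDS if include_tbd else ())
    for name in names:
        if name in cached:
            tcb[name] = cached[name]
    return tcb
```

A host with no cache entry simply receives the defaults, which matches
the behavior of an implementation without TCB sharing.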
Most cached TCB values are updated when a connection closes. The
exceptions are MMS_R and MMS_S, which are reported by IP [RFC1122];
PMTU, which is updated after Path MTU Discovery
[RFC1191][RFC1981][RFC4821]; and sendMSS, which is updated if the
MSS option is received in the TCP SYN header.
Sharing sendMSS information affects only data in the SYN of the next
connection, because sendMSS information is typically included in
most TCP SYN segments. Caching PMTU can improve the efficiency of
PMTUD, but an incorrect cached value can cause black-holing until it
is corrected. Caching MMS_R and MMS_S may be of little direct value
as they are reported by the local IP stack anyway.
[TBD - complete this section with details for TFO and other options
whose state may, must, or must not be shared] The way in which other
TCP option state can be shared depends on the details of that
option. E.g., TFO state includes the TCP Fast Open Cookie [RFC7413]
or, in case TFO fails, a negative TCP Fast Open response (from
[RFC7413]: "The client MUST cache negative responses from the server in
order to avoid potential connection failures. Negative responses
include the server not acknowledging the data in the SYN, ICMP error
messages, and (most importantly) no response (SYN-ACK) from the
server at all, i.e., connection timeout."). TFOinfo is cached when a
connection is established.
Other TCP option state might not be as readily cached. E.g., TCP-AO
[RFC5925] success or failure between a host pair for a single SYN
destination port might be usefully cached. TCP-AO success or failure
to other SYN destination ports on that host pair is never useful to
cache because TCP-AO security parameters can vary per service.
The table below gives an overview of option-specific information
that is considered safe to share.
TEMPORAL SHARING - Option info
Cached             New
----------------------------------------
old_TFO_Cookie     old_TFO_Cookie
old_TFO_Failure    old_TFO_Failure
TEMPORAL SHARING - Cache Updates
Safe?   Cached TCB     Current TCB     when?    New Cached TCB
-----------------------------------------------------------------
yes     old_MMS_S      curr_MMS_S      OPEN     curr_MMS_S
yes     old_MMS_R      curr_MMS_R      OPEN     curr_MMS_R
yes     old_sendMSS    curr_sendMSS    MSSopt   curr_sendMSS
yes     old_PMTU       curr_PMTU       PMTUD    curr_PMTU
TBD     old_RTT        curr_RTT        CLOSE    merge(curr,old)
TBD     old_RTTvar     curr_RTTvar     CLOSE    merge(curr,old)
varies  old_option     curr_option     ESTAB    (depends on option)
TBD     old_ssthresh   curr_ssthresh   CLOSE    merge(curr,old)
TBD     old_snd_cwnd   curr_snd_cwnd   CLOSE    merge(curr,old)
Caching PMTU and sendMSS is trivial; reported values are cached, and
the most recent values are used. The cache is updated when the MSS
option is received in a SYN or after PMTUD (i.e., when an ICMPv4
Fragmentation Needed [RFC1191] or ICMPv6 Packet Too Big message is
received [RFC1981] or the equivalent is inferred, e.g., as from
PLPMTUD [RFC4821]), respectively, so the cache always has the most
recent values from any connection. For sendMSS, the cache is
consulted only at connection establishment and not otherwise
updated, which means that MSS options do not affect current
connections. The default sendMSS is never saved; only reported MSS
values update the cache, so an explicit override is required to
reduce the sendMSS. There is no particular benefit to caching MMS_S
and MMS_R as these are reported by the local IP stack.
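The update rules for sendMSS and PMTU described above might look like
the following sketch. The PathCache class and its method names are
hypothetical and exist only for illustration; the point is that the
cache is written only by reported values (an MSS option in a SYN, or a
PMTUD result), never by the default sendMSS, and that sendMSS is read
only when a new connection is set up.

```python
class PathCache:
    """Per-host cache of sendMSS and PMTU (hypothetical structure)."""

    def __init__(self):
        self._entries = {}

    def on_mss_option(self, remote_ip, mss):
        # Only an MSS value reported in a SYN updates the cache;
        # the default sendMSS is never written here.
        self._entries.setdefault(remote_ip, {})["sendMSS"] = mss

    def on_pmtu_update(self, remote_ip, pmtu):
        # Called after PMTUD or PLPMTUD reports or infers a path MTU.
        self._entries.setdefault(remote_ip, {})["PMTU"] = pmtu

    def send_mss_for_new_connection(self, remote_ip, default_mss):
        # Consulted only at connection establishment; established
        # connections keep the value they started with.
        return self._entries.get(remote_ip, {}).get("sendMSS", default_mss)

    def pmtu(self, remote_ip):
        # Most recent PMTU from any connection, or None if unknown.
        return self._entries.get(remote_ip, {}).get("PMTU")
```

Because the most recent report always overwrites the entry, the cache
holds the latest value from any connection to that host.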
TCP options are copied or merged depending on the details of each
option. E.g., TFO state is updated when a connection is established
and read before establishing a new connection.
RTT values are updated by a more complicated mechanism
[RFC1644][Ja86]. Dynamic RTT estimation requires a sequence of RTT
measurements. As a result, the cached RTT (and its variance) is an
average of its previous value with the contents of the currently
active TCB for that host, when a TCB is closed. RTT values are
updated only when a connection is closed. The method for merging old
and current values needs to attempt to reduce the transient for new
connections. [THESE MERGE FUNCTIONS NEED TO BE SPECIFIED,
considering e.g. [DM16] - TBD].
The updates for RTT, RTTvar and ssthresh rely on existing
information, i.e., old values. Should no such values exist, the
current values are cached instead.
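Since the merge functions are not yet specified, the following is only
one possible reading of the text above: an equal-weight average of the
cached value and the value from the closing connection, falling back
to the current value when nothing has been cached. The function names
and the dictionary representation of a TCB are assumptions made for
this sketch.

```python
def merge(curr, old):
    """Average the closing connection's value with the cached one.

    If nothing has been cached yet (old is None), the current value
    is cached instead. The equal weighting is an assumption.
    """
    if old is None:
        return curr
    return (curr + old) / 2.0


def update_cache_on_close(cache_entry, tcb):
    """Apply merge() to RTT, RTTvar, and ssthresh when a TCB closes."""
    for name in ("RTT", "RTTvar", "ssthresh"):
        if name in tcb:
            cache_entry[name] = merge(tcb[name], cache_entry.get(name))
    return cache_entry
```

Other weightings, for example one that favors longer or more recent
connections, would fit the same interface.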
TEMPORAL SHARING - Option info Updates
Cached             Current            when?   New Cached
----------------------------------------------------------------
old_TFO_Cookie     old_TFO_Cookie     ESTAB   old_TFO_Cookie
old_TFO_Failure    old_TFO_Failure    ESTAB   old_TFO_Failure
7. An Example of Ensemble Sharing
Sharing cached TCB data across concurrent connections requires
attention to the aggregate nature of some of the shared state. For
example, although MSS and RTT values can be shared by copying, it
may not be appropriate to copy congestion window or ssthresh
information (see section 8 for a discussion of congestion window or
ssthresh sharing).
ENSEMBLE SHARING - TCB Initialization
Safe?   Cached TCB     New TCB
-----------------------------------------
yes     old_MMS_S      old_MMS_S
yes     old_MMS_R      old_MMS_R
yes     old_sendMSS    old_sendMSS
yes     old_PMTU       old_PMTU
TBD     old_RTT        old_RTT
TBD     old_RTTvar     old_RTTvar
TBD     old_option     (option-specific)
Table entries indicate which are considered to be safe to share
across an ensemble. The other entries are discussed in section 8.
The table below gives an overview of option-specific information
that is considered safe to share.
ENSEMBLE SHARING - Option info
Cached             New
----------------------------------------
old_TFO_Cookie     old_TFO_Cookie
old_TFO_Failure    old_TFO_Failure
ENSEMBLE SHARING - Cache Updates
Safe?   Cached TCB     Current TCB     when?       New Cached TCB
---------------------------------------------------------------------
yes     old_MMS_S      curr_MMS_S      OPEN        curr_MMS_S
yes     old_MMS_R      curr_MMS_R      OPEN        curr_MMS_R
yes     old_sendMSS    curr_sendMSS    MSSopt      curr_sendMSS
yes     old_PMTU       curr_PMTU       PMTUD       curr_PMTU
                                       /PLPMTUD
TBD     old_RTT        curr_RTT        update      rtt_update(old,cur)
TBD     old_RTTvar     curr_RTTvar     update      rtt_update(old,cur)
varies  old_option     curr_option     (depends)   (option specific)
For ensemble sharing, TCB information should be cached as early as
possible, sometimes before a connection is closed. Otherwise,
opening multiple concurrent connections may not result in TCB data
sharing if no connection closes before others open. The amount of
work involved in updating the aggregate average should be minimized,
but the resulting value should be equivalent to having all values
measured within a single connection. The function "rtt_update" in
the ensemble sharing table indicates this operation, which occurs
whenever the RTT would have been updated in the individual TCP
connection. As a result, the cache contains the shared RTT
variables, which no longer need to reside in the TCB [Ja86].
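One way to picture "rtt_update" is a single estimator per host-pair
that every connection in the ensemble feeds with its own RTT samples.
The sketch below assumes the standard SRTT/RTTVAR smoothing from
RFC 6298 (gains of 1/8 and 1/4); this document does not prescribe a
particular estimator, and the class name SharedRttEstimator is
hypothetical.

```python
class SharedRttEstimator:
    """One estimator per host-pair, fed by every connection to it."""

    ALPHA = 1 / 8  # smoothing gain for SRTT (RFC 6298)
    BETA = 1 / 4   # smoothing gain for RTTVAR (RFC 6298)

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def rtt_update(self, sample):
        """Fold one RTT sample (seconds) from any connection into the cache."""
        if self.srtt is None:
            # First measurement for this host-pair.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - sample))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        return self.srtt, self.rttvar
```

Because all connections update the same object, the result is the same
as if every sample had been measured within a single connection, and
each update costs no more than it would in a per-connection TCB.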
Congestion window size and ssthresh aggregation are more complicated
in the concurrent case. When there is an ensemble of connections, we
need to decide how that ensemble would have shared these variables,
in order to derive initial values for new TCBs.
ENSEMBLE SHARING - Option info Updates
Cached             Current            when?   New Cached
----------------------------------------------------------------
old_TFO_Cookie     old_TFO_Cookie     ESTAB   old_TFO_Cookie
old_TFO_Failure    old_TFO_Failure    ESTAB   old_TFO_Failure
Any assumption of this sharing can be incorrect, including this one,
because identical endpoint address pairs may not share network
paths. In current implementations, new congestion windows are set at
an initial value of 4-10 segments [RFC3390][RFC6928], so that the
sum of the current windows is increased for any new connection. This
can have detrimental consequences where several connections share a
highly congested link.
There are several ways to initialize the congestion window in a new
TCB among an ensemble of current connections to a host, as shown
below. Current TCP implementations initialize it to four segments as
standard [RFC3390] or to 10 segments experimentally [RFC6928], whereas
T/TCP hinted that it should be initialized to the old window size
[RFC1644]. In the former cases, the assumption is that new
connections should behave as conservatively as possible. In the
latter T/TCP case, no accommodation is made for concurrent aggregate
behavior.
In either case, the sum of window sizes can increase, rather than
remain constant. A different approach is to give each pending
connection its "fair share" of the available congestion window, and
let the connections balance from there. The assumption we make here
is that new connections are implicit requests for an equal share of
available link bandwidth, which should be granted at the expense of
current connections. [TBD - a new method for safe congestion sharing
will be described]
8. Compatibility Issues
For the congestion and current window information, the initial
values computed by TCB interdependence may not be consistent with
the long-term aggregate behavior of a set of concurrent connections
between the same endpoints. Under conventional TCP congestion
control, if a single existing connection has converged to a
congestion window of 40 segments, two newly joining concurrent
connections assume initial windows of 10 segments [RFC6928], and the
current connection's window does not decrease to accommodate this
additional load, so the connections can mutually interfere. One example
of this is seen on low-bandwidth, high-delay links, where concurrent
connections supporting Web traffic can collide because their initial
windows were too large, even when set at one segment.
[TBD - this paragraph needs to be revised based on new
recommendations] Under TCB interdependence, all three connections
could change to use a congestion window of 12 (rounded down to an
even number from 13.33, i.e., 40/3). This would include both
increasing the initial window of the new connections (vs. current
recommendations [RFC6928]) and decreasing the congestion window of
the current connection (from 40 down to 12). This gives the new
connections a larger initial window than allowed by [RFC6928], but
maintains the aggregate. Depending on whether the previous
connections were in steady-state, this can result in more bursty
behavior, e.g., when previous connections are idle and new
connections commence with a large amount of available data to
transmit. Additionally, reducing the congestion window of an
existing connection needs to account for the number of packets that
are already in flight.
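The arithmetic of this example can be checked with a short sketch. It
assumes, as the paragraph does, that the aggregate window is divided
equally and rounded down to an even number of segments; the function
names are hypothetical, and the sketch does not address packets already
in flight.

```python
def fair_share_cwnd(aggregate_cwnd, n_connections):
    """Equal share of the aggregate window, rounded down to an even number."""
    share = aggregate_cwnd // n_connections
    return share - (share % 2)


def conventional_aggregate(existing_cwnd, n_new, initial_window=10):
    """Sum of windows when new connections start at the initial window
    and the existing connection's window is left untouched."""
    return existing_cwnd + n_new * initial_window
```

With one existing connection at 40 segments and two new connections,
fair_share_cwnd(40, 3) gives 12 per connection (36 in total, within the
original aggregate of 40), whereas conventional_aggregate(40, 2) gives
60 segments.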
Because this proposal attempts to anticipate the aggregate steady-
state values of TCB state among a group or over time, it should
avoid the transient effects of new connections. In addition, because
it considers the ensemble and temporal properties of those
aggregates, it should also prevent the transients of short-lived or
multiple concurrent connections from adversely affecting the overall
network performance. Analyses and experiments to validate these
assumptions are ongoing. For example, [Ph12] recommends caching
ssthresh for temporal sharing only when flows are long. Sharing
ssthresh between short flows can degrade the overall performance of
individual connections [Ph12][Nd16], although
this may benefit overall network performance. [TBD - the details of
this issue need to be summarized and clarified herein].
[TBD - placeholder for corresponding RTT discussion]
Due to mechanisms like ECMP and LAG [RFC7424], TCP connections
sharing the same host-pair may not always share the same path. This
does not matter for host-specific information such as RWIN and TCP
option state, such as TFOinfo. When TCB information is shared across
different SYN destination ports, path-related information can be
incorrect; however, the impact of this error is potentially
diminished if (as discussed here) TCB sharing affects only the
transient event of a connection start or if TCB information is
shared only within connections to the same SYN destination port. In
case of Temporal Sharing, TCB information could also become invalid
over time. Because this is similar to the case when a connection
becomes idle, mechanisms that address idle TCP connections (e.g.,
[RFC7661]) could also be applied to TCB cache management.
There may be additional considerations to the way in which TCB
interdependence rebalances congestion feedback among the current
connections, e.g., it may be appropriate to consider the impact of a
connection being in Fast Recovery [RFC5681] or some other similar
unusual feedback state, e.g., as inhibiting or affecting the
calculations described herein.
TCP is sometimes used in situations where packets of the same host-
pair always take the same path. Because ECMP and LAG examine TCP
port numbers, they may not be supported when TCP segments are
encapsulated, encrypted, or altered - for example, some Virtual
Private Networks (VPNs) are known to use proprietary UDP
encapsulation methods. Similarly, they cannot operate when the TCP
header is encrypted, e.g., when using IPsec ESP. TCB interdependence
among the entire set sharing the same endpoint IP addresses should
work without problems under these circumstances. Moreover, measures
to increase the probability that connections use the same path could
be applied: e.g., the connections could be given the same IPv6 flow
label. TCB interdependence can also be extended to sets of host IP
address pairs that share the same network path conditions, such as
when a group of addresses is on the same LAN (see Section 9).
It can be wrong to share TCB information between TCP connections on
the same host as identified by the IP address if an IP address is
assigned to a new host (e.g., IP address spinning, as is used by
ISPs to inhibit running servers). It can be wrong if Network Address
(and Port) Translation (NA(P)T) [RFC2663] or any other IP sharing
mechanism is used. Such mechanisms are less likely to be used with
IPv6. Other methods to identify a host could also be considered to
make correct TCB sharing more likely. Moreover, some TCB information
is about dominant path properties rather than the specific host. IP
addresses may differ, yet the relevant part of the path may be the
same.
9. Implications
There are several implications to incorporating TCB interdependence
in TCP implementations. First, it may reduce the need for
application-layer multiplexing for performance enhancement
[RFC7231]. Protocols like HTTP/2 [RFC7540] avoid connection
reestablishment costs by serializing or multiplexing a set of per-
host connections across a single TCP connection. This avoids TCP's
per-connection OPEN handshake and also avoids recomputing MSS, RTT,
and congestion windows. By avoiding the so-called "slow-start
restart," performance can be optimized. TCB interdependence can
provide the "slow-start restart avoidance" of multiplexing, without
requiring a multiplexing mechanism at the application layer.
TCB interdependence pushes some of the TCP implementation from the
traditional transport layer (in the ISO model), to the network
layer. This acknowledges that some state is in fact per-host-pair or
can be per-path as indicated solely by that host-pair. Transport
protocols typically manage per-application-pair associations (per
stream), and network protocols manage per-host-pair and path
associations (routing). Round-trip time, MSS, and congestion
information could be more appropriately handled in a network-layer
fashion, aggregated among concurrent connections, and shared across
connection instances [RFC3124].
An earlier version of RTT sharing suggested implementing RTT state
at the IP layer, rather than at the TCP layer [Ja86]. Our
observations are for sharing state among TCP connections, which
avoids some of the difficulties in an IP-layer solution. One such
problem is determining the associated prior outgoing packet for an
incoming packet, to infer RTT from the exchange. Because RTTs are
still determined inside the TCP layer, this is simpler than at the
IP layer. This is a case where information should be computed at the
transport layer, but could be shared at the network layer.
Per-host-pair associations are not the limit of these techniques. It
is possible that TCBs could be similarly shared between hosts on a
subnet or within a cluster, because the predominant path can be
subnet-subnet, rather than host-host. Additionally, TCB
interdependence can be applied to any protocol with congestion
state, including SCTP [RFC4960] and DCCP [RFC4340], as well as for
individual subflows in Multipath TCP [RFC6824].
There may be other information that can be shared between concurrent
connections. For example, knowing that another connection has just
tried to expand its window size and failed, a connection may not
attempt to do the same for some period. The idea is that existing
TCP implementations infer the behavior of all competing connections,
including those within the same host or subnet. One possible
optimization is to make that implicit feedback explicit, via
extended information associated with the endpoint IP address and its
TCP implementation, rather than per-connection state in the TCB.
Like its initial version in 1997, this document's approach to TCB
interdependence focuses on sharing a set of TCBs by updating the TCB
state to reduce the impact of transients when connections begin or
end. Other mechanisms have since been proposed to continuously share
information between all ongoing communication (including
connectionless protocols), updating the congestion state during any
congestion-related event (e.g., timeout, loss confirmation, etc.)
[RFC3124]. By dealing exclusively with transients, TCB
interdependence is more likely to exhibit the same behavior as
unmodified, independent TCP connections.
10. Implementation Observations
The observation that some TCB state is host-pair specific rather
than application-pair dependent is not new and is a common
engineering decision in layered protocol implementations. A
discussion of sharing RTT information among protocols layered over
IP, including UDP and TCP, occurred in [Ja86]. Although now
deprecated, T/TCP was the first to propose using caches in order to
maintain TCB states (see Appendix A for more information).
The table below describes the current implementation status for some
TCB information in Linux kernel version 4.6, FreeBSD 10 and Windows
(as of October 2016). In the table, "shared" only refers to temporal
sharing.
TCB data       Status
-----------------------------------------------------------
old_MMS_S      Not shared
old_MMS_R      Not shared
old_sendMSS    Cached and shared in Linux (MSS)
old_PMTU       Cached and shared in FreeBSD and Windows (PMTU)
old_RTT        Cached and shared in FreeBSD and Linux
old_RTTvar     Cached and shared in FreeBSD
old_TFOinfo    Cached and shared in Linux and Windows
old_snd_cwnd   Not shared
old_ssthresh   Cached and shared in FreeBSD and Linux:
               FreeBSD: arithmetic mean of ssthresh and
               previous value if a previous value exists;
               Linux: depending on state, max(cwnd/2,
               ssthresh) in most cases
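The two ssthresh rules in the table can be written out as below. These
functions are a direct transcription of the table entries, not code
from either kernel, and the function names are hypothetical. The
FreeBSD-style fallback to the current value when nothing is cached is
an assumption; the Linux-style function covers only the "most cases"
rule and ignores the state dependence the table mentions.

```python
def freebsd_style_ssthresh(curr_ssthresh, cached_ssthresh=None):
    """Arithmetic mean of the current and previously cached value.

    If no previous value exists, the current value is used as-is
    (assumed behavior; the table does not state this case).
    """
    if cached_ssthresh is None:
        return curr_ssthresh
    return (curr_ssthresh + cached_ssthresh) / 2


def linux_style_ssthresh(cwnd, ssthresh):
    """max(cwnd/2, ssthresh), the 'most cases' rule from the table."""
    return max(cwnd / 2, ssthresh)
```

For instance, a closing connection with ssthresh 20 and a cached value
of 40 would cache 30 under the first rule, while a connection with
cwnd 40 and ssthresh 10 would cache 20 under the second.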
11. Security Considerations
These suggested implementation enhancements do not have additional
ramifications for explicit attacks. These enhancements may be
susceptible to denial-of-service attacks if not otherwise secured.
For example, an application can open a connection and set its window
size to zero, denying service to any other subsequent connection
between those hosts.
TCB sharing may be susceptible to denial-of-service attacks,
wherever the TCB is shared, between connections in a single host, or
between hosts if TCB sharing is implemented within a subnet (see
Implications section). Some shared TCB parameters are used only to
create new TCBs, others are shared among the TCBs of ongoing
connections. New connections can join the ongoing set, e.g., to
optimize send window size among a set of connections to the same
host.
Attacks on parameters used only for initialization affect only the
transient performance of a TCP connection. For short connections,
the performance ramification can approach that of a denial-of-
service attack. E.g., if an application changes its TCB to have a
false and small window size, subsequent connections would experience
performance degradation until their window grew appropriately.
The solution is to limit the effect of compromised TCB values. TCBs
are compromised when they are modified directly by an application or
transmitted between hosts via unauthenticated means (e.g., by using
a dirty flag). TCBs that are not compromised by application
modification do not have any unique security ramifications. Note
that the proposed parameters for TCB sharing are not currently
modifiable by an application.
All shared TCBs MUST be validated against default minimum parameters
before being used for new connections. This validation would not impact
performance, because it occurs only at TCB initialization. This
limits the effect of attacks on new connections to reducing the
benefit of TCB sharing, resulting in the current default TCP
performance. For ongoing connections, the effect of incoming packets
on shared information should be both limited and validated against
constraints before use. This is a beneficial precaution for existing
TCP implementations as well.
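A minimal sketch of such validation is shown below. It assumes each
shared parameter has an acceptable range and that any cached value
outside that range is discarded in favor of the default, so that a
tampered cache can at worst remove the benefit of sharing. The function
name and the bounds used in the usage note are hypothetical and not
taken from any implementation.

```python
def validate_cached(cached, defaults, bounds):
    """Return values safe to use for a new TCB.

    cached   -- dict of values read from the shared cache
    defaults -- dict of default values for every bounded parameter
    bounds   -- dict mapping parameter name to (low, high), inclusive

    Only parameters listed in bounds are returned. A missing or
    out-of-range cached value is replaced by its default.
    """
    validated = {}
    for name, (low, high) in bounds.items():
        value = cached.get(name)
        if value is not None and low <= value <= high:
            validated[name] = value
        else:
            validated[name] = defaults[name]
    return validated
```

For example, with bounds of (536, 9000) for sendMSS and (2, 64) for
snd_cwnd, a cached snd_cwnd of 0 would be replaced by the default
initial window while a cached sendMSS of 1460 would be kept.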
TCBs modified by an application SHOULD NOT be shared, unless the new
connection sharing the compromised information has been given
explicit permission to use such information by the connection API.
No mechanism for that indication currently exists, but it could be
supported by an augmented API. This sharing restriction SHOULD be
implemented in both the host and the subnet. Sharing on a subnet
SHOULD utilize authentication to prevent undetected tampering of
shared TCB parameters. These restrictions limit the security impact
of modified TCBs both for connection initialization and for ongoing
connections.
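One possible way to track application-modified TCBs is a dirty flag
combined with an explicit opt-in. The sketch below is hypothetical:
the CachedTCB class, its field names, and the allow_modified parameter
(standing in for the augmented API that does not currently exist) are
assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CachedTCB:
    """Hypothetical cache entry; field names are illustrative only."""
    values: dict = field(default_factory=dict)
    dirty: bool = False  # set once an application modifies a value directly

    def set_by_application(self, name, value):
        """Record a direct application modification and mark the entry."""
        self.values[name] = value
        self.dirty = True


def may_share(entry, allow_modified=False):
    """Decide whether a new connection may inherit from this entry.

    allow_modified models the explicit permission that an augmented
    connection API could carry; it defaults to refusing dirty entries.
    """
    return (not entry.dirty) or allow_modified
```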
Finally, shared values MUST be limited to performance factors only.
Other information, such as TCP sequence numbers, is already known to
compromise security when shared.
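A simple way to enforce this restriction is an allow-list applied
before any value leaves a TCB. The following is a minimal sketch; the
field names are hypothetical, and the exact set of shareable fields is
an assumption made for the example.

```python
# Assumed allow-list of performance-related fields; names are illustrative.
SHAREABLE_FIELDS = frozenset(
    {"srtt_ms", "rttvar_ms", "cwnd_bytes", "ssthresh_bytes", "mss_bytes"}
)


def shareable_view(tcb):
    """Return only the allow-listed performance fields of a TCB.

    Anything not on the list (for example sequence numbers) is dropped.
    """
    return {name: value for name, value in tcb.items() if name in SHAREABLE_FIELDS}
```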
12. IANA Considerations
There are no IANA implications or requests in this document.
This section should be removed upon final publication as an RFC.
13. References
13.1. Normative References
[RFC793] Postel, Jon, "Transmission Control Protocol," Network
Working Group RFC-793/STD-7, ISI, Sept. 1981.
[RFC1191] Mogul, J., Deering, S., "Path MTU Discovery," RFC 1191,
Nov. 1990.
[RFC1981] McCann, J., Deering, S., Mogul, J., "Path MTU Discovery
for IP version 6," RFC 1981, Aug. 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4821] Mathis, M., Heffner, J., "Packetization Layer Path MTU
Discovery," RFC 4821, Mar. 2007.
[RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., Jain, A., "TCP Fast
Open", RFC 7413, Dec. 2014.
13.2. Informative References
[Br02] Brownlee, N. and K. Claffy, "Understanding Internet
Traffic Streams: Dragonflies and Tortoises", IEEE
Communications Magazine p110-117, 2002.
[Be94] Berners-Lee, T., et al., "The World-Wide Web,"
Communications of the ACM, V37, Aug. 1994, pp. 76-82.
[Br94] Braden, B., "T/TCP -- Transaction TCP: Source Changes for
Sun OS 4.1.3," Release 1.0, USC/ISI, September 14, 1994.
[Co91] Comer, D., Stevens, D., Internetworking with TCP/IP, V2,
Prentice-Hall, NJ, 1991.
[FreeBSD] FreeBSD source code, Release 2.10, http://www.freebsd.org/
[Ja86] Jacobson, V., (mail to public list "tcp-ip", no archive
found), 1986.
[Nd16] Dukkipati, N., Cheng, Y., and Vahdat, A., "Research
Impacting the Practice of Congestion Control." ACM SIGCOMM
CCR (editorial).
[DM16] Matz, D., "Optimize TCP's Minimum Retransmission Timeout
for Low Latency Environments", Master's thesis, Technical
University Munich, 2016.
[Ph12] Hurtig, P., Brunstrom, A., "Enhanced metric caching for
short TCP flows," 2012 IEEE International Conference on
Communications (ICC), Ottawa, ON, 2012, pp. 1209-1213.
[RFC1122] Braden, R. (ed), "Requirements for Internet Hosts --
Communication Layers", RFC-1122, Oct. 1989.
[RFC1644] Braden, R., "T/TCP -- TCP Extensions for Transactions
Functional Specification," RFC-1644, July 1994.
[RFC1379] Braden, R., "Transaction TCP -- Concepts," RFC-1379,
September 1992.
[RFC2663] Srisuresh, P., Holdrege, M., "IP Network Address
Translator (NAT) Terminology and Considerations", RFC-
2663, August 1999.
[RFC3390] Allman, M., Floyd, S., Partridge, C., "Increasing TCP's
Initial Window," RFC 3390, Oct. 2002.
[RFC7231] Fielding, R., Reschke, J., Eds., "HTTP/1.1 Semantics and
Content," RFC-7231, June 2014.
[RFC3124] Balakrishnan, H., Seshan, S., "The Congestion Manager,"
RFC 3124, June 2001.
[RFC4340] Kohler, E., Handley, M., Floyd, S., "Datagram Congestion
Control Protocol (DCCP)," RFC 4340, Mar. 2006.
[RFC4960] Stewart, R., (Ed.), "Stream Control Transmission
Protocol," RFC 4960, Sept. 2007.
[RFC5861] Allman, M., Paxson, V., Blanton, E., "TCP Congestion
Control," RFC 5681, Sept. 2009.
[RFC5925] Touch, J., Mankin, A., Bonica, R., "The TCP Authentication
Option," RFC 5925, June 2010.
[RFC6824] Ford, A., Raiciu, C., Handley, M., Bonaventure, O., "TCP
Extensions for Multipath Operation with Multiple
Addresses," RFC 6824, Jan. 2013.
[RFC6928] Chu, J., Dukkipati, N., Cheng, Y., Mathis, M., "Increasing
TCP's Initial Window," RFC 6928, Apr. 2013.
[RFC7424] Krishnan, R., Yong, L., Ghanwani, A., So, N., Khasnabish,
B., "Mechanisms for Optimizing Link Aggregation Group
(LAG) and Equal-Cost Multipath (ECMP) Component Link
Utilization in Networks", RFC 7424, Jan. 2015.
[RFC7540] Belshe, M., Peon, R., Thomson, M., "Hypertext Transfer
Protocol Version 2 (HTTP/2)", RFC 7540, May 2015.
[RFC7661] Fairhurst, G., Sathiaseelan, A., Secchi, R., "Updating TCP
to Support Rate-Limited Traffic", RFC 7661, Oct. 2015.
14. Acknowledgments
The authors would like to thank Praveen Balasubramanian for
information regarding TCB sharing in Windows, and Yuchung Cheng,
Lars Eggert, Ilpo Jarvinen and Michael Scharf for comments on
earlier versions of the draft. This work has received funding from a
collaborative research project between the University of Oslo and
Huawei Technologies Co., Ltd., and is partly supported by USC/ISI's
Postel Center.
This document was prepared using 2-Word-v2.0.template.dot.
15. Change log
from -01 to -02:
- Stated that our OS implementation overview table only covers
temporal sharing.
- Correctly reflected sharing of old_RTT in Linux in the
implementation overview table.
- Marked entries that are considered safe to share with an
asterisk (suggestion was to split the table)
- Discussed correct host identification: NATs may make IP
addresses the wrong input, could e.g. use HTTP cookie.
- Included MMS_S and MMS_R from RFC1122; fixed the use of MSS and
MTU
- Added information about option sharing, listed options in the
appendix
Authors' Addresses
Joe Touch
USC/ISI
4676 Admiralty Way
Marina del Rey, CA 90292-6695
USA
Phone: +1 (310) 448-9151
Email: touch@isi.edu
Michael Welzl
University of Oslo
PO Box 1080 Blindern
Oslo N-0316
Norway
Phone: +47 22 85 24 20
Email: michawe@ifi.uio.no
Safiqul Islam
University of Oslo
PO Box 1080 Blindern
Oslo N-0316
Norway
Phone: +47 22 84 08 37
Email: safiquli@ifi.uio.no
Jianjie You
Huawei
101 Software Avenue, Yuhua District
Nanjing 210012
China
Email: youjianjie@huawei.com
16. Appendix A: TCB sharing history
T/TCP proposed using caches to maintain TCB information across
instances (temporal sharing), e.g., smoothed RTT, RTT variance,
congestion avoidance threshold, and MSS [RFC1644]. These values were
in addition to connection counts used by T/TCP to accelerate data
delivery prior to the full three-way handshake during an OPEN. The
goal was to aggregate TCB components where they reflect one
association - that of the host-pair, rather than artificially
separating those components by connection.
At least one T/TCP implementation saved the MSS and aggregated the
RTT parameters across multiple connections, but omitted caching the
congestion window information [Br94], as originally specified in
[RFC1379]. Some T/TCP implementations immediately updated MSS when
the TCP MSS header option was received [Br94], although this was not
addressed specifically in the concepts or functional specification
[RFC1379][RFC1644]. In later T/TCP implementations, RTT values were
updated only after a CLOSE, which does not benefit concurrent
sessions.
Temporal sharing of cached TCB data was originally implemented in
the SunOS 4.1.3 T/TCP extensions [Br94] and the FreeBSD port of the same
[FreeBSD]. As mentioned before, only the MSS and RTT parameters were
cached, as originally specified in [RFC1379]. Later discussion of
T/TCP suggested including congestion control parameters in this
cache [RFC1644].
17. Appendix B: Options
In addition to the options that can be cached and shared, this memo
also lists all options for which state should *not* be kept. This
list is meant to avoid work duplication and should be removed upon
publication.
Obsolete (MUST NOT keep state):
   ECHO
   ECHO REPLY
   PO Conn permitted
   PO service profile
   CC
   CC.NEW
   CC.ECHO
   Alt CS req
   Alt CS data
No state to keep:
   EOL
   NOP
   WS
   SACK
   TS
   MD5
   TCP-AO
   EXP1
   EXP2
MUST NOT keep state:
   Skeeter (DH exchange - might be obsolete, though)
   Bubba (DH exchange - might really be obsolete, though)
   Trailer CS
   SCPS capabilities
   S-NACK
   Records boundaries
   Corruption experienced
   SNAP
   TCP Compression
   Quickstart response
   UTO
   MPTCP (can we cache when this fails?)
   TFO success
MAY keep state:
   MSS
   TFO failure (so we don't try again, since it's optional)
MUST keep state:
   TFO cookie (if TFO succeeded in the past)
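An implementation could encode this classification as a lookup table
consulted before any option state is cached. The sketch below covers
only a subset of the options listed above; the string keys, the Policy
names, and the choice to treat unlisted options as MUST NOT are
assumptions made for illustration.

```python
from enum import Enum


class Policy(Enum):
    MUST_NOT = "MUST NOT keep state"
    NO_STATE = "No state to keep"
    MAY = "MAY keep state"
    MUST = "MUST keep state"


# Subset of the option list above; keys are illustrative labels.
OPTION_POLICY = {
    "CC": Policy.MUST_NOT,       # obsolete
    "SACK": Policy.NO_STATE,
    "TS": Policy.NO_STATE,
    "UTO": Policy.MUST_NOT,
    "TFO success": Policy.MUST_NOT,
    "MSS": Policy.MAY,
    "TFO failure": Policy.MAY,
    "TFO cookie": Policy.MUST,
}


def may_cache(option):
    """Return True if state for this option may be kept in the cache.

    Options missing from the table are conservatively not cached
    (an assumption of this sketch).
    """
    policy = OPTION_POLICY.get(option, Policy.MUST_NOT)
    return policy in (Policy.MAY, Policy.MUST)
```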