Internet Engineering Task Force M. Scharf
Internet-Draft Alcatel-Lucent Bell Labs
Intended status: Experimental July 12, 2010
Expires: January 13, 2011
Multi-Connection TCP (MCTCP) Transport
draft-scharf-mptcp-mctcp-01
Abstract
Multipath transport over potentially different paths can be realized
by several coupled Transmission Control Protocol (TCP) connections.
Multi-Connection TCP (MCTCP) transport aggregates multiple TCP
connections between potentially different addresses into a single
session that can be accessed by an application like a single TCP
connection. MCTCP encodes control information, as far as possible,
in the payload of the TCP connections and therefore requires only
minor changes in the TCP implementations, and it is transparent in
the single-path case. MCTCP is therefore proposed as a simple,
modular, and extensible mechanism for multipath transport.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 13, 2011.
Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
Scharf Expires January 13, 2011 [Page 1]
Internet-Draft Multi-Connection TCP July 2010
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Terminology
3. Design Considerations
3.1. Objectives
3.2. Operation Summary
3.3. Differences to Other Multipath Transport Solutions
4. TCP Extensions by MCTCP
4.1. Setup of the Initial Connection
4.2. Setup of Coupled Connection
4.3. Usage of Coupled Connections
4.4. Operation Mode Switch
5. MCTCP Session Protocol Messages
5.1. Data Segmentation and Encoding
5.2. Retransmission Requests
5.3. Address Advertisement
5.4. Connection Management and Fallback
6. MCTCP Session Policies and Algorithms
6.1. Message Scheduling
6.2. Congestion and Flow Control
7. Interfaces
7.1. Interface between MCTCP and TCP
7.2. Interface to Applications
8. Interaction with Middleboxes
8.1. Middleboxes that Manipulate TCP Options
8.2. Middleboxes that Change Content
8.3. Middleboxes that Translate Addresses/Ports
8.4. Middleboxes that Want to Control MCTCP Traffic
8.5. Middleboxes that Proactively Acknowledge Data
9. Open Issues
10. Security Considerations
11. IANA Considerations
12. Conclusion
13. Acknowledgments
14. References
14.1. Normative References
14.2. Informative References
Appendix A. Possible Future MCTCP Extension
Appendix B. Change History of the Document
1. Introduction
The objective of Multipath TCP is to enable transport over multiple
paths that behaves like a regular TCP connection [1]. The motivation
for using multiple paths, as well as design considerations, are
discussed in [7].
One key question concerning the Multipath TCP protocol design is how
to transport the control information, which is required for the setup
and the teardown of different sub-flows, as well as for the
segmentation and reassembly of the byte stream in the sender and
receiver, respectively. One possibility is to encode this signaling
information in several new TCP options [8].
This document describes Multi-Connection TCP (MCTCP) transport.
MCTCP is an alternative solution that transports both application and
control data with its own framing mechanism in the payload of parallel
TCP connections, but only if multipath transport is really needed.
MCTCP is simpler and more modular while providing almost the same
service as a Multipath TCP protocol with option signaling.
To applications, MCTCP offers the same reliable, in-order, byte-
stream transport as TCP. It is designed to be backward-compatible
with both applications and the network layer. Applications can use
MCTCP exactly like a single TCP connection, as described in [11]. As
long as multiple paths are not used, an MCTCP transfer is identical
to a standard TCP transfer, except for a new TCP option in SYN
segments that detects MCTCP support in the remote end. Once multi-
connection transfer is enabled, data chunks are sent over several TCP
connections with a new type-length-value (TLV) framing format. This
framing also permits the exchange of arbitrary amounts of control
information between the endpoints of the MCTCP session. The multiple
TCP connections operate independently, but the MCTCP session
coordinates the congestion control states. MCTCP can therefore use a
coupled congestion control (e. g., [10]) that does not harm other
network users.
2. Terminology
This document uses a terminology that slightly differs from [8]:
Path: A sequence of links between a sender and a receiver, defined
in this context by a source and destination address pair.
Initial connection: The first TCP connection between the two
endpoints of the MCTCP session.
Coupled connection: A coupled connection is a follow-up TCP
connection that is part of the session. It roughly corresponds to
a "subflow" in [8].
Session: A collection of the initial connection and, if in use,
one or more coupled TCP connections. The applications at the two
endpoints of the session can communicate as if there were only a
single TCP connection. For an application, there is a one-to-one
mapping between a session and the socket. If a session includes
only the initial connection, it is almost identical to a standard
TCP connection, except for a new TCP option in the SYN segments.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [3].
3. Design Considerations
This section gives a high-level, non-normative overview of MCTCP's
design and its usage.
3.1. Objectives
With multipath transport, applications should be able to use the
aggregated bandwidth of several paths without having to deal with the
details of data transport, path management, scheduling, and congestion
control.
This can improve both performance and resilience compared to the
current data transport that is mostly limited to a single path.
Yet, a multipath transport solution that requires multiple addresses
at least on one side will only be useful under certain constraints:
First, it requires endsystems with more than one address. One
example is a mobile device with several radio interfaces, a
configuration that is increasingly common. But even in that case it
can make sense to use one interface only, for instance in order to
save battery energy.
Second, due to the signaling overhead and the effort of negotiation,
a multipath transport mechanism is mainly useful for long bulk data
file transfers. In the Internet, this use case only represents a
small subset of TCP's usage scenarios.
Given this rather specific use case, this document argues that a
multipath transport mechanism should neither require complex
modifications of the TCP stack, nor fundamentally change the TCP data
transmission as seen by middleboxes on the path, at least as long as
only a single path is in use. Obviously, once multipath transport is
enabled, any middlebox performing deep packet inspection may get
confused as it will only see that part of the byte stream that is
transported over the corresponding path. As a consequence, one can
use a different framing format in that case. Furthermore, rapid
deployment of a multipath solution would also significantly benefit
from the possibility to implement it in the user space, as far as
possible.
Multi-Connection TCP (MCTCP) transport is designed to be a simple,
modular, extensible, and non-disruptive multipath transport
mechanism. Key design objectives are:
o Backward-compatibility: MCTCP is designed to be entirely backward-
compatible with a single TCP connection and falls back to standard
TCP if it is not supported by both endsystems, or if the setup of
additional coupled connections fails.
o Few TCP options only: MCTCP only requires new, short TCP options
in SYN segments, at least for the basic operation. As a result,
middleboxes that strip, duplicate, or modify TCP options, drop
such packets, or reassemble the byte stream cannot affect the
integrity of the data transport.
o Identical byte stream: MCTCP's byte stream is identical to a TCP
connection until multipath usage gets negotiated, except for the
new TCP option in the SYN. As a fallback, it is in principle even
possible to seamlessly continue the transport of the whole
application data over the initial TCP connection, if multipath
transport fails (e. g., due to middleboxes).
o Simplicity: MCTCP tries to minimize the changes required inside
existing network stacks. Except for a few fairly straightforward
add-ons, a coupled TCP connection is set up, maintained, and closed
like a standard TCP connection. The major functions of MCTCP can
be implemented in the user space.
o Same API: MCTCP can provide the same API to applications as the
existing TCP.
o Multi-address assumption: MCTCP assumes that one or both endpoints
of an MCTCP session are multihomed and multiaddressed.
These objectives are achieved by defining two different operation
modes of MCTCP, the single-connection and the multi-connection mode.
3.2. Operation Summary
In single-connection mode, an MCTCP session is equivalent to a single
TCP connection. The required minimum of control information is
exchanged by TCP options. When multipath transfer shall be enabled,
MCTCP switches to the multi-connection mode, in which it opens
additional, coupled TCP connections from or to possibly different
addresses of the same endsystems. Initial and coupled connections
are linked by two tokens, one per session endpoint, which are
exchanged during the setup of the initial connection.
Each coupled TCP connection can transport control information and
data chunks in messages that are encoded in a type-length-value (TLV)
framing format. In multi-connection mode, the MCTCP transport on one
of the coupled TCP connections is similar to the Transport Layer
Security (TLS) protocol [5], except that data is not encrypted but
partitioned over different connections. TLS can be used on top of
MCTCP without requiring any adaptation.
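As an illustration of TLV framing over a TCP byte stream, the
following minimal Python sketch encodes and decodes messages. It is
not the MCTCP message format, which is specified by the MCTCP session
protocol: the 1-byte type field, the 2-byte length field, and the
type codes TYPE_DATA and TYPE_CONTROL are assumptions made for this
example only.

```python
import struct

# Hypothetical header layout (an assumption, not the MCTCP format):
# 1-byte type followed by a 2-byte big-endian length of the value.
HEADER = struct.Struct("!BH")
TYPE_DATA = 1      # hypothetical type code for a data chunk
TYPE_CONTROL = 2   # hypothetical type code for a control message


def encode_tlv(msg_type: int, value: bytes) -> bytes:
    """Prefix the value with its type and length."""
    return HEADER.pack(msg_type, len(value)) + value


def decode_tlvs(buffer: bytes):
    """Split a receive buffer into complete (type, value) messages.

    Returns the list of complete messages and the unconsumed tail,
    which holds a partial message until more bytes arrive.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        msg_type, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # value not fully received yet
        messages.append((msg_type, buffer[offset + HEADER.size:end]))
        offset = end
    return messages, buffer[offset:]
```

The decoder returns the unconsumed bytes because TCP delivers a byte
stream without message boundaries, so a message may span several
segments.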
In summary, in single-connection mode MCTCP is transparent, while in
multi-connection mode it acts as a shim layer between several coupled
TCP connections and the upper protocol layers, with a payload
encoding similar to that of TLS. An MCTCP session can also fall back
to single-connection mode as a means to further increase MCTCP's
robustness when facing problems with certain types of middleboxes.
+-------------------------------+
| Application |
+-------------------------------+
^^^^
|||| Byte stream (e. g., socket interface)
VVVV
+-------------------------------+
| MCTCP session layer |
+-------------------------------+
^^ ^ ^ ^^
Chunked || : Connection & : || Chunked
data || : cong. control : || data
VV V V VV
+---------------+---------------+
| TCP connection| TCP connection|
+-------------------------------+
| IP | IP |
+-------------------------------+
Figure 1: MCTCP in the protocol stack
Figure 1 shows the position of MCTCP in the protocol stack, as a shim
layer between (coupled) TCP connections and upper-layer protocols or
applications. For MCTCP's connection management and the coupled
congestion control, the MCTCP session layer requires an additional
interface to each TCP connection, as well as some simple changes in
the TCP stack, e. g., to set the new TCP option in SYN segments.
Both modifications are straightforward and only affect a small subset
of TCP's functions.
The MCTCP session layer can be implemented in the kernel space as an
extension of the socket interface processing. Alternatively, the
connection management, data segmentation/reassembly, and congestion
control coupling can be realized in the user space, in combination
with some small modifications of TCP. As an example, MCTCP could be
implemented as an extension of the library that offers the socket
interface to applications. In both cases the MCTCP session layer can
be completely transparent to applications, i. e., they can continue
to use the existing socket interface to TCP [11].
In the following, a high-level summary of normal operation of MCTCP
is provided, for the scenario shown in Figure 2:
o To a non-MCTCP-aware application, MCTCP will be transparent and
indistinguishable from normal TCP. All MCTCP operation is handled
by the MCTCP implementation, although extended APIs could provide
additional control and influence [11]. An application begins by
opening a TCP socket in the normal way.
o An MCTCP session begins in single-connection mode with a single
TCP connection ("initial connection"). This is illustrated in
Figure 2 between Addresses A1 and B1 on Hosts A and B,
respectively.
o MCTCP uses a "Multipath Capable" TCP option in the SYN segments
to determine whether both endsystems support MCTCP. If the option
is not echoed in the SYN/ACK, the connection initiator knows that
the destination is not MCTCP-capable. If the SYN segment has to
be retransmitted, the connection initiator will not set the
"Multipath Capable" TCP option again, in order to circumvent
problems with middleboxes that cannot deal with unknown TCP
options. In that case, multipath transport cannot be used to that
destination.
o MCTCP does not exchange much signaling information in single-
connection mode, as this would require further TCP options outside
SYN segments. The only exception is the non-mandatory "Mode" TCP
option, which can be set by one endpoint in order to signal to the
other endpoint that it shall switch to multi-connection mode by
establishing a coupled connection to the same destination IP
address, over which additional information can then be exchanged.
If this TCP option is removed on the path, MCTCP may not be able
to enable multipath transport in some usage scenarios (e. g.,
behind NAPTs), but the single-connection transport will continue
without being impacted.
o If additional addresses are available, and if they shall be used,
MCTCP switches to the multi-connection mode.
o When entering multi-connection mode, the MCTCP session endpoints
establish one or more coupled TCP connections. The first coupled
connection should use the same IP source and destination addresses
as the initial connection, in order to establish a control
channel over which more information can be exchanged. Each
coupled connection is added to the MCTCP session.
o MCTCP identifies multiple paths by the presence of multiple
addresses at endpoints, and it can establish coupled connections
between combinations of these multiple addresses. In the example
shown in Figure 2, coupled connections are set up between A1 and
B1, and between A2 and B1.
o The discovery and setup of additional coupled TCP connections will
be achieved through a path management method described later in
this document.
o The coupled connections use TLV-encoded messages and can thus
transport both control messages and data chunks. The data chunks
include a session-level sequence number to allow the in-order
reassembly of the data chunks from multiple coupled connections at
the receiver.
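The in-order reassembly mentioned in the last item can be sketched as
follows. This is a minimal Python sketch under stated assumptions, not
the algorithm defined by this document: it assumes that the
session-level sequence number of a chunk is the byte offset of its
first byte, and that a retransmitted chunk has the same boundaries as
the original. The class name Reassembler is hypothetical.

```python
class Reassembler:
    """Reorders data chunks that arrive over several connections."""

    def __init__(self, initial_seq: int = 0):
        self.next_seq = initial_seq  # next byte offset to deliver
        self.pending = {}            # sequence number -> buffered chunk

    def receive(self, seq: int, chunk: bytes) -> bytes:
        """Buffer one chunk and return all bytes now deliverable in order."""
        # Ignore empty chunks and duplicates of already delivered data.
        if chunk and seq >= self.next_seq:
            self.pending.setdefault(seq, chunk)
        out = bytearray()
        while self.next_seq in self.pending:
            data = self.pending.pop(self.next_seq)
            out += data
            self.next_seq += len(data)
        return bytes(out)
```

A chunk that arrives early over a fast connection stays buffered
until the gap before it is filled by a chunk from another connection.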
Host A Host B
------------------------ ------------------------
Address A1 Address A2 Address B1 Address B2
---------- ---------- ---------- ----------
| | | |
| "Initial connection" setup | | ^
|--------------SYN+MPCAP------------>| | |
|(incl. Multipath Capable TCP option)| | | Single-
| | | | | conn.
|<----------SYN/ACK+MPCAP------------| | | mode
| | | | |
|#####Byte stream data transfer######| | V
| | | |
~ ~ ~ ~
| | | |
| "Coupled connections" setup | |
|--------------SYN+JOIN------------->| |
|<-----------SYN/ACK+JOIN------------| | ^
| | | | |
| |------SYN+JOIN------->| | | Multi-
| |<----SYN/ACK+JOIN-----| | | conn.
| | | | | mode
|##########TLV data transfer#########| | |
| | | | |
| |##TLV data transfer###| | V
| | | |
Figure 2: MCTCP usage scenario
For simplicity, MCTCP does not send further data over the
initial connection after it has triggered the transition to multi-
connection mode. As a consequence, the initial connection is
unused in multi-connection mode. This document requires that the
initial connection be kept open as long as coupled connections exist.
This design choice is motivated later in this document.
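The two operation modes and the role of the initial connection can be
summarized in a small state sketch. The following Python code is an
illustration only; the names Mode and Session, and the use of
connection labels such as "A1-B1", are assumptions of this example and
do not appear in the protocol.

```python
from enum import Enum


class Mode(Enum):
    SINGLE = "single-connection"
    MULTI = "multi-connection"


class Session:
    """Tracks which connections of a session may carry data."""

    def __init__(self, peer_is_capable: bool):
        # Result of the "Multipath Capable" negotiation in the SYNs.
        self.peer_is_capable = peer_is_capable
        self.mode = Mode.SINGLE
        self.coupled = set()

    def add_coupled(self, conn_id: str) -> bool:
        """Add a coupled connection; this enters multi-connection mode."""
        if not self.peer_is_capable:
            return False  # the session stays a standard TCP connection
        self.coupled.add(conn_id)
        self.mode = Mode.MULTI
        return True

    def data_connections(self):
        """Connections that carry data in the current mode."""
        if self.mode is Mode.SINGLE:
            return ["initial"]
        # In multi-connection mode the initial connection is unused.
        return sorted(self.coupled)

    def may_close_initial(self) -> bool:
        """The initial connection stays open while coupled ones exist."""
        return not self.coupled
```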
3.3. Differences to Other Multipath Transport Solutions
MCTCP follows the design principles outlined in [7], but it differs
from the protocol design described in [8], which uses TCP options to
transport all control information. In the following, the key
advantages of MCTCP are summarized:
o MCTCP does not rely on frequently sent TCP options, in particular
not on options that may have to be present in many packets. In
the simplest case, it only requires two new types of TCP options
which are set in SYN segments only. The required options are
short and do not consume much of the TCP option space, which is
already scarce in SYNs. It should also be noted that the
selective acknowledgment (SACK) option [2] is currently the only
major TCP option that is sporadically set after connection setup.
Yet, SACK options are only present after packet losses or
reordering events, which are rare, and they are often set in
segments without payload. Sporadically adding other new TCP
options to all kinds of segments may increase the complexity of
the TCP sender, since the MSS must be adapted accordingly. As
a consequence, MCTCP may also be simpler to realize in combination
with TCP segmentation offload on network cards.
o MCTCP's operation is much more robust in combination with
middleboxes that strip, duplicate, or modify TCP options and/or
drop packets with unknown TCP options. The worst case is that
multipath transport will not be enabled on a path with such
middleboxes, but the data stream's integrity will not be affected.
In general, the transport of information in TCP options outside
SYNs is not necessarily reliable, unless an acknowledgement and
retransmission mechanism for that information exists. As a
consequence, TCP options are not well suited for transport of
information that is absolutely essential for the data integrity.
It is also impossible to safely detect whether novel TCP options
can indeed be exchanged between two hosts in the Internet, as the
routing may change and additional middleboxes may appear on the
paths, e. g., in mobile networks. Therefore, a signaling method
that transports essential control information such as sequence
numbers in TCP options is not robust in such environments.
Obviously, it cannot efficiently use multiple paths if a middlebox
blocks TCP options, as there is no way to reliably exchange
control information in options. There are also situations where
multipath transport with option encoding cannot even fall back to
single-path transport, e. g., if routing changes and afterwards
TCP options cannot be exchanged on all used paths. Unlike MCTCP,
multipath transport with option encoding would break and not be
able to complete ongoing data transfers in such cases, except if
it used an MCTCP-like approach as well.
o MCTCP is also rather robust when middleboxes rewrite content, as
it can use a checksum to safely detect content modifications in
one or several connections. It could even define schemes that
transfer such content in a different content encoding format.
o MCTCP offers a simple mechanism by which a middlebox can prevent
the transport of any multi-connection traffic: It can simply drop SYN
segments with the "JOIN" TCP option. In that case, unless routing
changes, paths through that middlebox will not be used in multi-
connection mode. If that middlebox is on the path of the initial
connection, it will always see the whole, unmodified byte stream.
This middlebox-friendly design is an advantage of the distinction
between initial and coupled connections. It could also help to
comply with certain network policies such as lawful interception.
o The TCP option space is limited to 40 bytes. In multi-connection
mode, MCTCP can exchange any amount of information between the
endsystems. As such, it is more extensible and flexible. For
instance, without length limitation, one can easily exchange a
list of several IPv6 candidate addresses in the payload of a
single TCP segment. It would also be possible to announce lists
of candidate port numbers or even to exchange address information
in the form of a Uniform Resource Identifier (URI) or any other
referral object structure. Finally, MCTCP could use strong
protection mechanisms between coupled connections to ensure that
they have indeed the same endpoint, such as longer tokens.
o The design is modular, as the operation of a single TCP connection
is almost independent from the multipath transport, except for the
necessary coupling of congestion control. For instance, there is
no need to modify the SACK scoreboard implementation in existing
TCP implementations, and synchronization issues between different
TCP connections are avoided.
o MCTCP has a reasonable deployment roadmap. Most functions of
MCTCP can be realized in the user space with a small patch of the
TCP implementation only. The required extensions inside the
network stack are simple, straightforward, and non-disruptive.
This means that MCTCP can initially be deployed mostly as a user
space solution, without lacking any features. As a second step,
once the protocol is widely supported in the Internet, it could
become an integral part of the network stack.
o The transport of control information in the payload is reliable
and congestion-controlled. TLV-encoded messaging is
straightforward and well-known, e. g., from TLS. MCTCP does not
use a mandatory positive acknowledgement mechanism and therefore
does not require frequent additional data transport in the reverse
direction.
o MCTCP can be extended in the future, for instance to use stronger
protection for the coupling of connections, possibly even by an
exchange of cryptographic keys, if needed. A list of possible
future extensions is provided in the appendix.
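The scarcity of the TCP option space in SYN segments can be made
concrete with a short calculation. The Python sketch below sums the
lengths of options that are commonly sent in a SYN (MSS, SACK
permitted, Timestamps, Window scale) and the 6-byte "Multipath
Capable" option of Section 4.1. Which options a given stack actually
sends is an assumption of this example, and padding to 4-byte
boundaries is ignored.

```python
OPTION_SPACE = 40  # bytes available for TCP options

# Lengths in bytes of options commonly present in a SYN (assumed set).
TYPICAL_SYN_OPTIONS = {
    "MSS": 4,
    "SACK permitted": 2,
    "Timestamps": 10,
    "Window scale": 3,
}
MPCAP_LENGTH = 6          # "Multipath Capable" option, see Figure 3
IPV6_ADDRESS_LENGTH = 16  # bytes per IPv6 address

used = sum(TYPICAL_SYN_OPTIONS.values()) + MPCAP_LENGTH
remaining = OPTION_SPACE - used
ipv6_addresses_that_fit = remaining // IPV6_ADDRESS_LENGTH

print(f"used={used} remaining={remaining} "
      f"ipv6_addresses={ipv6_addresses_that_fit}")
```

Under these assumptions, the remaining option space cannot hold even
a single IPv6 address, whereas a TLV message in the payload is
limited only by its length field.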
MCTCP shares a number of properties of [8]. It can use a coupled
congestion control in a similar way, and it is able to enable
multipath transport under the same constraints.
Still, it must be noted that there are a number of potential
drawbacks of MCTCP's design as well:
o MCTCP is designed for the use case of a bulk data transfer that
starts as a single path transfer that is later "upgraded" in order
to use multiple interfaces. This is the most obvious use case of
multipath transfer, as transporting smaller amounts of data over
multiple paths would result in a significant overhead. In
contrast, MCTCP is less efficient if multipath transfer is to be
used right from the beginning of a transfer, due to the backward-
compatible design of MCTCP's single-connection mode, which allows
only very limited control signaling. If this use case were important, an
MCTCP variant with payload encoding in the initial connection
could be developed, too. Its design is straightforward, but left
for further study, as it would only be of use in certain
scenarios.
o MCTCP opens an additional TCP connection when switching to multi-
connection mode, and it does not continue using the initial
connection. The connection setup of the coupled connections
results in a small delay, i. e., the path may not be completely
utilized for a short time. An obvious optimization would be to
transfer the congestion control state from the initial connection
to the first coupled connection, in order to avoid the TCP Slow-
Start there. Both connections should use the same path. It must
be noted that not using the initial connection after the switch-
over to the multi-connection mode is the simplest solution;
alternative solutions are possible. Furthermore, the "handover"
process and the resulting delay could be minimized by further
optimization, but this is left for further study.
o MCTCP session endpoints do not exchange address information before
entering the multi-connection mode, even if this would be possible
by additional TCP options [8]. Both endsystems can initiate a
change of operation mode, and address information can be exchanged
by the MCTCP session protocol once this is successful. If the
"Mode" TCP option is supported, an endpoint can even trigger the
setup of a coupled connection by the other endpoint, e. g., if
that host is located behind a NAPT. Yet, while being in single-
connection mode, MCTCP provides no means to learn other addresses.
As a consequence, endsystems may try to enter the multi-connection
mode in vain, if they assume that their peer is multi-homed. If
that peer is not multi-homed, it can either agree to switch to
multi-connection mode, or deny that (by not responding with a
"Join" option). In the former case, an additional TCP connection
is needlessly established between both peers, and in the latter
case data transfer could briefly slow down until MCTCP falls back
to single-connection mode. For long-lived connections that
benefit most from multi-connection mode both cases hardly cause
much harm.
o Given that MCTCP transports control information in the payload, it
is more complex for middleboxes to parse and potentially modify
MCTCP's control information. In order to do so, a middlebox has
to perform deep packet inspection and reassemble the messages of
the coupled TCP connection(s). This may prevent certain
operations and optimizations by middleboxes. However, it should
be noted that middleboxes cannot affect the payload in other
related protocols such as TLS either, i. e., MCTCP is somewhat
similar to TLS in that sense. Of course, middleboxes can still
perform certain forms of traffic engineering for an individual
coupled connection, such as randomizing initial sequence numbers
or modifying the advertised receive window (which may, of course,
do harm to any end-to-end connection). A middlebox that wants to
prevent MCTCP usage can simply and safely drop packets with the
TCP "Join" option and will then not be traversed by any multi-
connection traffic, except if routing changes.
o If MCTCP detects that one coupled connection stalls, it can
retransmit data over another connection, which can reduce the
delivery time and prevent head-of-line blocking. However, if
MCTCP is partly realized in the user space, it might not be able
to retransmit a lost segment immediately over another coupled
connection, given that this would require complex changes of the
segmentation and SACK scoreboard implementation in each coupled
connection. As a result, if congestion occurs on a subset of the
coupled connections, the end-to-end delivery delay of a user-space
solution may be larger than the delay of a protocol that is
tightly integrated into the protocol stack. In general, an
implementation inside the protocol stack can assign data more
flexibly and more dynamically to the different interfaces. This
would be an advantage of a kernel-space implementation. Yet, a
reasonable MCTCP session layer scheduling can reduce the risk of
head-of-line blocking by simply avoiding long send buffer queues,
even if it is realized in the user space.
o MCTCP as defined in this document does not provide some signaling
mechanisms of [8], such as the "DATA FIN". While it is obviously
possible to add these mechanisms as well, it will result in a more
complex protocol design and is therefore not addressed in this
version of the protocol specification.
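The remark above that session layer scheduling can reduce
head-of-line blocking by avoiding long send buffer queues can be
illustrated with a very simple policy. The following Python sketch is
an assumption of this example and not the scheduling defined by this
document: the function name pick_connection, the queue limit of 65536
bytes, and the rule "shortest queue first" are all hypothetical.

```python
def pick_connection(queued_bytes: dict, max_queue: int = 65536):
    """Pick the coupled connection with the shortest send queue.

    queued_bytes maps a connection label to the number of bytes that
    are waiting in its send buffer. Connections whose queue has
    reached max_queue are skipped; None means "do not send now".
    """
    candidates = {c: q for c, q in queued_bytes.items() if q < max_queue}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```

Keeping queues short means that a chunk assigned to a connection that
later stalls delays only a small amount of subsequent data.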
4. TCP Extensions by MCTCP
This section describes the modifications in the TCP protocol that are
required by MCTCP. MCTCP only defines additional TCP options.
Several TCP options and mechanisms are similar to [8], but differ in
details. Later, Section 7.1 describes which information inside the
TCP stack an MCTCP session must be able to access.
4.1. Setup of the Initial Connection
The initial connection of an MCTCP session is set up like a TCP
connection with a three-way handshake. A connection initiator that
wants to announce its MCTCP capability sets the "Multipath Capable"
TCP option in the SYN, as shown in Figure 3. This option only
declares that its sender is capable of using MCTCP, even if MCTCP
will not be enabled for that session. It includes a field that
carries a locally-unique token identifying this connection. The two
tokens, one from each endpoint, will be used when adding coupled
connections to verify that the endpoint is identical.
1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------------------------------+
| Kind=OPT_MPCAP| Length=6 | Sender Token :
+---------------+---------------+-------------------------------+
: Sender Token (contd.) |
+-------------------------------+
Figure 3: Multipath Capable option
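The layout of Figure 3 (one byte Kind, one byte Length with value 6,
and a 32-bit Sender Token) can be serialized as in the following
Python sketch. The numeric value of OPT_MPCAP is not assigned in this
document; the value 253 used here is a placeholder taken from the
experimental TCP option range and is an assumption of this example,
as are the function names.

```python
import struct

# Placeholder option kind (assumption): this document does not assign
# a value to OPT_MPCAP.
OPT_MPCAP = 253

# Kind (1 byte), Length (1 byte), Sender Token (4 bytes), network order.
MPCAP = struct.Struct("!BBI")


def build_mpcap(token: int) -> bytes:
    """Serialize a "Multipath Capable" option carrying the sender token."""
    return MPCAP.pack(OPT_MPCAP, MPCAP.size, token)


def parse_mpcap(option: bytes):
    """Return the sender token, or None if the option is malformed."""
    if len(option) != MPCAP.size:
        return None
    kind, length, token = MPCAP.unpack(option)
    if kind != OPT_MPCAP or length != MPCAP.size:
        return None
    return token
```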
This option MUST only be present in packets with the SYN flag set.
It is only used in the initial TCP connection, in order to identify
the MCTCP session; all following (coupled) connections will use
another, similar option to join the MCTCP session.
If a SYN contains a "Multipath Capable" option but the SYN/ACK does
not, it is assumed that the responder is not multipath capable and
thus the MCTCP session MUST fall back to standard TCP. If a SYN does
not contain a "Multipath Capable" option, the SYN/ACK MUST NOT
contain one in response.
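The option layout of Figure 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: this document does not assign a value to OPT_MPCAP, so the sketch uses the experimental option kind 253 as a placeholder, and the helper names encode_mpcap and decode_mpcap are hypothetical.

```python
import struct

# Assumption: OPT_MPCAP has no assigned value in this document; 253 is an
# experimental TCP option kind that serves purely as a placeholder here.
OPT_MPCAP = 253
MPCAP_LEN = 6  # kind (1) + length (1) + 32-bit sender token (4)


def encode_mpcap(sender_token: int) -> bytes:
    """Build the 6-octet "Multipath Capable" option of Figure 3."""
    if not 0 <= sender_token < 2**32:
        raise ValueError("token must fit into 32 bits")
    return struct.pack("!BBI", OPT_MPCAP, MPCAP_LEN, sender_token)


def decode_mpcap(option: bytes) -> int:
    """Return the sender token, or raise ValueError if the option is malformed."""
    if len(option) != MPCAP_LEN:
        raise ValueError("wrong option size")
    kind, length, token = struct.unpack("!BBI", option)
    if kind != OPT_MPCAP or length != MPCAP_LEN:
        raise ValueError("not a Multipath Capable option")
    return token
```

The token is carried in network byte order, which is the usual convention for TCP options but is not stated explicitly in the text.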
There are two tokens in an MCTCP session, one per endsystem. Each
token is generated by its sender and has local meaning only. It MUST
be unique for the sender. The token MUST be difficult for an
attacker to guess, and thus it SHOULD be generated randomly.
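One possible way to meet both token requirements (locally unique and hard to guess) is sketched below. The function name new_token and the set of tokens in use are hypothetical; the 32-bit width follows from the option length in Figure 3.

```python
import secrets


def new_token(tokens_in_use: set) -> int:
    """Draw a random 32-bit token that no other local session uses.

    `tokens_in_use` is a hypothetical per-host registry; the new token is
    added to it before returning.
    """
    while True:
        token = secrets.randbits(32)  # cryptographically strong randomness
        if token not in tokens_in_use:
            tokens_in_use.add(token)
            return token
```

The retry loop only matters when the random draw collides with an existing session, which is rare for a small number of sessions.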
If the SYN packets are unacknowledged, it is up to a local policy to
decide how to respond. A sender SHOULD fall back to standard TCP (i.
e., without the "Multipath Capable" option) after a maximum number of
attempts, in order to work around middleboxes that may drop packets
with unknown options. The number of attempts that are made will be
up to local policy. Once the connection initiator has sent a SYN
without the "Multipath Capable" option, it MUST fall back to regular
TCP behavior, even if it subsequently receives a SYN/ACK that
contains a "Multipath Capable" option. This might happen if the
"Multipath Capable" SYN and the subsequent non-MP-capable SYN are
reordered. This rule ensures that the two endpoints end up in an
interoperable state, no matter in which order the SYNs arrive at the
passive opener.
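The fallback rule for the connection initiator can be illustrated with a small state holder. This is a sketch, not a normative algorithm: the class name, its methods, and the default of three attempts are assumptions, since the number of attempts is left to local policy.

```python
class InitiatorHandshake:
    """Tracks the SYN attempts of a connection initiator (hypothetical helper)."""

    def __init__(self, max_mpcap_attempts: int = 3):
        # Assumption: the limit is a local policy value; 3 is arbitrary.
        self.max_mpcap_attempts = max_mpcap_attempts
        self.mpcap_syns_sent = 0
        self.plain_syn_sent = False

    def next_syn_has_mpcap(self) -> bool:
        """Decide whether the next (re)transmitted SYN carries the option."""
        if not self.plain_syn_sent and self.mpcap_syns_sent < self.max_mpcap_attempts:
            self.mpcap_syns_sent += 1
            return True
        self.plain_syn_sent = True
        return False

    def mctcp_enabled(self, synack_has_mpcap: bool) -> bool:
        """MCTCP is used only if the SYN/ACK carries the option and no plain
        SYN was ever sent, even if a late SYN/ACK answers an earlier SYN."""
        return synack_has_mpcap and not self.plain_syn_sent
```

Once plain_syn_sent is set, the session stays in regular TCP behavior regardless of what the SYN/ACK contains, which mirrors the reordering case described above.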
4.2. Setup of Coupled Connection
An MCTCP session can open additional, coupled TCP connections. These
coupled TCP connections all run the MCTCP session protocol with TLV
encoding, as specified below. The endsystems can also use the
coupled connections, in particular the first one, to exchange
knowledge about their own address(es). Using this knowledge, an endpoint can
initiate further coupled connections over currently unused pairs of
addresses. Either endpoint that is part of an MCTCP session SHOULD
be able to initiate the creation of a new coupled connection.
A new coupled connection is started with a normal TCP three-way
handshake. The "Join" TCP option (Figure 4) identifies the session
that the new connection should become a part of. The token
used is the locally unique token of the destination of the
connection, as received in the "Multipath Capable" option during the
SYN and SYN/ACK exchange of the initial connection.
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------------------------------+
| Kind=OPT_JOIN |   Length=6    |        Receiver Token         :
+---------------+---------------+-------------------------------+
:    Receiver Token (contd.)    |
+-------------------------------+
Figure 4: Multipath Join option
This option MUST only be present when the SYN flag is set. The
recipient of the "Join" option with a token that is valid for an
existing MCTCP session must decide whether to allow an additional
coupled connection, or whether to deny it. If the coupled connection
shall be established, the recipient of the SYN responds with a SYN/
ACK also containing a "Join" option, with the initiator's token.
Otherwise, if the recipient decides to deny the setup of a coupled
connection, it MUST reply with a TCP RST. If the token is unknown at
the recipient, the recipient MUST also respond with a TCP RST in the
same way as when an unknown TCP port is used. Similarly, if the
initiator of a coupled connection receives a SYN/ACK with an invalid
token or a SYN/ACK without the "Join" option, it must send a TCP RST.
In all these cases, the setup procedure of that coupled connection
MUST be abandoned. As a result, an endpoint MUST return to
single-connection mode if this was the first coupled connection. If
there are already other coupled connections, it SHOULD NOT use that
address pair for multipath transport. The verification of the tokens in both
endpoints of the MCTCP session ensures that the endpoints of a
coupled connection are identical to the endpoints of the initial
connection. Also, middleboxes that drop packets with SYN options, or
strip the option, can be detected in that way.
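The token checks on both sides of a "Join" handshake can be summarized in code. The following sketch assumes a hypothetical session table that maps each local token to the peer's token, and a hypothetical policy callback named allow; neither is defined by this document.

```python
from enum import Enum


class Reply(Enum):
    SYN_ACK_JOIN = "SYN/ACK with Join option"
    RST = "TCP RST"


def handle_join_syn(token: int, sessions: dict, allow) -> tuple:
    """Passive side: map the token of a received SYN+JOIN to a reply.

    `sessions` maps a local token to the peer's token (hypothetical table);
    `allow` is a local-policy callback (hypothetical).
    """
    if token not in sessions:
        return Reply.RST, None  # unknown token, handled like an unknown port
    if not allow(token):
        return Reply.RST, None  # policy denies a further coupled connection
    return Reply.SYN_ACK_JOIN, sessions[token]  # answer with initiator's token


def check_join_synack(own_token: int, received_token) -> bool:
    """Active side: accept only a SYN/ACK that carries our own token.

    `received_token` is None if the SYN/ACK has no Join option; the caller
    sends a TCP RST whenever this function returns False.
    """
    return received_token is not None and received_token == own_token
```

In the example of Figure 5, Host B would look up Token B in its table and answer with Token A, and Host A would then compare the received value with its own Token A.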
A local policy SHOULD ensure that an endpoint stops re-sending SYNs
with the "Join" option if it receives TCP RST or if it does not
receive corresponding SYN/ACKs. In general, an endpoint SHOULD NOT
try to open further coupled connections if previous attempts to the
same destination address failed. An endpoint SHOULD also refrain
from attempts to switch to multi-connection mode if this repeatedly
failed before; this SHOULD be governed by a local policy.
         Host A                               Host B
------------------------             ------------------------
Address A1   Address A2              Address B1   Address B2
----------   ----------              ----------   ----------
     |            |                       |            |
     |---------SYN+MPCAP (Token A)------->|            |    ^
     |<-----SYN/ACK+MPCAP (Token B)-------|            |    | Single-
     |            |                       |            |    | conn.
     |########Initial connection##########|            |    | mode
     |            |                       |            |    V
     ~            ~                       ~            ~
     |            |                       |            |
     |---------SYN+JOIN (Token B)-------->|            |
     |<------SYN/ACK+JOIN (Token A)-------|            |    ^
     |            |                       |            |    |
     |<=====E. g., MCTCP Add. Address=====|            |    | Multi-
     |            |                       |            |    | conn.
     |            |----------SYN+JOIN (Token B)------->|    | mode
     |            |<-------SYN/ACK+JOIN (Token A)------|    |
     |            |                       |            |    |
     |######First coupled connection######|            |    |
     |            |                       |            |    |
     |            |#####Second coupled connection######|    V
     |            |                       |            |
Figure 5: Example use of MCTCP tokens
Figure 5 illustrates the usage of the two MCTCP tokens. An endpoint
can decide to switch to multi-connection mode at any time, as long as
the initial connection is established. In multi-connection mode, an
endpoint can add further coupled connections at any time.
4.3. Usage of Coupled Connections
The setup of the first coupled connection MUST use the same source
and destination IP addresses and SHOULD use the same destination port
as the initial connection. This implies that the first coupled
connection SHOULD be actively opened by the initiator of the initial
connection. This constraint ensures that the first coupled
connection indeed uses valid addresses and that it uses the same path
as the initial connection. It also facilitates user-space
implementation and network address port translation (NAPT) traversal.
The first coupled connection has a special role because it enables
the exchange of addresses or other information, which can be useful
to set up additional coupled connections.
The token supplied in the initial connection's SYN exchange is used
for the demultiplexing of coupled connections, i. e., to associate a
new coupled connection to an existing MCTCP session. This means that
the port numbers in a SYN of a coupled connection are not used for
demultiplexing. Still, an active opener of a new coupled connection
SHOULD use a destination port number that is already in use by the
passive opener, as long as the five-tuple is unique for each host.
This strategy is intended to maximize the probability of the SYN
being permitted by a firewall or network address port translation
(NAPT) at the recipient and to avoid confusing any network monitoring
software. Once a coupled connection is established, demultiplexing
packets is done using the five-tuple, as in traditional TCP.
Control information can be sent over any established coupled
connection, and it always affects the MCTCP session as a whole. As
control information and data chunks are transported over the same
pipe and may experience queueing in the send buffer, it is reasonable
to send important control information immediately after the
establishment of a new coupled connection (as shown in Figure 5 for
an "MCTCP Additional Address" message). A scheduler in the MCTCP
session layer decides which MCTCP messages are sent over which
coupled connection.
4.4. Operation Mode Switch
An MCTCP session endpoint MUST change its operation mode from
single-connection to multi-connection mode once the first coupled
connection is successfully set up.
Either endpoint of an MCTCP session can request the other endpoint to
switch to multi-connection mode by means of a "Mode" TCP option that is
depicted in Figure 6. This may be useful if only the other endpoint
can establish coupled TCP connections, e. g., if it is located behind
a middlebox performing network address port translation (NAPT).
                     1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+---------------+---------------+
| Kind=OPT_MODE |   Length=2    |
+---------------+---------------+
Figure 6: Mode option
This TCP option MAY be set in segments of the initial connection.
Its implementation is RECOMMENDED. It MAY be set in segments without
or with payload once the initial connection is established, as long
as the MCTCP session is not in multi-connection mode. The option is
also allowed in SYN/ACK segments, but not in pure SYN segments. If
it is set in the SYN/ACK, it asks the connection initiator to enter
multi-connection mode immediately. When receiving a "Mode" TCP
option, an MCTCP endpoint MAY send a SYN with the "Join" TCP option
to the destination address and port of the initial connection, and
switch to multi-connection mode. It is also allowed to silently
ignore that notification and to continue in single-connection mode.
An endsystem MUST refrain from resending "Mode" TCP options
frequently if the MCTCP session cannot successfully negotiate the
multi-connection mode, in order to avoid needless effort.
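The rules about where the "Mode" option may appear can be condensed into a single predicate. The function below is a hypothetical helper that only restates the conditions given in this section for segments of the initial connection.

```python
def mode_option_allowed(syn: bool, ack: bool, established: bool,
                        multi_connection_mode: bool) -> bool:
    """Return True if a segment of the initial connection may carry "Mode".

    Hypothetical helper: `syn` and `ack` are the flags of the segment,
    `established` tells whether the initial connection is established.
    """
    if multi_connection_mode:
        return False        # pointless once multi-connection mode is active
    if syn:
        return ack          # allowed in SYN/ACK, but not in a pure SYN
    return established      # any segment after connection establishment
```

A rate limit for repeated "Mode" options would sit on top of this check and is a matter of local policy.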
5. MCTCP Session Protocol Messages
All coupled TCP connections run the MCTCP session protocol, which
transports both data chunks and control messages in the format that
is defined in this section.
5.1. Data Segmentation and Encoding
In multi-connection mode, MCTCP segments data in chunks and
transports them as TLV-encoded messages over one or more coupled TCP
connections. The framing format of these chunks is shown in
Figure 7.
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+-------------------------------+---------------+
| Type=MSG_CHUNK|     Total message length      |C|  Reserved   |
+---------------+-------------------------------+---------------+
|               Session sequence number (64 bit)                :
+---------------------------------------------------------------+
:               Session sequence number (contd.)                |
+---------------------------------------------------------------+
|                                                               |
~                     Data chunk (variable)                     ~
|                                                               |
+---------------------------------------------------------------+
|                  Optional checksum (32 bit)                   |
+---------------------------------------------------------------+
Figure 7: MCTCP Data Chunk message
If a receiver observes a corrupted MCTCP message, e. g., by invalid
TLV format or an invalid checksum, it SHOULD close the corresponding
coupled connection by sending a TCP FIN.
MCTCP uses a global sequence number during a session. The value 0
refers to the first byte that is sent over the initial connection.
An MCTCP receiver reassembles the byte stream according to that
sequence number and delivers the data in-order to the upper protocol
layer or application.
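The framing of Figure 7 can be illustrated as follows. This is a minimal sketch for messages without checksum: the numeric value of MSG_CHUNK is a placeholder because this document does not assign type codes, and the function names are hypothetical. The total message length is taken to include the 12-octet header, which is consistent with the fixed lengths given for the address messages.

```python
import struct

MSG_CHUNK = 1  # assumption: type codes are not assigned in this document
HEADER = struct.Struct("!BHBQ")  # type, total length, flags, 64-bit sequence no.
C_FLAG = 0x80  # C is the most significant bit of the flags octet (Figure 7)


def encode_chunk(seq: int, data: bytes) -> bytes:
    """Frame `data` as a Data Chunk message without checksum."""
    total = HEADER.size + len(data)
    if not data or total > 0xFFFF:
        raise ValueError("data chunk must be 1..65523 octets")
    return HEADER.pack(MSG_CHUNK, total, 0, seq) + data


def decode_chunk(buf: bytes):
    """Parse one Data Chunk from the start of a connection's byte stream.

    Returns (seq, data, rest), or None if the message is still incomplete.
    Messages with the C-flag are rejected because this sketch omits checksums.
    """
    if len(buf) < HEADER.size:
        return None
    msg_type, total, flags, seq = HEADER.unpack_from(buf)
    if msg_type != MSG_CHUNK or total <= HEADER.size or flags & C_FLAG:
        raise ValueError("unsupported or corrupted message")
    if len(buf) < total:
        return None
    return seq, buf[HEADER.size:total], buf[total:]
```

Because TCP delivers a byte stream, decode_chunk returns the unconsumed remainder so that the caller can parse the next message from it.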
If the C-flag is set, the MCTCP Data Chunk message includes a 32-bit
checksum that covers the whole MCTCP message. The checksum is
OPTIONAL, but it helps to detect middleboxes that modify the TCP byte
stream. If it is present, a receiver MUST verify the checksum. If
there is a checksum mismatch, the receiver MUST discard the MCTCP
message and its data, and it SHOULD close the corresponding coupled
connection, as the integrity of the TLV framing on that connection is
not guaranteed any more. The receiver MAY ask for a retransmission
of the corresponding data chunk over an alternative coupled
connection, as defined in the next section. If there is only one
coupled connection, it is possible to fall back to transport
over the initial connection, as discussed below.
If present, the checksum is calculated by the Castagnoli CRC 32C
algorithm that is also used in the Stream Control Transmission
Protocol (SCTP) [4].
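For illustration, a bitwise CRC-32C implementation is short enough to show in full. How exactly the checksum is applied to an MCTCP message is not detailed here, so the two helper functions below rest on assumptions: the checksum is computed over all octets that precede the checksum field and is appended in network byte order. The helper names are hypothetical.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def add_checksum(message: bytes) -> bytes:
    """Append the checksum (assumption: big-endian, over all prior octets)."""
    return message + struct.pack("!I", crc32c(message))


def checksum_ok(message: bytes) -> bool:
    """Verify a message produced by add_checksum()."""
    if len(message) < 4:
        return False
    (received,) = struct.unpack("!I", message[-4:])
    return crc32c(message[:-4]) == received
```

The bitwise loop is slow but easy to verify; a table-driven or hardware-assisted variant would be preferable in a real implementation.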
The sequence number in the first Data Chunk message sent over coupled
TCP connections SHOULD refer to the first byte that the MCTCP
implementation has not already enqueued on the initial connection.
In that case, there is no overlap between data transported over the
initial connection and data transported over the coupled connections,
which simplifies the reassembly. An MCTCP sender MAY also resend
data that has already been written to the initial connection if a
coupled connection can use a faster path, but it MUST NOT resend data
that has already been acknowledged on the initial connection by the
receiver.
A sender SHOULD NOT write further data to the initial connection
after it has sent its first Data Chunk message to a coupled
connection, in order to simplify the reconstruction of the byte
stream in the receiver. The only exception is a fallback to single
connection mode, which is needed if all coupled connections are
closed. The initial connection transports the upper layer protocol's
byte stream without any gaps, i. e., the global session sequence
number implicitly increases continuously even after multi-connection
mode is entered. As a consequence, apart from redundancy and
fallback, it does not make much sense to continue sending the
application byte stream over the initial connection. A receiver
SHOULD close the MCTCP session if it detects an inconsistency between
the byte stream received over the initial connection and the data
chunks on the coupled connections.
The maximum allowed size of an MCTCP message is 65535 octets.
Since the Data Chunk header in Figure 7 occupies 12 octets, the
maximum data chunk size is 65535-12 = 2^16-13 = 65523 octets.
The minimum allowed data chunk size is 1 octet.
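The size limits translate directly into a simple segmentation routine. The sketch below uses a fixed chunk size for clarity; the generator name segment and its parameters are hypothetical, and a real sender would adapt the chunk size to the path characteristics.

```python
MAX_MESSAGE = 65535                   # maximum MCTCP message size in octets
HEADER_LEN = 12                       # type (1) + length (2) + flags (1) + seq (8)
MAX_CHUNK = MAX_MESSAGE - HEADER_LEN  # 65523 octets


def segment(stream: bytes, first_seq: int, chunk_size: int = MAX_CHUNK):
    """Yield (session sequence number, data) pairs that cover `stream`."""
    if not 1 <= chunk_size <= MAX_CHUNK:
        raise ValueError("chunk size must be 1..65523 octets")
    for offset in range(0, len(stream), chunk_size):
        yield first_seq + offset, stream[offset:offset + chunk_size]
```

For example, 70000 octets starting at sequence number 100 yield one chunk of 65523 octets at 100 and one chunk of 4477 octets at 65623.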
The segmentation of the application byte stream into data chunks and
their assignment to coupled TCP connections is decided by a local
algorithm in the MCTCP sender, which may take into account the path
characteristics such as MSS, congestion control state, and other
relevant information (e. g., the page size in case of a kernel
implementation). An efficient segmentation algorithm should avoid
sending small data chunks to reduce the header overhead both in the
MCTCP and TCP layer.
MCTCP does not provide positive acknowledgements at the session layer,
since TCP transport is reliable as long as paths do not fail. It is
an allowed behavior for an MCTCP instance to free the memory after
handing data over to a connection. In that case, if a coupled TCP
connection fails or if it is closed, it may be impossible to complete
the transfer on other coupled connections. Therefore, it is
RECOMMENDED that an MCTCP instance caches sent data for a certain
time. An MCTCP sender can duplicate or retransmit data chunks over
other coupled connections, even with overlapping sequence numbers.
The receiver can explicitly request such retransmissions as described
in the next section. A retransmission strategy is more efficient if
the retransmission is sent over a coupled connection that does not
have a long-standing sending queue. The MCTCP sender can infer the
connection state from the sequence numbers and congestion control
state of the individual connections.
5.2. Retransmission Requests
As the individual coupled TCP connections already provide reliable
transport, the session error recovery only has to deal with connection
failure or middlebox problems. If a path fails, it will be necessary
to retransmit the data that has not been successfully transported. In
this case the MCTCP sender SHOULD retransmit the data on a coupled
connection over another path by assembling new MCTCP Data Chunk
messages. It MAY also close the MCTCP session instead.
There are two ways for the MCTCP sender to determine what data has to
be retransmitted: It can try to implicitly determine the missing data
from the amount of unacknowledged data in the connection that failed,
if it has access to this information.
Alternatively, the MCTCP receiver can explicitly request the
retransmission of data that has not successfully been received.
Since MCTCP session messages are transported reliably, MCTCP uses a
negative acknowledgment (NACK) mechanism: The receiver MAY send MCTCP
Retransmission Request messages in order to indicate gaps in the
received global sequence number space. However, a receiver SHOULD
wait until there is reasonable evidence that the data has been lost
due to path failure, or that a retransmission over another coupled
connection would be of significant benefit, in order to avoid
spurious retransmissions. The MCTCP Retransmission Request message
MAY also be sent after a checksum mismatch in a Data Chunk message.
It is allowed to send these messages over several coupled connections
in parallel. Such messages should rarely be required, since
TCP transport is in general reliable unless paths completely fail.
If there are several gaps in the sequence number space, the receiver
SHOULD coalesce the sequence numbers in a reasonable way to reduce
the overhead. The message format of the MCTCP Retransmission Request
message is defined in Figure 8:
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+-------------------------------+---------------+
| Type=MSG_RTXRQ|     Total message length      |C|  Reserved   |
+---------------+-------------------------------+---------------+
|            Start session sequence number (64 bit)             :
+---------------------------------------------------------------+
:            Start session sequence number (contd.)             |
+---------------------------------------------------------------+
|             End session sequence number (64 bit)              :
+---------------------------------------------------------------+
:             End session sequence number (contd.)              |
+---------------------------------------------------------------+
Figure 8: MCTCP Retransmission Request message
The two sequence numbers refer to the first and last missing byte in
the session sequence number space. Upon reception of this message, an
MCTCP sender SHOULD retransmit the data over one or more coupled
connections other than the one that was originally used. The MCTCP
sender must still have the data buffered in order to be able to
retransmit it. MCTCP also allows the sender to close the MCTCP
session instead of retransmitting data, as single-path data transport
over that path would have failed, too.
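A receiver could derive coalesced Retransmission Requests from its reassembly state roughly as follows. This is a sketch under assumptions: the type code MSG_RTXRQ is a placeholder, the function names are hypothetical, and the decision when to actually send a request (the "reasonable evidence" discussed above) is left out.

```python
import struct

MSG_RTXRQ = 2  # assumption: type codes are not assigned in this document
RTXRQ = struct.Struct("!BHBQQ")  # type, total length, flags, first, last


def missing_ranges(received, next_expected: int):
    """Return coalesced (first, last) ranges of missing session bytes.

    `received` holds inclusive (first, last) ranges of buffered data chunks;
    `next_expected` is the first byte not yet delivered in order. Only gaps
    below the highest received byte can be detected.
    """
    gaps = []
    cursor = next_expected
    for first, last in sorted(received):
        if first > cursor:
            gaps.append((cursor, first - 1))
        cursor = max(cursor, last + 1)
    return gaps


def encode_rtxrq(first: int, last: int) -> bytes:
    """Build the 20-octet Retransmission Request of Figure 8 (no checksum)."""
    return RTXRQ.pack(MSG_RTXRQ, RTXRQ.size, 0, first, last)
```

Overlapping or adjacent received ranges collapse into one covered region, so each remaining gap results in exactly one request message.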
5.3. Address Advertisement
As motivated in [7], path management refers to the exchange of
information about additional paths between endpoints. MCTCP requires
multiple addresses at endpoints to be able to use multiple, possibly
at least partly disjoint paths.
In multi-connection mode, MCTCP can explicitly signal additional
addresses of one endpoint to the other endpoint, which allows it to
initiate new connections. The MCTCP session can therefore also deal
with addresses that change.
The "Additional Address" MCTCP message announces additional addresses on
which an endpoint can be reached (Figure 9 and Figure 10). Multiple
messages can be sent in succession in order to advertise several
addresses. This message can be sent at any time over any coupled
connection.
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+-------------------------------+---------------+
| Type=MSG_AADD4|   Total message length = 8    |   Reserved    |
+---------------+-------------------------------+---------------+
|                     IPv4 address (32 bit)                     |
+---------------------------------------------------------------+
Figure 9: MCTCP Additional IPv4 Address message
In Figure 9, the "Additional Address" message is shown for IPv4. The
reserved bits could be used to express priorities or policies (e. g.,
"use now").
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+-------------------------------+---------------+
| Type=MSG_AADD6|   Total message length = 20   |   Reserved    |
+---------------+-------------------------------+---------------+
|                                                               |
~                    IPv6 address (128 bit)                     ~
|                                                               |
+---------------------------------------------------------------+
Figure 10: MCTCP Additional IPv6 Address message
Furthermore, there are MCTCP messages to remove candidate addresses,
which are shown in Figure 11 and Figure 12. If an address is
removed, an endpoint SHOULD NOT try to open further coupled
connections to that address. Already established coupled connections
are not affected by these messages and must be explicitly closed
separately.
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+-------------------------------+---------------+
| Type=MSG_RADD4|   Total message length = 8    |   Reserved    |
+---------------+-------------------------------+---------------+
|                     IPv4 address (32 bit)                     |
+---------------------------------------------------------------+
Figure 11: MCTCP Remove IPv4 Address message
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+-------------------------------+---------------+
| Type=MSG_RADD6|   Total message length = 20   |   Reserved    |
+---------------+-------------------------------+---------------+
|                                                               |
~                    IPv6 address (128 bit)                     ~
|                                                               |
+---------------------------------------------------------------+
Figure 12: MCTCP Remove IPv6 Address message
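The four address messages of Figures 9 to 12 share one layout and differ only in type code and address length. A possible encoder and decoder is sketched below; the numeric type codes are placeholders because this document does not assign them, and the function names are hypothetical.

```python
import ipaddress
import struct

# Assumption: the numeric type codes are placeholders.
MSG_AADD4, MSG_AADD6, MSG_RADD4, MSG_RADD6 = 3, 4, 5, 6
EXPECTED_LENGTH = {MSG_AADD4: 8, MSG_RADD4: 8, MSG_AADD6: 20, MSG_RADD6: 20}


def encode_address(address: str, remove: bool = False) -> bytes:
    """Build an Additional Address or Remove Address message."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        msg_type = MSG_RADD4 if remove else MSG_AADD4
    else:
        msg_type = MSG_RADD6 if remove else MSG_AADD6
    total = 4 + len(ip.packed)  # 8 octets for IPv4, 20 octets for IPv6
    return struct.pack("!BHB", msg_type, total, 0) + ip.packed


def decode_address(message: bytes):
    """Return (address, remove) for one complete address message."""
    msg_type, total, _reserved = struct.unpack_from("!BHB", message)
    if EXPECTED_LENGTH.get(msg_type) != total or len(message) != total:
        raise ValueError("malformed address message")
    remove = msg_type in (MSG_RADD4, MSG_RADD6)
    return ipaddress.ip_address(message[4:total]), remove
```

The reserved octet is sent as zero and ignored on reception, which leaves room for the priorities or policies mentioned above.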
5.4. Connection Management and Fallback
Each coupled TCP connection is maintained individually. A FIN only
closes that individual connection. If an application closes the
socket, the MCTCP shim layer MUST close the initial connection and
all existing coupled connections. Apart from that, the MCTCP layer
may always close (or even re-open) coupled connections, governed by
the local path management policies. In multi-connection mode, the
MCTCP session is only closed once all coupled connections are closed.
Coupled connections can be kept in the half-open state, but the MCTCP
connection management SHOULD avoid this. It would be possible to
specify an MCTCP message for explicitly closing the MCTCP session, or
several coupled connections, but this is left for further study.
MCTCP SHOULD keep the initial connection established when being in
multi-connection mode, even if it is not used for data transport any
more. This makes it possible to expose valid addresses and port numbers to the
application [11]. Keep-alives MAY be sent. The initial connection
is closed by the MCTCP layer when all coupled connections are closed.
If the initial connection is closed, the whole MCTCP session SHOULD
be closed, too. Further studies are needed to understand whether the
initial connection could safely be closed earlier, and whether an
MCTCP session can be kept established even if the addresses of the
initial connection cannot be used any more.
If an MCTCP receiver detects that the byte stream on a coupled
connection has been modified by a middlebox, it SHOULD close the
corresponding coupled connection. By means of error recovery and
retransmission schemes, the corresponding data can then be transferred
over other coupled connections. If all coupled connections are
closed, the session SHOULD fall back to single-connection mode.
Then, data transfer SHOULD continue over the initial connection. The
MCTCP session MUST NOT try to enter multi-connection mode again. As
an alternative, either of the two session endpoints MAY decide to
close the MCTCP session in case of such a violation of TCP's
end-to-end semantics.
In certain cases, byte counters of the initial connection in the
sender and receiver could get desynchronized if a middlebox
transparently changes the length of the content sent over the initial
connection. As also discussed in Section 8, this violation of TCP's
end-to-end semantics can be detected in the receiver, e. g., if there
is a gap between the first byte received from the coupled connections
and the last byte received from the initial connection.
Alternatively, there could be an overlap or potentially even
mismatching content. If the receiver detects this, it SHOULD
immediately close all coupled connections. This means that the MCTCP
session falls back to single-connection mode and continues the byte
stream data transport over the initial connection, including all
middlebox modifications. As another remedy, or if a fallback is not
possible, either sender or receiver MAY also decide to close the
MCTCP session in case of such an event. Further work is needed to
define whether MCTCP should also have a method to resynchronize the
sequence numbers at sender and receiver in such cases.
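The receiver-side check described above can be illustrated by comparing the first Data Chunk with the bytes already read from the initial connection. The function below is a hypothetical sketch. It relies on session sequence number 0 denoting the first byte of the initial connection, and it ignores the fact that a gap may be transient while data is still in flight on the initial connection.

```python
def classify(initial_stream: bytes, chunk_seq: int, chunk: bytes) -> str:
    """Compare a Data Chunk with the bytes read from the initial connection.

    Returns "contiguous", "gap", "overlap" (overlapping bytes are identical),
    or "mismatch" (overlapping bytes differ).
    """
    received = len(initial_stream)
    if chunk_seq > received:
        return "gap"
    if chunk_seq == received:
        return "contiguous"
    overlap = initial_stream[chunk_seq:chunk_seq + len(chunk)]
    if chunk[:len(overlap)] == overlap:
        return "overlap"
    return "mismatch"
```

A "mismatch" result, or a "gap" that persists, would be the trigger for closing all coupled connections and falling back to single-connection mode.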
6. MCTCP Session Policies and Algorithms
This document does not mandate specific policies on how to use and share
resources on the coupled connections. Still, this section addresses
some important issues that an MCTCP implementation must take into
account.
6.1. Message Scheduling
Data and control messages can be assigned to any coupled TCP
connection and are then sent over that connection. Messages may be
duplicated or retransmitted for redundancy reasons. The receiver
MUST process the messages in one coupled TCP connection in the order
of arrival. In-order message processing across several coupled
connections of one MCTCP session is not ensured.
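Since ordering across coupled connections is not ensured, and data chunks may be duplicated or may overlap, the receiver needs a reassembly buffer keyed by the session sequence number. A minimal sketch follows; the class name Reassembler is hypothetical, and memory limits are ignored.

```python
class Reassembler:
    """Restores the session byte stream from chunks that arrive in any order."""

    def __init__(self, next_seq: int = 0):
        self.next_seq = next_seq  # first byte not yet delivered in order
        self.pending = {}         # sequence number -> buffered data

    def push(self, seq: int, data: bytes) -> bytes:
        """Store a chunk and return all bytes that became deliverable."""
        if seq + len(data) > self.next_seq:
            if seq < self.next_seq:  # drop the part that was delivered before
                data = data[self.next_seq - seq:]
                seq = self.next_seq
            # keep the longer chunk if two start at the same sequence number
            if len(data) > len(self.pending.get(seq, b"")):
                self.pending[seq] = data
        out = bytearray()
        while self.pending:
            start = min(self.pending)
            if start > self.next_seq:
                break                # gap: wait for the missing bytes
            chunk = self.pending.pop(start)
            if start + len(chunk) > self.next_seq:
                out += chunk[self.next_seq - start:]
                self.next_seq = start + len(chunk)
        return bytes(out)
```

Chunks that lie entirely below next_seq are duplicates and are dropped silently, which matches the permission to duplicate or retransmit messages.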
6.2. Congestion and Flow Control
The MCTCP protocol has neither its own congestion control nor its
own flow control. Instead, it relies on the algorithms in the
individual TCP connections. In the following, the operation is
explained in more detail for the multi-connection mode. In
single-connection mode, there is no difference compared to a normal TCP
connection.
Concerning flow control, the operation is straightforward: If the
MCTCP receiver runs out of buffer space, it stops reading data from
one or more coupled TCP connections. Depending on TCP's flow control
and the available receive buffer, the flow control on one or more
connections may throttle data transport until the MCTCP layer can
process data again.
The MCTCP layer SHOULD at least be able to queue one full-sized MCTCP
message (i. e., 65535 octets) for each established coupled TCP
connection. In order to avoid stalls of the data transfer, an
endsystem SHOULD NOT actively or passively open coupled TCP
connections when it is short on memory. Similarly, coupled
connections SHOULD NOT be established if an application explicitly
sets small send or receive buffer sizes [11].
The coupled connections have different congestion windows. To
achieve resource pooling, it is necessary to couple the congestion
windows in use on each connection, in order to push most traffic to
uncongested links and avoid unfairness. One algorithm that aims at
achieving this objective is presented in [10]. MCTCP is able to use
this or other coupled congestion control algorithms.
In addition, an MCTCP sender may have local policies to decide how
much traffic to send over the available connections. It could also
obtain path cost metrics from the receivers. The latter could be
realized by new MCTCP messages defining connection priorities,
which is left for further study.
7. Interfaces
This section describes MCTCP's interfaces from a functional point of
view. Their realization is implementation-specific.
7.1. Interface between MCTCP and TCP
MCTCP must be able to control a small set of features inside a TCP
stack and therefore requires a corresponding interface:
o The MCTCP layer must be able to set a "Multipath Capable" or
"Join" TCP option in SYN segments. It must also be notified if
those options are set in an incoming SYN segment, it must be able
to access the tokens, and it must be able to influence how to
respond depending on the token value (i. e., either by a SYN/ACK
or RST).
o The MCTCP layer may set the "Mode" TCP option on the established
initial connection, in any segment other than pure SYNs, and it
should be notified if that option is received.
o The MCTCP layer must be able to affect the congestion window on
each coupled connection. Depending on the algorithm, it may be
sufficient just to set periodically certain parameters of the
congestion control, such as the additive increase factor.
For efficient operation, MCTCP may also have to read certain
information from each coupled TCP connection, such as:
o The current amount of acknowledged and unacknowledged data on that
connection, or the corresponding pointers to the byte stream.
o The receive window advertised by the other endpoint on that
connection.
o The estimated round-trip time.
o The maximum transmission unit (MTU) of the path, or TCP's maximum
segment size (MSS). Note that the MSS is not a constant value if
TCP options are added to data segments.
Many operating systems already provide information about a subset of
these parameters via a kernel/user-space interface.
7.2. Interface to Applications
MCTCP provides reliable, in-order, byte-stream transport to
applications and thus can be used by legacy applications like a
standard TCP connection [11]. When MCTCP is realized inside the
network stack, it is a new function block between the TCP instance
and the socket interface, which is transparent to applications.
Alternatively, MCTCP can be implemented in large parts by a user-
space library that accesses an extended network stack by the socket
interface, which may have to be enhanced to provide some additional
control functions as explained in the previous section. Applications
could then still use the standard APIs to that library and would not
be affected at all. Such a user-space implementation in combination
with a simple patch of the network stack could facilitate the initial
deployment of MCTCP.
8. Interaction with Middleboxes
There are various types of middleboxes in the Internet. Some of them
only parse a TCP stream (e. g., deep packet inspection), while others
change TCP header fields on the fly, and some may even rewrite the
TCP payload. MCTCP is designed to be compatible with most types of
middleboxes, but as middlebox behavior is not well specified, some
open issues may remain.
8.1. Middleboxes that Manipulate TCP Options
One class of middleboxes may strip, duplicate, or modify TCP options
and/or drop packets with unknown TCP options, and this may even
depend on whether the SYN flag is set or not. If a middlebox removes
MCTCP's TCP options in SYN segments, multipath transport will not be
enabled at all (if that middlebox is on the path of the initial
connection), or not over that path (if the middlebox is on the path
of a potential coupled connection towards another address). Still,
data transfer over the initial connection or other coupled
connection(s) can continue without being significantly affected.
Other TCP options that could be used by MCTCP are non-mandatory, i.
e., the data integrity is not affected when these options are
stripped or duplicated. In summary, unlike protocols that transport
essential information in TCP options outside SYNs, MCTCP operates
safely in an environment with middleboxes that strip, duplicate, or
modify TCP options and/or drop packets with unknown TCP options.
8.2. Middleboxes that Change Content
Other middleboxes may rewrite the content of the TCP payload and
possibly also its length (e. g., by rewriting URIs). MCTCP, as well
as other multipath transport solutions, requires a session level
sequence number space for the in-order reassembly of the application
data. If a middlebox changes the content and/or length on the
initial connection or on coupled connections, it may be impossible to
correctly reassemble the byte stream at the receiver.
MCTCP will in many cases be able to detect changes of content over
coupled connections, as it loses track of the TLV framing on that
connection. Content modifications can be detected even more reliably
if the sender adds checksums to the data chunks. If MCTCP detects a
middlebox that changes the byte stream on a coupled connection, it
will close the corresponding coupled connection. Error recovery and
retransmission schemes can then transfer the affected content over
other coupled connections, or over the initial connection as a
fallback method.
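The detection logic described above can be illustrated by the
following minimal sketch. The message layout used here (1-byte type,
2-byte length, 4-byte CRC-32 of the value) is a hypothetical framing
chosen for illustration only and is not the MCTCP wire format; the
names KNOWN_TYPES, FramingError, encode_message, and parse_messages
are likewise assumptions.

```python
import struct
import zlib

# Hypothetical framing, NOT the MCTCP wire format:
#   1-byte type | 2-byte length | 4-byte CRC-32 of value | value
HEADER = struct.Struct("!BHI")
KNOWN_TYPES = {1, 2}  # assumed type codes for illustration


class FramingError(Exception):
    """The byte stream no longer matches the expected framing."""


def encode_message(msg_type: int, value: bytes) -> bytes:
    """Frame one message with a type, a length, and a checksum."""
    return HEADER.pack(msg_type, len(value), zlib.crc32(value)) + value


def parse_messages(stream: bytes):
    """Return (messages, unconsumed_tail); raise FramingError on damage."""
    messages = []
    offset = 0
    while len(stream) - offset >= HEADER.size:
        msg_type, length, crc = HEADER.unpack_from(stream, offset)
        if msg_type not in KNOWN_TYPES:
            # Typical symptom of a length-changing rewrite: the parser
            # lands in the middle of a payload and reads garbage.
            raise FramingError("unknown message type %d" % msg_type)
        end = offset + HEADER.size + length
        if end > len(stream):
            break  # incomplete message, wait for more bytes
        value = stream[offset + HEADER.size:end]
        if zlib.crc32(value) != crc:
            # Catches rewrites that keep the length unchanged.
            raise FramingError("checksum mismatch")
        messages.append((msg_type, value))
        offset = end
    return messages, stream[offset:]
```

In such a design, a receiver that catches FramingError on a coupled
connection would close that connection and rely on retransmission
over the remaining connections.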
If a middlebox changes the length of the byte stream on the initial
connection, the sequence numbers at sender and receiver will not be
synchronized when entering multi-connection mode, and there could be
a gap or an overlap even with mismatching content. MCTCP can detect
both cases. MCTCP keeps the initial connection open even in multi-
connection mode. Therefore, if a content length modification on the
initial connection is detected, it can fall back to the initial
connection by closing all coupled connections and continue to use
single-path transport.
8.3. Middleboxes that Translate Addresses/Ports
NAPT middleboxes that are unaware of MCTCP create two problems:
First, as hosts have local addresses only, and the global addresses
are not necessarily known to a host behind the NAPT, it may not be
possible to advertise addresses to the other endpoint. Second, it
may be impossible for one endpoint to open a coupled TCP connection
to an endpoint sitting behind a NAPT middlebox.
In order to address the latter issue, MCTCP defines the Mode option.
With that option, one endpoint can ask the other endpoint to enter
multi-connection mode. As shown in Figure 13, sending this TCP
option is useful if one endpoint has multiple public IP addresses,
but cannot announce them over the initial connection. If the host
behind the NAPT middlebox receives the option and establishes a
coupled connection, this can be used to convey the information about
the other public address, and a coupled connection to that address
can then be established, too.
         Host A              NAPT             Host B
------------------------      //     ------------------------
Address A1    Address A2      //     Address B1    Address B2
 (private)     (private)      //      (public)      (public)
----------    ----------      //     ----------    ----------
     |             |          //          |             |
     |---------SYN+MPCAP------//--------->|             |  ^
     |<-----SYN/ACK+MPCAP-----//----------|             |  | Single-
     |             |          //          |             |  | conn.
     |###Initial connection###//##########|             |  | mode
     |             |          //          |             |  V
     ~             ~          ~~          ~             ~
     |             |          //          |             |
     |<--------Mode option----//----------|             |
     |             |          //          |             |
     |---------SYN+JOIN-------//--------->|             |
     |<------SYN/ACK+JOIN-----//----------|             |  ^
     |             |          //          |             |  |
     |#1st coupled connection#//##########|             |  |
     |             |          //          |             |  |
     |<=MCTCP Add. Address B2=//==========|             |  | Multi-
     |             |          //          |             |  | conn.
     |---------SYN+JOIN-------//----------------------->|  | mode
     |<------SYN/ACK+JOIN-----//------------------------|  |
     |             |          //          |             |  |
     |#2nd coupled connection#//########################|  V
     |             |          //          |             |
Figure 13: Example use of the Mode option
8.4. Middleboxes that Want to Control MCTCP Traffic
Given that MCTCP transports control information in the payload, it is
more complex for middleboxes to parse and potentially modify MCTCP's
control information. In order to do so, a middlebox must perform
deep packet inspection and it has to parse the MCTCP session messages
in the TCP connection. This may prevent certain operations and
optimizations by middleboxes. However, it should be noted that
middleboxes cannot affect the payload in TLS either, i. e., MCTCP is
somewhat similar to TLS in that sense. As a remedy, it could be
possible to define a TCP option that contains an offset field with a
pointer to the first byte of an MCTCP control message, so that a
middlebox can find control messages without parsing the whole byte
stream of a coupled TCP connection. Yet, such an option would be
subject to all limitations of sporadically added TCP options.
A middlebox that wants to prevent MCTCP usage can drop SYN segments
containing the "Join" TCP option without causing any significant
harm. If that middlebox is on the path of the initial connection,
MCTCP will continue using the backward-compatible initial TCP
connection only. If the middlebox is on the path towards another
address, i. e., if the multi-connection mode is already entered,
MCTCP will not establish an additional coupled connection. Under the
assumption of stable routing, no TLV-encoded content will pass that
middlebox in either case. Instead of dropping SYN segments with the
"Join" TCP option, a middlebox could also strip the "Join" option, as
the setup of a coupled connection will then fail. This method would
avoid timeouts and further retransmission attempts by the sender.
Alternatively, a middlebox could remove the "Multipath Capable" TCP
option from SYN segments. Then, MCTCP will behave like a standard
TCP connection and never try to switch to multi-connection mode.
However, it is not recommended to drop SYN segments containing
the "Multipath Capable" TCP option as a means to prevent MCTCP, since
this needlessly results in a longer connection setup time, and since
just dropping segments with the "Join" option would be sufficient.
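As an illustration of the stripping approach, the following sketch
walks the options field of a TCP header and overwrites one option
kind with NOP padding, so that the header length stays unchanged.
The kind/length encoding and the EOL and NOP kinds are those of
standard TCP. The value chosen for OPT_JOIN (an experimental option
kind) and the 4-byte token used in the usage example are assumptions,
as no option number has been assigned. A real middlebox would also
have to update the TCP checksum, which is omitted here.

```python
OPT_EOL = 0     # end of option list (single byte)
OPT_NOP = 1     # no-operation padding (single byte)
OPT_JOIN = 253  # ASSUMPTION: placeholder value, not an assigned number


def strip_option(options: bytes, kind_to_strip: int) -> bytes:
    """Return a copy of a TCP options field in which every option of
    the given kind is replaced by NOP bytes of the same total length."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == OPT_EOL:
            break  # nothing but padding follows
        if kind == OPT_NOP:
            i += 1
            continue
        if i + 1 >= len(out):
            raise ValueError("truncated option")
        length = out[i + 1]  # covers the kind and length bytes as well
        if length < 2 or i + length > len(out):
            raise ValueError("malformed option length")
        if kind == kind_to_strip:
            out[i:i + length] = bytes([OPT_NOP]) * length
        i += length
    return bytes(out)


# Usage: an MSS option, a hypothetical Join option carrying a 4-byte
# token, and two bytes of padding.
mss = bytes([2, 4, 0x05, 0xB4])
join = bytes([OPT_JOIN, 6]) + (0xDEADBEEF).to_bytes(4, "big")
stripped = strip_option(mss + join + bytes([OPT_NOP, OPT_NOP]), OPT_JOIN)
```

After stripping, the SYN looks like an ordinary connection request to
the receiver, so that the setup of the coupled connection fails
without the sender having to wait for a retransmission timeout.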
8.5. Middleboxes that Proactively Acknowledge Data
Finally, there might be middleboxes that proactively acknowledge
data, or middleboxes that transparently split the TCP connection.
Such middleboxes break the end-to-end semantics of TCP connections
[6], i. e., TCP cannot ensure a reliable end-to-end transport of data
over such middleboxes. Mitigating the drawbacks of proactively
acknowledging middleboxes is mostly orthogonal to multipath
transport.
Yet, if such a middlebox is on a path used by MCTCP, and if this
path fails, a specific problem arises: The MCTCP sender may
erroneously assume that the data over the corresponding coupled
connections has already been received by the receiver, and therefore
it will not retransmit it. In that case, after some time, the MCTCP
receiver will observe a gap in the session sequence number space and
can issue a request for retransmission. The sender can then decide
whether to retransmit the data over another coupled connection to
solve this problem, or it can just close the session. MCTCP
explicitly allows the latter behavior as a single-path transport over
the path with that middlebox would have failed, too.
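The gap detection at the receiver can be sketched as follows. The
representation of received data chunks as (session sequence number,
length) pairs and the function name find_gaps are assumptions made
for illustration; the sketch only shows how missing ranges in the
session sequence number space could be identified, not how or when a
retransmission request is actually sent.

```python
def find_gaps(chunks, next_expected=0):
    """Return the missing half-open ranges [start, end) of the session
    byte stream, given the (session_seq, length) pairs received so far.

    next_expected is the first session sequence number that has not
    yet been delivered to the application.
    """
    gaps = []
    cursor = next_expected
    for seq, length in sorted(chunks):
        if seq > cursor:
            gaps.append((cursor, seq))  # bytes cursor..seq-1 are missing
        # max() tolerates duplicate and overlapping chunks
        cursor = max(cursor, seq + length)
    return gaps


# Usage: chunks arriving out of order over several coupled connections.
received = [(0, 100), (300, 100), (100, 100), (500, 50)]
missing = find_gaps(received)
```

For the chunks above, the missing ranges are [200, 300) and
[400, 500). Since chunks sent over different paths can be reordered,
a receiver would presumably request retransmission of a range only
after the gap has persisted for some time.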
If MCTCP used positive session layer acknowledgements, future
middleboxes could parse MCTCP's session messages and proactively
acknowledge data on the session level, too. MCTCP does not
incorporate a positive session layer acknowledgement mechanism in
order to prevent such a further violation of the end-to-end
principle. Of course, future middleboxes could still try to modify
the retransmission requests inside the coupled connections, but this
would not have any significant benefit.
9. Open Issues
o Avoiding inconsistencies when switching in parallel to multi-
connection mode.
o MCTCP does not support out-of-band TCP signaling transport (urgent
flag).
10. Security Considerations
A generic threat analysis for the addition of multipath capabilities
to TCP is presented in [9]. MCTCP is designed along the assumptions
of that document, with some enhancements. In general, MCTCP is
subject to security threats similar to those of [8], but due to its
extensibility, additional protection mechanisms could be incorporated
in a future version. For instance, MCTCP can employ more secure
mechanisms to protect the coupling of TCP connections, even
cryptographic keys as in TLS.
MCTCP uses a 32-bit token only, in order to save TCP option space in
SYN segments. This is reasonable, as this token is only required to
authenticate the initiator of the first coupled connection, which
must use the same IP source and destination address as the initial
connection, i. e., off-path attacks are not possible. Coupled
connections that are added subsequently could use a more secure
protection scheme at the MCTCP session layer, either by longer 64-bit
tokens, or even by cryptographic methods, which could be exchanged by
corresponding MCTCP control messages (not specified in this version
of the document).
This section will be extended in a later version of this document.
11. IANA Considerations
This document will make a request to IANA to allocate new values for
TCP option identifiers:
o OPT_MPCAP ("Multipath Capable" option)
o OPT_JOIN ("Join" option in order to add a coupled connection to
the MCTCP session)
o OPT_MODE ("Mode" option that requests a change from
single-connection to multi-connection operation mode)
This document also defines several types of MCTCP messages:
o MSG_CHUNK ("MCTCP Data Chunk")
o MSG_RTXRQ ("MCTCP Retransmission Request")
o MSG_AADD4 ("MCTCP Additional IPv4 Address")
o MSG_AADD6 ("MCTCP Additional IPv6 Address")
o MSG_RADD4 ("MCTCP Remove IPv4 Address")
o MSG_RADD6 ("MCTCP Remove IPv6 Address")
12. Conclusion
Multi-connection TCP transport is a simple, modular, and extensible
solution to enable reliable transfer over multiple paths. This
specification defines the protocol on top of the TCP byte stream, the
few required extensions of TCP, and the light-weight interface
between MCTCP and each TCP connection. In summary, MCTCP is a
reasonable and incrementally deployable alternative to a signaling
mechanism that uses TCP options only.
13. Acknowledgments
Michael Scharf is supported by the German-Lab project
(http://www.german-lab.de/) funded by the German Federal Ministry of
Education and Research (BMBF).
14. References
14.1. Normative References
[1] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
September 1981.
[2] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
Selective Acknowledgment Options", RFC 2018, October 1996.
[3] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[4] Stewart, R., "Stream Control Transmission Protocol", RFC 4960,
September 2007.
[5] Dierks, T. and E. Rescorla, "The Transport Layer Security (TLS)
Protocol Version 1.2", RFC 5246, August 2008.
14.2. Informative References
[6] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
Shelby, "Performance Enhancing Proxies Intended to Mitigate
Link-Related Degradations", RFC 3135, June 2001.
[7] Ford, A., Raiciu, C., Barre, S., and J. Iyengar, "Architectural
Guidelines for Multipath TCP Development",
draft-ietf-mptcp-architecture-01 (work in progress), June 2010.
[8] Ford, A., Raiciu, C., and M. Handley, "TCP Extensions for
Multipath Operation with Multiple Addresses",
draft-ietf-mptcp-multiaddressed-00 (work in progress),
June 2010.
[9] Bagnulo, M., "Threat Analysis for Multi-addressed/Multi-path
TCP", draft-ietf-mptcp-threat-02 (work in progress),
March 2010.
[10] Raiciu, C., Handley, M., and D. Wischik, "Coupled Multipath-
Aware Congestion Control", draft-raiciu-mptcp-congestion-01
(work in progress), March 2010.
[11] Scharf, M. and A. Ford, "MPTCP Application Interface
Considerations", draft-scharf-mptcp-api-02 (work in progress),
July 2010.
Appendix A. Possible Future MCTCP Extension
This memo describes the baseline specification of MCTCP and the
required minimum set of functions. A future version of this
specification may additionally add several other features to MCTCP,
such as:
o Exchange of longer tokens (e. g., 64-bit) for connection coupling,
using MCTCP control messages.
o Signaling messages to exchange policy information concerning the
usage of the coupled TCP connections.
o A signaling message that advertises combinations of addresses and
port numbers, e. g., to deal with corresponding policies on one
endpoint.
o A signaling message that advertises additional addresses in
another format, e. g., as URI.
o Positive MCTCP session-level acknowledgements ("data
acknowledgement").
o A checksum in all MCTCP messages.
o Signaling messages to negotiate different payload encoding
formats, e. g., MIME-like encoding. A future version of the MCTCP
session protocol could also define retransmission requests for a
different encoding format to work around content modifying
middleboxes.
o MCTCP control messages that manage coupled connections, such as a
method to explicitly ask for closing several connections at MCTCP
layer, similar to a "DATA FIN".
o A simple MCTCP session flow control mechanism, complementing TCP's
flow control.
o A negotiation whether to indeed keep the initial connection
established in multi-connection mode, assuming that it could
either be closed or reused as a coupled connection.
o A variant of this protocol that uses TLV-encoded message transport
right from the beginning.
o A method to discover and negotiate features between the two MCTCP
session endpoints, e. g., by Hello messages similar to TLS.
Further studies are needed to determine whether some of these
functions should be added to MCTCP. If so, their implementation may
partly be optional and negotiated between the session endpoints. The
baseline MCTCP design should be kept as simple as possible.
Appendix B. Change History of the Document
Changes compared to version 00:
o Addition of a checksum in data chunk messages
o Definition of a message to request retransmission
o Description of how to fall back to single-connection mode
o Discussion of proactively acking middleboxes
o Various clarifications of the design motivations
Author's Address
Michael Scharf
Alcatel-Lucent Bell Labs
Lorenzstrasse 10
70435 Stuttgart
Germany
EMail: michael.scharf@alcatel-lucent.com