Internet Engineering Task Force A. Ford, Ed.
Internet-Draft Roke Manor Research
Intended status: Informational C. Raiciu
Expires: August 7, 2010 University College London
S. Barre
Universite catholique de
Louvain
J. Iyengar
Franklin and Marshall College
B. Ford
Max Planck Institute for Software
Systems
February 3, 2010
Architectural Guidelines for Multipath TCP Development
draft-ford-mptcp-architecture-01
Abstract
Often endpoints are connected by multiple paths, but the nature of
TCP/IP restricts communications to a single path per socket.
Resource usage within the network would be more efficient were these
multiple paths able to be used concurrently. This should enhance
user experience through improved resilience to network failure and
higher throughput.
This document outlines architectural guidelines for the development
of a Multipath Transport Protocol, with references to how these
architectural components come together in the Multipath TCP (MPTCP)
protocol. This document also lists certain high level design
decisions that provide foundations for the MPTCP design, based upon
these architectural requirements.
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
Ford, et al. Expires August 7, 2010 [Page 1]
Internet-Draft MPTCP Architecture February 2010
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 7, 2010.
Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the BSD License.
Table of Contents
1. Introduction
1.1. Motivation
1.2. Requirements Language
1.3. Terminology
1.4. Reference Scenario
2. Goals
2.1. Functional Goals
2.2. Compatibility Goals
2.2.1. Application Compatibility
2.2.2. Network Compatibility
2.2.3. Compatibility with other network users
3. Multipath Architecture
3.1. Decomposing Transport Functions
4. High-Level Design Decisions
4.1. Sequence Numbering
4.2. Reliability
4.3. Buffers
4.4. Signalling
4.5. Path Management
4.6. Connection Identification
4.7. Network Layer Compatibility
4.8. Congestion Control
5. Summary
6. Security Considerations
7. Interactions with Applications
8. Interactions with Middleboxes
9. Acknowledgements
10. IANA Considerations
11. References
11.1. Normative References
11.2. Informative References
Appendix A. Implementation Architecture
A.1. Functional Separation
A.1.1. Application to default MPTCP protocol
A.1.2. Generic architecture for MPTCP
A.2. PM/MPS interface
Authors' Addresses
1. Introduction
Multipath TCP (MPTCP) is a set of extensions to regular TCP [2] that
allow one TCP connection to be spread across multiple paths. This
section describes the motivation behind the design of Multipath TCP.
Companion documents to this architectural overview are those which
provide details of the protocol extensions [3], congestion control
algorithms [4], and application-level considerations [5]. Put
together, these components build a complete Multipath TCP
implementation. Other components, however, could be introduced in
place of these, in accordance with the architecture specified in this
document.
Please note this document is a work-in-progress and covers several
topics, some of which may be more appropriately moved to separate
documents as this work evolves.
1.1. Motivation
As the Internet evolves, demands on Internet resources are ever-
increasing, but often these resources (in particular, bandwidth)
cannot be fully utilised due to protocol constraints both on the end-
systems and within the network. If these resources could instead be
used concurrently, end user experience could be greatly improved.
Such enhancements would also reduce the necessary expenditure on
network infrastructure which would otherwise be needed to create an
equivalent improvement in user experience.
By the application of resource pooling [6], these available resources
can be 'pooled' such that they appear as a single logical resource to
the user. The purpose of Multipath TCP, therefore, is to provide a
TCP to the user that is able to make use of multiple available paths.
The achievement of resource pooling through Multipath TCP brings two
key benefits:
o To increase the resilience of the connectivity by providing
multiple paths, protecting end hosts from the failure of any one path.
o To increase the efficiency of the resource usage, and thus
increase the network capacity available to end hosts.
Multipath TCP as presented in [3] addresses these aims by achieving
resource pooling through splitting a TCP session to run over multiple
paths, and presenting it as a single TCP connection to the
application. This is not the only way of creating a Multipath TCP,
however, and as such this architecture is designed so that other
components can be used to create an alternative solution, while still
achieving the goals of resource pooling.
1.2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [1].
1.3. Terminology
Path: A sequence of links between a sender and a receiver, defined
in this context by a source and destination address pair.
Endpoint: A host either initiating or terminating a MPTCP
connection.
Multipath TCP (MPTCP): A modified version of the TCP [2] protocol
that supports the simultaneous use of multiple paths between
endpoints.
Subflow: A flow of TCP packets operating over an individual path,
which forms part of a larger MPTCP connection.
MPTCP Connection: A set of one or more subflows combined to provide
a single Multipath TCP service to an application at an endpoint.
1.4. Reference Scenario
TBD - would this be useful?
Endpoints, routes. Addresses/path selection mechanisms?
2. Goals
This section outlines key goals for Multipath TCP. These are
separated into functional goals, i.e. the behaviour that MPTCP must
provide, and compatibility goals, i.e. the limits on the impact MPTCP
may have on other entities.
2.1. Functional Goals
The fundamental goal of MPTCP is to use multiple paths (which are not
necessarily entirely disjoint) between two endpoints. There are two
primary motivations for this goal, which themselves provide
functional goals for the design. These are:
o Improve Throughput: To do this, MPTCP MUST support the use of
multiple paths simultaneously. The throughput achieved by MPTCP
SHOULD NOT be lower than that of legacy TCP operating on any one of
the paths.
o Improve Resilience: MPTCP MUST support the use of multiple paths
interchangeably for resilience purposes, by permitting packets to
be sent and re-sent on any available path. It follows that, in
the worst case, the protocol MUST be no less resilient than legacy
TCP.
The secondary benefit of resource pooling is that, as MPTCP should be
able to balance traffic among available paths, and respond to
congestion appropriately, network utility should be optimized in a
global sense by shifting load away from congested bottlenecks and
taking advantage of spare capacity wherever it may be located.
To support the goal of resource pooling as presented above, an MPTCP
host must be able to detect and utilise multiple paths. The impact of
this requirement on the design of such functions is derived in
Section 3.
2.2. Compatibility Goals
In addition to the functional goals listed above, a Multipath TCP
must meet a number of compatibility goals in order to support
deployment in today's Internet. These goals fall into the following
categories:
2.2.1. Application Compatibility
Application compatibility refers to the appearance of MPTCP to the
application both in terms of the API that can be used and the
expected service model that is provided.
A multipath-capable equivalent of TCP SHOULD retain backward
compatibility with existing APIs, so that existing applications can
use the newer transport merely by upgrading the operating systems of
the end-hosts. This does not preclude the use of an advanced API to
permit multipath-aware applications to specify preferences, nor for
users to configure their systems in a different way from the default,
for example switching on or off the automatic use of MPTCP.
A Multipath TCP MUST follow the same service model as TCP:
byte-oriented, in-order, reliable delivery. To have a deployable
protocol, MPTCP SHOULD adhere to the following "do no harm"
philosophy: Multipath TCP SHOULD behave no worse (throughput-wise)
than running a single TCP connection over any of its paths.
2.2.2. Network Compatibility
In terms of compatibility with the network layer, and devices that
operate at the network layer, Multipath TCP MUST remain backward
compatible with the Internet as it exists today, including being able
to traverse predominant existing middleboxes such as firewalls, NATs,
and performance enhancing proxies [7]. This has an effect on
protocol design, in terms of ensuring MPTCP still looks like TCP on
the wire, and uses established TCP extensions where appropriate.
Second, the protocol extensions may need to include functionality
that allows them to detect and traverse such established middleboxes.
2.2.3. Compatibility with other network users
As a corollary to both network and application compatibility, the
architecture must enable new Multipath TCP flows to coexist
gracefully with existing legacy TCP flows, competing for bandwidth
neither unduly aggressively nor unduly timidly (unless low-precedence
operation is specifically requested by the application, such as with
LEDBAT). The use of multiple paths MUST NOT significantly harm users
using single path TCP at shared bottlenecks, beyond the impact that
would occur from another single legacy TCP flow.
Furthermore, MPTCP SHOULD feature automatic negotiation of its use.
Since MPTCP can only be used if both endpoints support it, a host
supporting Multipath TCP must be able to detect reliably whether the
other endpoint does in fact support the protocol, using it if so, and
otherwise automatically falling back to legacy TCP.
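The negotiation and fallback requirement can be illustrated with a minimal decision function. This is a sketch only, assuming that each endpoint learns during connection setup whether its peer signalled MPTCP support; the token "multipath-capable" and the names Mode and negotiate are hypothetical, and the actual signalling is left to the protocol specification [3].

```python
from enum import Enum


class Mode(Enum):
    MPTCP = "mptcp"
    LEGACY_TCP = "tcp"


# Hypothetical token standing for "the peer signalled MPTCP support";
# the real signalling mechanism is defined in the protocol specification.
MULTIPATH_CAPABLE = "multipath-capable"


def negotiate(local_supports_mptcp: bool, peer_signals: set[str]) -> Mode:
    """Pick the transport for a new connection.

    MPTCP is used only when both endpoints support it; in every other
    case the connection falls back to legacy TCP without failing.
    """
    if local_supports_mptcp and MULTIPATH_CAPABLE in peer_signals:
        return Mode.MPTCP
    return Mode.LEGACY_TCP


print(negotiate(True, {"multipath-capable"}))  # Mode.MPTCP
print(negotiate(True, set()))                  # Mode.LEGACY_TCP
```

The important property is that the absence of the peer's signal never causes the connection to fail; it only selects the legacy behaviour.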
3. Multipath Architecture
Here we present an architectural view of Multipath TCP. The
architecture directly follows the protocol goals as presented above,
and identifies the practical impact that these functional and
compatibility goals will have on the design of the MPTCP solution.
Multipath TCP operates at the transport layer, and its existence
should be transparent to both higher and lower layers. It is a set
of additional features on top of standard TCP, and as such the impact
on applications should be minimal, or entirely transparent
(application considerations are discussed in detail in [5]).
Although the standard TCP API will still be provided to the
application layer, multipath-aware applications would be able to use
an extended sockets API to have further influence on the behaviour of
MPTCP, which is also specified in [5].
The MPTCP layer relies upon (what appear to the network to be)
standard TCP sessions, termed "subflows", to provide the underlying
transport per path, and as such these retain the network
compatibility desired. MPTCP as described in [3] carries MPTCP-
specific information in a TCP-compatible manner, although this
mechanism is separate from the actual information being transferred
and so could evolve in future revisions. Figure 1 illustrates the
layered architecture.
+-------------------------------+
| Application |
+---------------+ +-------------------------------+
| Application | | MPTCP |
+---------------+ + - - - - - - - + - - - - - - - +
| TCP | | Subflow (TCP) | Subflow (TCP) |
+---------------+ +-------------------------------+
| IP | | IP | IP |
+---------------+ +-------------------------------+
Figure 1: Comparison of Standard TCP and MPTCP Protocol Stacks
Within the new MPTCP layer, a number of functions are provided that
can be identified and, if necessary, implemented separately within a
modular architecture. These functions are those for:
o Path Management: This is the function to detect and use multiple
paths between two endpoints. In the case of the MPTCP design [3],
this feature is implemented using multiple IP addresses on at least
one of the endpoints. Although this does not guarantee path
diversity, and there may be shared bottlenecks, this is a simple
mechanism that can be used with no additional features in the
network. The path management features of the MPTCP protocol are
the mechanisms to signal alternative addresses to endpoints, and
mechanisms to set up new subflows attached to an existing MPTCP
connection.
o Packet Scheduling: This function breaks the bytestream received
from the application layer into segments which are transmitted on
one of the available lower (subflow) layers. The MPTCP design
makes use of a data sequence mapping, associating packets sent on
different subflows with a connection-level sequence numbering, thus
allowing packets sent on different subflows to be correctly re-
ordered at the receiver. The packet scheduler is dependent upon
information about the availability of paths exposed by the path
management component, and then makes use of the subflow layers to
transmit these packets.
o Subflow (single-path TCP) Interface: The subflow layer takes
segments from the packet scheduling component and transmits them
over the specified path, ensuring detectable delivery to the
endpoint. Detection of delivery is necessary to allow the
congestion control protocol to attribute packet delivery or loss
to the right path. Note that the packet scheduling layer does not
embed enough information in packets to allow this to happen:
segments with the same connection-level sequence number can be
transmitted over multiple paths, i.e. as retransmissions or just
to increase redundancy. MPTCP uses TCP at this layer for network
compatibility; TCP ensures in-order, reliable delivery. TCP adds
its own sequence numbers to the segments; these are used to detect
and retransmit lost packets.
o Congestion Control: This function manages congestion control
across the subflows. As specified, this congestion control
algorithm must ensure that an MPTCP connection does not unfairly
take more bandwidth than a single path TCP flow would take at a
shared bottleneck. An algorithm to support this is specified in
[4].
These functions fit together as follows. The Path Management looks
after the discovery (and if necessary, initialisation) of multiple
paths between two endpoints. The Packet Scheduler then receives
packets from the application for the network and does the necessary
operations on them (such as adding a data-level sequence number)
before sending to the subflow layer. The subflow layer adds its own
sequence numbers and acknowledgements, and passes the segments to the
network. The receiving subflow re-orders data and passes it to the
multipath layer, which performs connection-level re-ordering, removes
the segment boundaries and sends the data to the application.
Finally, the congestion control
component exists as part of the packet scheduling, in order to
schedule which packets should be sent at what rate on which subflow.
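The flow of data through these functions can be sketched as follows. This is a minimal illustration, not an implementation of [3]: the names Segment, schedule and reassemble are hypothetical, the round-robin choice of subflow stands in for a real packet scheduler (which would consult the congestion control component), and loss, retransmission and acknowledgement are omitted. It shows the two sequence numbers added on the sending side and the connection-level re-ordering on the receiving side.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    data_seq: int     # connection-level sequence number (packet scheduler)
    subflow_seq: int  # subflow-level sequence number (subflow layer)
    payload: bytes


def schedule(stream: bytes, n_subflows: int, mss: int) -> list[list[Segment]]:
    """Sender side: cut the bytestream into segments, tag each with its
    data-level sequence number and hand it to a subflow (round-robin here),
    which then adds its own subflow-level sequence number."""
    subflows: list[list[Segment]] = [[] for _ in range(n_subflows)]
    next_subflow_seq = [0] * n_subflows
    for i, data_seq in enumerate(range(0, len(stream), mss)):
        chunk = stream[data_seq:data_seq + mss]
        k = i % n_subflows
        subflows[k].append(Segment(data_seq, next_subflow_seq[k], chunk))
        next_subflow_seq[k] += len(chunk)
    return subflows


def reassemble(subflows: list[list[Segment]]) -> bytes:
    """Receiver side: each subflow has already delivered its segments in
    subflow order; the multipath layer merges them by data-level sequence
    number and strips the segment boundaries."""
    segments = sorted((s for flow in subflows for s in flow), key=lambda s: s.data_seq)
    out = bytearray()
    for seg in segments:
        if seg.data_seq != len(out):
            raise ValueError(f"hole in connection-level data at {len(out)}")
        out += seg.payload
    return bytes(out)


flows = schedule(b"hello multipath world", n_subflows=2, mss=4)
print([(s.data_seq, s.subflow_seq) for s in flows[1]])  # [(4, 0), (12, 4), (20, 8)]
print(reassemble(list(reversed(flows))))                # b'hello multipath world'
```

Reading the subflows in a different order does not change the result, because the receiver orders data by its data-level sequence number rather than by the subflow on which it arrived.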
3.1. Decomposing Transport Functions
This section provides a generic view of the above functional
separation, presenting an extensible model by which transport layer
functions can be analysed and developed in a modular fashion.
As shown in Figure 2, we first loosely separate functions within
transports into "application-oriented" and "network-oriented" parts.
We use this separation of functions as an architectural framework
that a multipath transport must recognize, primarily to maintain
backward compatibility with applications and with the network. The
desire for network compatibility will impact design choices at the
subflow level, while the need for application compatibility will
primarily impact design choices at the higher, application-facing
MPTCP layer.
The top application-oriented "Semantic" functions are whatever
communication abstractions are to be made available to applications,
including providing the end-to-end reliability and ordering
properties of abstractions like TCP's byte streams or SCTP's message-
based multi-streams; these functions essentially deal with concerns
of application-visible semantics.
We consider the bottom part "network-oriented" because it represents
functions that, while traditionally located in the ostensibly "end-
to-end" Transport Layer, have proven in practice to be of great
concern to network operators and the middleboxes they deploy in the
network to enforce network usage policies [8][9] or optimize
communication performance [10]. The network-oriented functions
include congestion control and other performance-management functions
("Flow" performance functions), and endpoint/service identification
functions (e.g., port numbers) that network operators and their
middleboxes require to enforce network access and security policies
("Endpoint" functions). These network-oriented transport functions
are collectively labeled in Figure 2 as "Flow/Endpoint"
functions.
+-----------------+
| Application |
+---------------+ ---> +-----------------+
| Application | / | Semantic | (Application-Oriented
+---------------+ <-- | Functions | Functions)
| Transport | |- - - - - -|
+---------------+ <-- | Flow / Endpoint | (Network-Oriented
| Network | \ | Functions | Functions)
+---------------+ ---> +-----------------+
| Network |
+-----------------+
Figure 2: Decomposition of Transport Functions
Following from the discussion above, a multipath transport would have
to manage Flow/Endpoint functions for every path in an end-to-end
connection, while providing a transparent single interface to the
application. In keeping with this architectural worldview, MPTCP
divides the Transport Layer into two components: the MPTCP part,
which is responsible for the Semantic functions of global ordering of
application data and reliability; and the "legacy TCP" part, which
implements the Flow/Endpoint functions. The figure below shows how
MPTCP implements this architecture:
+--------------------------+ +-------------------------+
| Application | | Application |
+--------------------------+ +-------------------------+
| Semantic | | MPTCP |
|- - - - - - - - - | + - - - - - + - - - - - +
| Flow/Endpt | Flow/Endpt | | TCP | TCP |
+--------------------------+ +-------------------------+
| Network | Network | | IP | IP |
+--------------------------+ +-------------------------+
Figure 3: Mapping Transport Architecture to MPTCP
4. High-Level Design Decisions
There is seemingly a wide range of choices when designing a multipath
extension to TCP. However, the goals as discussed earlier in this
document constrain the possible solutions, leaving relatively little
choice in many areas. Here, we outline high-level design choices
derived from the architectural requirements, and their implications
for complete protocol design.
4.1. Sequence Numbering
MPTCP uses two layers of sequence spaces: a connection-level sequence
number, and another sequence number for each subflow. This permits
connection-level segmentation and reassembly, and retransmission of
the same part of the connection-level sequence space on different
subflows, using different subflow-level sequence numbers.
The alternative approach would be to use a single connection level
sequence number, which gets sent on multiple subflows. This has two
problems: first, the individual subflows will appear to the network
as TCP sessions with gaps in the sequence space; this in turn may
upset certain middleboxes such as intrusion detection systems, or
certain transparent proxies, and would go against the network
compatibility goal. Second, the sender cannot attribute packet
losses or receptions to the correct path when the same packet is sent
on multiple paths, in the case of retransmissions.
The sender must be able to tell the receiver how to reorder the data,
for delivery to the application. The sender does so by telling the
receiver how subflow-level data (carrying subflow sequence numbers)
maps to the connection level, which we refer to as Data Sequence
Mapping.
This mapping takes the form (data seq, subflow seq, length), i.e. for
a given number of bytes (the length), the subflow sequence space
beginning at the given sequence number maps to the connection-level
sequence space (beginning at the given data seq number).
This architecture does not mandate a mechanism for signalling such
information, and it could conceivably have various sources.
One option would be to use existing fields in the TCP segment (such
as subflow seqno, length) and only add the data sequence number to
each segment, for instance as a TCP option. This is, however,
vulnerable to middleboxes that resegment or coalesce data, since
there is no specified behaviour for coalescing TCP options. If one
signalled (data seqno, length), this would still be vulnerable to
middleboxes that coalesce segments and do not correctly coalesce the
options. Because of these potential issues, the current
specification of MPTCP mandates that the full mapping should be sent
to the other end.
To reduce the overhead, it would be permissible for the mapping to be
sent periodically and cover more than a single segment. It could
also be omitted entirely before a connection uses more than one
subflow, where the data-level and subflow-level sequence spaces are
the same.
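The (data seq, subflow seq, length) mapping can be sketched as a small lookup, assuming unbounded integer sequence numbers (wraparound is ignored). The class DataSeqMapping and its method are hypothetical names; in the example a single mapping covers two segments, as the periodic-mapping option described above would permit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSeqMapping:
    data_seq: int     # first connection-level sequence number covered
    subflow_seq: int  # first subflow-level sequence number covered
    length: int       # number of bytes covered (may span several segments)

    def to_data_seq(self, subflow_seq: int) -> Optional[int]:
        """Translate a subflow sequence number into the connection-level
        sequence space, or return None if this mapping does not cover it."""
        offset = subflow_seq - self.subflow_seq
        if 0 <= offset < self.length:
            return self.data_seq + offset
        return None


# One mapping covering two 1400-byte segments on this subflow.
m = DataSeqMapping(data_seq=1000, subflow_seq=5000, length=2800)
print(m.to_data_seq(5000))  # 1000  (first segment)
print(m.to_data_seq(6400))  # 2400  (second segment)
print(m.to_data_seq(7800))  # None  (outside the mapping)
```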
4.2. Reliability
MPTCP uses the data sequence mapping and subflow ACKs to decide when
a connection-level segment was received. There are currently no
explicit connection-level ACKs; this decision was made to reduce
network overheads. This has certain implications for end-to-end
semantics. It means that, once a packet is acked at subflow level, it
cannot be discarded from the re-order buffer at the connection level.
Connection-level acknowledgement, being inferred from subflow ACKs, is
therefore not cumulative as it is in TCP. As such, the emergent
behaviour is different from standard TCP, where the receiver can
simply drop out-of-order segments if needed (for instance, due to
memory pressure).
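The consequence can be illustrated by deriving connection-level delivery from subflow ACKs alone. This sketch assumes each subflow ACK is a cumulative next-expected subflow sequence number and ignores wraparound; the names Mapping, acked_data_ranges and merge are hypothetical. In the example, data that subflow A carried for the range [2000, 3000) is acknowledged while [1000, 2000) on subflow B is not, so the set of data the sender regards as delivered has a hole.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    data_seq: int
    subflow_seq: int
    length: int


def acked_data_ranges(mappings: list[Mapping], subflow_ack: int) -> list[tuple[int, int]]:
    """Connection-level [start, end) ranges implied by one subflow's
    cumulative ACK (subflow_ack = next subflow sequence number expected)."""
    ranges = []
    for m in mappings:
        covered = min(subflow_ack, m.subflow_seq + m.length) - m.subflow_seq
        if covered > 0:
            ranges.append((m.data_seq, m.data_seq + covered))
    return ranges


def merge(ranges: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Union of half-open ranges, sorted and coalesced."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Subflow A carried data bytes [0, 1000) and [2000, 3000); subflow B carried
# [1000, 2000). A's data is fully acked, B's has not been acked yet.
sub_a = [Mapping(0, 0, 1000), Mapping(2000, 1000, 1000)]
sub_b = [Mapping(1000, 0, 1000)]
acked = acked_data_ranges(sub_a, subflow_ack=2000) + acked_data_ranges(sub_b, subflow_ack=0)
print(merge(acked))  # [(0, 1000), (2000, 3000)]
```

Because the sender now treats bytes 2000 to 2999 as delivered, the receiver must keep them in its connection-level re-order buffer until the missing range arrives; a cumulative TCP-style acknowledgement would only have covered the bytes below 1000.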
It is possible to conceive of some cases where not adding data-level
acks could be detrimental to robustness. Consider a subflow
traversing a transparent proxy; if the proxy acks a segment and then
crashes, the sender will not retransmit the lost segment on another
subflow, as it thinks the segment has been received. The connection
grinds to a halt despite having other working subflows, and the
sender would be unable to determine the cause of the problem. To
deal with this case we are considering adding "informative" data-
level acks.
Regarding retransmissions, it must be possible for a packet to be
retransmitted on a different subflow from the one on which it was
originally sent. This is one of MPTCP's core goals, in order to
maintain integrity during temporary or permanent subflow failure, and
this is enabled by the dual sequence number space.
The scheduling of retransmissions will have significant impact on
MPTCP user experience. The current MPTCP specification suggests that
data outstanding on subflows that have timed out should be
rescheduled for transmission on different subflows. This behaviour
aims to minimize disruption when a path breaks, and uses the first
timeout as the indicator of failure. More conservative variants would
wait for the second or third timeout of the same packet.
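The timeout-driven rescheduling policy can be sketched as below. As a simplification of counting timeouts per packet, this assumes a per-subflow count of consecutive retransmission timeouts; the names Subflow and reschedule and the round-robin choice of target subflow are hypothetical. The threshold parameter selects between reacting on the first timeout and the more conservative variants.

```python
from dataclasses import dataclass, field


@dataclass
class Subflow:
    name: str
    timeouts: int = 0                                     # consecutive RTOs
    outstanding: list[int] = field(default_factory=list)  # data seqs in flight


def reschedule(subflows: list[Subflow], threshold: int = 1) -> dict[str, list[int]]:
    """Pick healthy subflows on which to resend data that is outstanding on
    subflows with at least `threshold` consecutive timeouts.

    threshold=1 reacts on the first timeout; 2 or 3 are the more
    conservative variants. The failed subflow is left untouched and keeps
    its own copy queued, since it must still retransmit on its own path.
    """
    healthy = [s for s in subflows if s.timeouts < threshold]
    plan: dict[str, list[int]] = {s.name: [] for s in healthy}
    if not healthy:
        return plan  # every path has timed out: nothing to move data to
    i = 0
    for s in subflows:
        if s.timeouts >= threshold:
            for data_seq in s.outstanding:
                plan[healthy[i % len(healthy)].name].append(data_seq)
                i += 1
    return plan


wifi = Subflow("wifi", timeouts=1, outstanding=[0, 1400, 2800])
cell = Subflow("cell")
print(reschedule([wifi, cell], threshold=1))  # {'cell': [0, 1400, 2800]}
print(reschedule([wifi, cell], threshold=2))  # {'wifi': [], 'cell': []}
```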
When packet loss is detected and corrected with fast retransmit,
retransmission on different subflows may still be desirable in
certain cases, for instance to reduce the receive buffer
requirements. However, the lost packets MUST still be sent on the
path that lost them (this is dictated by our network compatibility
goal), so throughput will be wasted. It is unclear at this point
what the optimal retransmit strategy is.
4.3. Buffers
Receive Buffer: ideally, a subflow failing should not affect the
throughput of other working subflows. However, the receive buffer
has limited size: if a subflow times out, the other subflows will
quickly fill the receive buffer with out-of-order data, and will
stall. Hence, receive buffer sizing is important for both robustness
and throughput.
The smallest receive buffer we need to avoid stalling under any
circumstances is max(RTO)*sum(BW). This is, for most multipath
connections, too expensive. A more reasonable size is proportional
to max(RTT)*sum(BW) which ensures subflows don't stall when fast
retransmit works. Also, depending on how the implementation behaves,
an additional sum(RTT*BW) might be needed for the individual re-order
buffers of the TCP subflows.
Send Buffer: the smallest send buffer we need is sum(BDP) across all
paths; this is to hold data until it's acked at subflow level. If we
didn't use a subflow level ack, and relied on a data-level ack, the
send buffer would need to be as big as the receive buffer of the
connection, max(RTT)*sum(BW). In practice, the senders will often be
web servers and the receivers desktops or mobile devices. The send
buffer size matters particularly for servers, which must be able to
maintain a large number of ongoing connections.
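The sizing rules of thumb in this section can be turned into a small calculator. This sketch assumes the proportionality constant in the max(RTT)*sum(BW) rule is 1 and uses integer arithmetic with RTT and RTO in milliseconds; the Path type, the buffer_sizes function and the example path figures are hypothetical.

```python
from typing import NamedTuple


class Path(NamedTuple):
    bw: int      # bandwidth in bytes per second
    rtt_ms: int  # round-trip time in milliseconds
    rto_ms: int  # retransmission timeout in milliseconds


def buffer_sizes(paths: list[Path]) -> dict[str, int]:
    """Buffer sizes in bytes from the rules of thumb in this section,
    taking the proportionality constant for max(RTT)*sum(BW) as 1."""
    sum_bw = sum(p.bw for p in paths)
    sum_bdp = sum(p.bw * p.rtt_ms // 1000 for p in paths)
    return {
        # max(RTO)*sum(BW): never stalls, even while a subflow times out
        "recv_no_stall": max(p.rto_ms for p in paths) * sum_bw // 1000,
        # max(RTT)*sum(BW): no stall as long as fast retransmit works
        "recv_fast_retx": max(p.rtt_ms for p in paths) * sum_bw // 1000,
        # sum(RTT*BW): possible extra for per-subflow re-order buffers
        "recv_subflow_extra": sum_bdp,
        # sum(BDP): send buffer holding data until acked at subflow level
        "send": sum_bdp,
    }


# Two hypothetical paths: a fast low-latency one and a slow high-latency one.
paths = [Path(bw=1_000_000, rtt_ms=20, rto_ms=200),
         Path(bw=250_000, rtt_ms=200, rto_ms=1000)]
print(buffer_sizes(paths))
# {'recv_no_stall': 1250000, 'recv_fast_retx': 250000, 'recv_subflow_extra': 70000, 'send': 70000}
```

With these figures, a receive buffer that never stalls is five times larger than one sized for fast retransmit, which illustrates why the max(RTO)*sum(BW) bound is considered too expensive. The send buffer needed with subflow-level ACKs (70,000 bytes) is also well below the 250,000 bytes that relying on a data-level ACK would require.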
4.4. Signalling
Since MPTCP will use regular TCP streams as its transport mechanism,
an MPTCP connection will also begin as a single TCP stream.
Nevertheless, it must signal to the peer that it supports MPTCP and
wishes to use it on this connection. As such, a TCP Option will be
used to transmit this information, since this is the established
mechanism for indicating additional functionality on a TCP session.
In addition, signalling is required during the operation of an MPTCP
session, such as that for reassembly of data from multiple subflows,
and for informing the other endpoint about other potentially
available addresses. The architecture does not mandate the format in
which this signalling should be transmitted.
The current MPTCP protocol proposal suggests the use of TCP options
for this signalling; however, another approach would be to embed such
information in the payload, and use type-length-value (TLV) encoding
to separate signalling and payload data.
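The TLV alternative can be sketched as follows, assuming a 1-byte type and a 2-byte length in network byte order. The type codes and function names are hypothetical and do not correspond to any MPTCP specification.

```python
import struct

# Hypothetical type codes, for illustration only.
TLV_DATA = 0
TLV_ADD_ADDRESS = 1

_HEADER = struct.Struct("!BH")  # 1-byte type, 2-byte length, network byte order


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Frame one record (application data or signalling) as type-length-value."""
    if not 0 <= tlv_type <= 0xFF or len(value) > 0xFFFF:
        raise ValueError("type or length out of range")
    return _HEADER.pack(tlv_type, len(value)) + value


def decode_tlvs(buf: bytes) -> list[tuple[int, bytes]]:
    """Split a received byte stream back into (type, value) records."""
    records = []
    offset = 0
    while offset < len(buf):
        if offset + _HEADER.size > len(buf):
            raise ValueError("truncated TLV header")
        tlv_type, length = _HEADER.unpack_from(buf, offset)
        offset += _HEADER.size
        value = buf[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        records.append((tlv_type, value))
        offset += length
    return records


stream = encode_tlv(TLV_DATA, b"hello") + encode_tlv(TLV_ADD_ADDRESS, bytes([192, 0, 2, 1]))
print(decode_tlvs(stream))  # [(0, b'hello'), (1, b'\xc0\x00\x02\x01')]
```

Records encoded in this way travel inside the reliable bytestream, so unlike TCP options they are not limited by the 40 bytes of TCP option space; the cost is that every byte of application data must be framed by the sender and unframed by the receiver.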
4.5. Path Management
Currently, the network does not expose multiple paths between
endpoints. Multipath TCP will use multiple addresses at one or both
endpoints to get different paths to the destination. The hope is
that these paths, whilst not necessarily entirely non-overlapping,
will be sufficiently disjoint to allow multipath to achieve improved
throughput and robustness.
Multiple different (source, destination) address pairs will thus be
used as path selectors.
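The use of address pairs as path selectors can be sketched with Python's standard ipaddress module. This illustration assumes that every (local, remote) pair of the same IP version is a candidate path; the function name path_selectors is hypothetical, and the example addresses are taken from the documentation ranges. Discarding mixed-family pairs reflects that each subflow is an ordinary TCP connection over a single IP version, while one MPTCP connection may still combine IPv4 and IPv6 subflows.

```python
import ipaddress
from itertools import product


def path_selectors(local: list[str], remote: list[str]) -> list[tuple[str, str]]:
    """Enumerate (source, destination) address pairs that could each carry
    a subflow. Pairs are kept only when both addresses are the same IP
    version, since each subflow is an ordinary TCP connection."""
    return [(src, dst) for src, dst in product(local, remote)
            if ipaddress.ip_address(src).version == ipaddress.ip_address(dst).version]


local = ["192.0.2.10", "198.51.100.7", "2001:db8::10"]
remote = ["203.0.113.5", "2001:db8::1"]
for pair in path_selectors(local, remote):
    print(pair)
# ('192.0.2.10', '203.0.113.5')
# ('198.51.100.7', '203.0.113.5')
# ('2001:db8::10', '2001:db8::1')
```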
For increased chance of successfully setting up additional subflows
(such as when one end is behind a firewall, NAT, or other restrictive
middlebox), either endpoint should be able to add new subflows to an
MPTCP connection.
The modularity of path management will permit alternative mechanisms
to be employed if appropriate in the future.
4.6. Connection Identification
Since an MPTCP connection may not be bound to a traditional 5-tuple
(source addr and port, destination addr and port, protocol number)
for the entirety of its existence, it is desirable to provide a new
mechanism for connection identification. This will be useful for
MPTCP-aware applications, and for the MPTCP implementation (and
MPTCP-aware middleboxes) to have a unique identifier with which to
associate the multiple subflows.
Therefore, each MPTCP connection should have a connection identifier
at each endpoint, which is locally unique within that endpoint. This
is analogous to a port number in regular TCP. The manifestation and
purpose of such an identifier is out of the scope of this
architecture document.
For legacy applications, however, an MPTCP connection will be
identified by the 5-tuple of the first TCP subflow. [TBD: This will
continue to be the case even if that subflow closes / even if an
address disappears / the connection will close in that case unless
the extended API has been used / etc].
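Since the manifestation of the connection identifier is out of scope, the following is only one hypothetical way an implementation might keep such state: a per-endpoint table that allocates locally unique integer identifiers and maps each subflow's 5-tuple to its connection, while a legacy application continues to see the first subflow's 5-tuple. The class and method names are hypothetical, and the open questions in the TBD note above are not addressed.

```python
from itertools import count

# (source address, source port, destination address, destination port, protocol)
FiveTuple = tuple[str, int, str, int, int]


class ConnectionTable:
    """Per-endpoint table: each MPTCP connection has a locally unique
    identifier, and every subflow's 5-tuple maps back to it."""

    def __init__(self) -> None:
        self._ids = count(1)
        self._by_subflow: dict[FiveTuple, int] = {}
        self._first_subflow: dict[int, FiveTuple] = {}

    def open(self, first: FiveTuple) -> int:
        conn_id = next(self._ids)
        self._by_subflow[first] = conn_id
        self._first_subflow[conn_id] = first
        return conn_id

    def add_subflow(self, conn_id: int, subflow: FiveTuple) -> None:
        if conn_id not in self._first_subflow:
            raise KeyError(f"unknown connection {conn_id}")
        self._by_subflow[subflow] = conn_id

    def lookup(self, subflow: FiveTuple) -> int:
        """Which connection does an arriving segment's 5-tuple belong to?"""
        return self._by_subflow[subflow]

    def legacy_name(self, conn_id: int) -> FiveTuple:
        """The identity shown to a legacy application: the first subflow."""
        return self._first_subflow[conn_id]


table = ConnectionTable()
first = ("192.0.2.10", 40000, "203.0.113.5", 80, 6)   # 6 = TCP
second = ("198.51.100.7", 40001, "203.0.113.5", 80, 6)
cid = table.open(first)
table.add_subflow(cid, second)
print(table.lookup(second) == cid, table.legacy_name(cid) == first)  # True True
```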
4.7. Network Layer Compatibility
MPTCP's modifications remain at the TCP layer, although some
knowledge of the underlying IP layer is required. MPTCP MUST work
with IPv4 and IPv6 interchangeably, i.e. one MPTCP connection may
operate over both IPv4 and IPv6 networks.
4.8. Congestion Control
As already documented in the network-layer compatibility
requirements, the congestion control algorithms used by an MPTCP
implementation must not harm other legacy users on shared
bottlenecks. To achieve this, the congestion control algorithms in
use on each subflow must be coupled in some way; a proposal for this
is given in [4].
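To make the idea of coupling concrete, the following Python sketch
shows one well-known way to link the window increases of the
subflows: the "linked increases" rule later published as RFC 6356.
It is an illustration only and is not necessarily identical to the
proposal in [4]. Windows are counted in packets, RTTs are in seconds,
and the function names are hypothetical. Each ACK on subflow i grows
that subflow's window by min(alpha / w_total, 1 / w_i); with a single
subflow alpha equals 1, so the rule reduces to regular TCP's 1 / w
increase per ACK.

```python
def lia_alpha(cwnds, rtts):
    """Aggressiveness factor of the linked-increases rule (RFC 6356):

        alpha = w_total * max_i(w_i / rtt_i**2) / (sum_i(w_i / rtt_i))**2
    """
    w_total = sum(cwnds)
    best = max(w / (r * r) for w, r in zip(cwnds, rtts))
    denom = sum(w / r for w, r in zip(cwnds, rtts)) ** 2
    return w_total * best / denom


def on_ack(cwnds, rtts, i):
    """Congestion-avoidance increase of subflow i for one ACKed packet.

    The coupled term alpha / w_total limits the aggregate; the cap of
    1 / w_i means no subflow grows faster than regular TCP would.
    """
    alpha = lia_alpha(cwnds, rtts)
    cwnds[i] += min(alpha / sum(cwnds), 1.0 / cwnds[i])


def on_loss(cwnds, i):
    """On loss, only subflow i halves its window, as regular TCP does.

    The floor of one packet is a simplification for this sketch.
    """
    cwnds[i] = max(cwnds[i] / 2.0, 1.0)


# Two subflows with equal windows (packets) and RTTs (seconds).
cwnds = [10.0, 10.0]
rtts = [0.1, 0.1]
on_ack(cwnds, rtts, 0)  # subflow 0 grows by about 0.025, not 0.1
```

The design choice is that only the increase is coupled: losses are
still handled per subflow, so traffic shifts away from congested
paths while the aggregate stays no more aggressive than one TCP flow.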
5. Summary
This document has provided a summary of the components that have been
identified to provide a Multipath TCP solution, and described the
high-level design decisions that have been used as a basis of the
MPTCP specification.
The suite of drafts that specify a complete MPTCP implementation, on
top of this architectural overview, are as follows:
o A specification of the MPTCP protocol [3], describing the on- and
off-the-wire differences to regular TCP.
o A specification of a coupled congestion control algorithm [4],
that can be applied to the above protocol while meeting the goals
for such an algorithm as specified in this document.
o A document [5] that builds upon the application compatibility
issues discussed in this document, explaining in more detail what
changes, if any, an application may experience through the use of
MPTCP. This document also provides a proposed API through which
an application can influence the behaviour of the MPTCP protocol,
as specified in the above drafts.
6. Security Considerations
Please see [11] for a threat analysis of Multipath TCP. The threats
analysed in this companion document are addressed as appropriate in
the protocol design [3].
7. Interactions with Applications
Interactions with applications, including, but not limited to,
performance changes that may be expected, semantic changes, and new
features that may be requested of an API, are presented in [5].
8. Interactions with Middleboxes
TBD?
List of issues that may arise with NATs, firewalls, proxies, etc?
This will be an overview only, and protocol-specific solutions to
these issues will be given in the companion documents.
(Not sure we really need this section any more)
9. Acknowledgements
Alan Ford, Costin Raiciu and Sebastien Barre are supported by Trilogy
(http://www.trilogy-project.org), a research project (ICT-216372)
partially funded by the European Community under its Seventh
Framework Program. The views expressed here are those of the
author(s) only. The European Commission is not liable for any use
that may be made of the information in this document.
10. IANA Considerations
None.
11. References
11.1. Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
11.2. Informative References
[2] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
September 1981.
[3] Ford, A., Raiciu, C., and M. Handley, "TCP Extensions for
Multipath Operation with Multiple Addresses",
draft-ford-mptcp-multiaddressed-02 (work in progress),
October 2009.
[4] Raiciu, C., Handley, M., and D. Wischik, "Coupled Multipath-
Aware Congestion Control", draft-raiciu-mptcp-congestion-00
(work in progress), October 2009.
[5] Scharf, M. and A. Ford, "MPTCP Application Interface
Considerations", draft-scharf-mptcp-api-00 (work in progress),
October 2009.
[6] Wischik, D., Handley, M., and M. Bagnulo Braun, "The Resource
Pooling Principle", ACM SIGCOMM CCR vol. 38 num. 5, pp. 47-52,
October 2008,
<http://ccr.sigcomm.org/online/files/p47-handleyA4.pdf>.
[7] Carpenter, B. and S. Brim, "Middleboxes: Taxonomy and Issues",
RFC 3234, February 2002.
[8] Srisuresh, P. and K. Egevang, "Traditional IP Network Address
Translator (Traditional NAT)", RFC 3022, January 2001.
[9] Freed, N., "Behavior of and Requirements for Internet
Firewalls", RFC 2979, October 2000.
[10] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
Shelby, "Performance Enhancing Proxies Intended to Mitigate
Link-Related Degradations", RFC 3135, June 2001.
[11] Bagnulo, M., "Threat Analysis for Multi-addressed/Multi-path
TCP", draft-bagnulo-mptcp-threat-00 (work in progress),
October 2009.
[12] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
Control", RFC 2581, April 1999.
Appendix A. Implementation Architecture
This section provides suggestions for an architecture to implement an
extensible, modular multipath transport protocol.
A.1. Functional Separation
This section describes a generic view of the internal structure of a
Multipath TCP implementation, into which the technical components
specified in the companion documents can fit. It shows how an
implementation could be built that permits extensibility between
components without changing the external representation.
We first show the functional decomposition of an MPTCP solution that
is completely contained in the transport layer. That solution is
described in more detail in [3]. We then generalize the approach to
make that solution easily extensible.
A.1.1. Application to default MPTCP protocol
Although, in the default approach, MPTCP is fully contained in the
transport layer, it can still be divided into two main modules. One
manages the scheduling of packets as well as congestion control; the
other manages the control of paths. The two modules interact through
a Path Index. As shown in Figure 4, the Path Manager announces to the
MultiPath Scheduler which paths can be used through path indices, and
maintains the mapping between each index and the particular action
that it must apply to use the corresponding path (an example of such
a mapping is in Table 1). In the case of the built-in Path Manager,
the action is to replace an address/port pair with another one, in
such a way that another path is used across the Internet to forward
the packet.
Control plane <-- | --> Data plane
+---------------------------------------------------------------+
| Multipath Scheduler (MPS) |
+---------------------------------------------------------------+
^ | |
| | [A1,B1,|pA1,pB1]
|For conn_id | |
|<A1,B1,pA1,pB1> | +-------------+
|Paths 1->4 can be | | Data packet |<--Path idx:3
|used. | +-------------+ attached
| | | by MPS
| | V
+--------------------------------------------\------------------+
| Path Manager (PM) \[A1,B1]->[A1,B2] |
+--------------------------------------------------\------------+
/ \ | \
/-----------------------------\ | /"\ /"\ /"\ /"\
| rewriting table: || | | | | | | | |
| Subflow id <--> network_id || | | | | | | | |
| || | | | | | | | |
| [see table below] || | | | | | | | |
| || \./ \./ \./ \./
+------------------------------+| path1 path2 path3 path4
Figure 4: Functional separation of MPTCP in the transport layer
The MultiPath Scheduler only deals with abstract paths, represented
by numbers. It only sees one address pair throughout the
communication, which we call the connection identifier. However, the
MultiPath Scheduler must be able to perform per-subflow congestion
control, and must therefore distinguish between the subflows. This
leads us to define a subflow identifier, which consists of the usual
transport identifier extended with the path index:
<addr_src,psrc,addr_dst,pdst,path_index>. The following options,
described in [3], are managed by the MultiPath Scheduler.
o MULTIPATH CAPABLE (MPC): Tell the peer that we support MPTCP.
Note that the MPC option also holds a token, which is necessary
only if the built-in Path Manager is used. In the next section we
describe the generalized case, where the token can be ignored by
the receiver if another path manager is used.
o DATA SEQUENCE NUMBER (DSN): Identify the position of a set of
bytes in the meta-flow.
o DATA FIN (DFIN): Terminate a meta-flow.
An implementation MUST use those options even if a Path Manager
other than the default one is implemented.
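The following Python sketch illustrates how a receiver might use the
DATA SEQUENCE NUMBER to rebuild the meta-flow from segments that
arrive out of order over different subflows, and how DATA FIN marks
its end. The class name, the (subflow id, DSN, payload) view of a
segment, and the assumption that DATA FIN carries the DSN just past
the last data byte are all illustrative; the actual option formats
and semantics are defined in [3].

```python
class MetaFlowReceiver:
    """Hypothetical receiver-side reassembly of an MPTCP meta-flow.

    Each segment carries a DSN: the position of its first byte in the
    meta-flow, independent of the subflow it travelled on. Segments
    are assumed not to overlap, to keep the sketch short.
    """

    def __init__(self, initial_dsn=0):
        self.next_dsn = initial_dsn   # next in-order byte expected
        self.pending = {}             # DSN -> payload buffered out of order
        self.fin_dsn = None           # set when DATA FIN is received
        self.delivered = bytearray()  # data handed to the application

    def on_segment(self, subflow_id, dsn, payload):
        # Ordering depends only on the DSN, not on subflow_id.
        if dsn >= self.next_dsn:      # ignore already-delivered data
            self.pending[dsn] = payload
        # Deliver every contiguous block that is now available.
        while self.next_dsn in self.pending:
            data = self.pending.pop(self.next_dsn)
            self.delivered += data
            self.next_dsn += len(data)

    def on_data_fin(self, fin_dsn):
        # Assumption: DATA FIN carries the DSN just past the last byte.
        self.fin_dsn = fin_dsn

    def closed(self):
        """True once every byte before the DATA FIN has been delivered."""
        return self.fin_dsn is not None and self.next_dsn >= self.fin_dsn


r = MetaFlowReceiver()
r.on_segment(("conn", 2), 5, b"world")  # arrives first, on subflow 2
r.on_segment(("conn", 1), 0, b"hello")  # fills the gap, on subflow 1
r.on_data_fin(10)
```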
The Path Manager applies a particular technology that allows the MPS
to use several paths. The built-in MPTCP Path Manager uses multiple
IPv4 addresses as its means of influencing the forwarding of packets
through the Internet.
When the MPS starts a new connection, the PM chooses a token that
will be used to identify the connection. This is necessary to allow
the PM to apply the correct path index to incoming packets. An
example mapping table is given below:
+-----------------+---------------+---------+-----------------+
| connection id | subflow id | token | Network id |
+-----------------+---------------+---------+-----------------+
| <A1,B1,pA1,pB1> | <conn_id,pi1> | token_1 | <A1,B1,pA1,pB1> |
| <A1,B1,pA1,pB1> | <conn_id,pi2> | token_1 | <A2,B2,pA1,pB2> |
| <A1,B1,pA1,pB1> | <conn_id,pi3> | token_1 | <A1,B2,pA1,pB2> |
| <A1,B1,pA1,pB1> | <conn_id,pi4> | token_1 | <A2,B1,pA1,pB1> |
| <A1,B1,pA1,pB3> | <conn_id,pi1> | token_2 | <A1,B1,pA1,pB3> |
| <A1,B1,pA1,pB3> | <conn_id,pi2> | token_2 | <A2,B1,pA1,pB3> |
+-----------------+---------------+---------+-----------------+
Table 1: Example mapping table for built-in PM
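The mapping in Table 1 can be sketched as a pair of dictionaries: one
used to rewrite outgoing packets (subflow id to network id) and one
used to map incoming packets back to a connection and path index. In
this minimal Python illustration, the names `BuiltinPathManager`,
`rewrite_outgoing` and `classify_incoming` are hypothetical. Tuples
follow the <addr, addr, port, port> order of the table, seen from the
local host, and incoming headers are reversed before lookup. The
token is only recorded when a subflow is added, mirroring its role at
connection and subflow establishment.

```python
class BuiltinPathManager:
    """Hypothetical rewriting table for the built-in Path Manager.

    Network ids are (local addr, remote addr, local port, remote port),
    in the column order used by Table 1.
    """

    def __init__(self):
        self.outgoing = {}  # (conn_id, path_index) -> network id
        self.incoming = {}  # network id -> (conn_id, path_index)
        self.tokens = {}    # token -> conn_id, consulted on subflow setup

    def add_subflow(self, token, conn_id, path_index, network_id):
        self.tokens[token] = conn_id
        self.outgoing[(conn_id, path_index)] = network_id
        self.incoming[network_id] = (conn_id, path_index)

    def rewrite_outgoing(self, conn_id, path_index):
        """Addresses and ports to write into a packet for this path."""
        return self.outgoing[(conn_id, path_index)]

    def classify_incoming(self, src, dst, sport, dport):
        """Map an incoming header back to (conn_id, path_index)."""
        # The peer is the source of an incoming packet, so swap the
        # fields into the local view used by the table.
        return self.incoming[(dst, src, dport, sport)]


# The rows of Table 1.
conn1 = ("A1", "B1", "pA1", "pB1")
conn2 = ("A1", "B1", "pA1", "pB3")
pm = BuiltinPathManager()
for token, conn, pi, net in [
    ("token_1", conn1, 1, ("A1", "B1", "pA1", "pB1")),
    ("token_1", conn1, 2, ("A2", "B2", "pA1", "pB2")),
    ("token_1", conn1, 3, ("A1", "B2", "pA1", "pB2")),
    ("token_1", conn1, 4, ("A2", "B1", "pA1", "pB1")),
    ("token_2", conn2, 1, ("A1", "B1", "pA1", "pB3")),
    ("token_2", conn2, 2, ("A2", "B1", "pA1", "pB3")),
]:
    pm.add_subflow(token, conn, pi, net)
```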
Table 1 shows an example where two connections are ongoing. One is
identified by token_1, the other by token_2. Since addresses are
rewritten by the path manager, each subflow is attached to the right
connection by means of the token, which is used at connection
establishment and subflow establishment and is then remembered. The
first column holds the information that is exposed to the
applications, while the last column shows the information that is
actually written in packets that will traverse the network. Note
that, in addition to addresses, ports can be rewritten, which helps
to support NATs. The table also shows the role of the token, which is
to attach various combinations of ports and addresses to a single
connection. The token is specific to the built-in path manager, and
can be ignored if another path manager is used. An implementation of
the built-in path manager MUST implement the following options
(defined in more detail in [3]):
o Add Address (ADDR): Announce a new address we own
o Remove Address (REMADDR): Withdraw a previously announced address
o Join Connection (JOIN): Attach a new subflow to the current
connection
Those options form the default MPTCP Path Manager, which is based on
declaring IP addresses and carries control information in TCP
options. An implementation of Multipath TCP can use any Path Manager,
but it MUST be able to fall back to the default PM in case the other
end does not support the custom PM. Alternative Path Managers may be
specified in separate documents in the future.
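A rough Python sketch of the per-connection state that the default
Path Manager might keep while processing ADDR, REMADDR and JOIN, plus
the fallback to the default PM when the peer lacks a custom one.
Option processing is reduced to method calls, and all names
(`DefaultPMState`, `choose_path_manager`, the strings "default" and
"custom") are hypothetical; the option semantics are defined in [3].

```python
class DefaultPMState:
    """Hypothetical per-connection state driven by ADDR, REMADDR, JOIN."""

    def __init__(self, token, local_addr, peer_addr):
        self.token = token                         # chosen at connection setup
        self.peer_addrs = {peer_addr}              # addresses the peer owns
        self.subflows = {(local_addr, peer_addr)}  # (local, remote) pairs

    def on_add_addr(self, addr):
        """ADDR: the peer announces a new address it owns."""
        self.peer_addrs.add(addr)

    def on_remove_addr(self, addr):
        """REMADDR: the peer withdraws an address; drop subflows using it."""
        self.peer_addrs.discard(addr)
        self.subflows = {s for s in self.subflows if s[1] != addr}

    def on_join(self, token, local_addr, remote_addr):
        """JOIN: attach a new subflow only if it carries this token."""
        if token != self.token:
            return False
        self.subflows.add((local_addr, remote_addr))
        return True


def choose_path_manager(local_pms, peer_pms):
    """Prefer a custom PM that both ends support; otherwise fall back
    to the default PM, which every implementation must support."""
    for pm in local_pms:
        if pm != "default" and pm in peer_pms:
            return pm
    return "default"


state = DefaultPMState("token_1", "A1", "B1")
state.on_add_addr("B2")
joined = state.on_join("token_1", "A2", "B2")
```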
A.1.2. Generic architecture for MPTCP
Now that the functional decomposition has been shown for MPTCP with
the built-in Path Manager, we show how that architecture can be
generalized to allow the implementation of other Path Managers for
MPTCP. A general overview of the architecture is provided in
Figure 5. The Multipath Scheduler (MPS) learns about the number of
available paths through notifications received from the Path Manager
(PM). From the point of view of the Multipath Scheduler, a path is
just a number, called a Path Index. Notifications from the PM to the
MPS MAY contain supporting information about the paths, if relevant,
so that the MPS can make more intelligent decisions about where to
route traffic. When the Multipath Scheduler initiates a
communication to a new host, it can only send the packets on the
default path. However, since the Path Manager is layered below the
MPS, it can detect that a new communication is happening, and tell
the MPS about the other paths it knows about.
Control plane <-- | --> Data plane
+---------------------------------------------------------------+
| Multipath Scheduler (MPS) |
+---------------------------------------------------------------+
^ | |
| | [A1,B1,|pA1,pB1]
| | |
|Announcing new | +-------------+
|paths. (referred | | Data packet |<--Path idx:3
|to as path indices) | +-------------+ attached
| | | by MPS
| | V
+--------------------------------------------\------------------+
| Path Manager (PM) \__________zzzzz |
+--------------------------------------------------------\------+
/ \ | \
/---------------------------\ | /"\ /"\ /"\
| subflow_id Action | | | | | | | |
|<A1,B1,pA1,pB1,1> xxxxx | | | | | | | |
|<A1,B1,pA1,pB1,2> yyyyy | | \./ \./ \./
|<A1,B1,pA1,pB1,3> zzzzz | | path1 path2 path3
+---------------------------+
Figure 5: Overview of MPTCP architecture
From then on, it is possible for the MPS to attach a Path Index to
the control structure of its packets (internal to the MPTCP
implementation), so that the Path Manager can map this Path Index to
the corresponding action (see the table in the lower left part of
Figure 5). The particular action depends on the network mechanism
used to select a path. Examples are address rewriting, tunnelling or
setting a path selector value inside the packet.
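The Path Index to action mapping can be sketched as a table of
functions, one per path, each applying whatever network mechanism
selects that path. The three example actions below follow the
examples just given (address rewriting, tunnelling, setting a path
selector value). Packets are modelled as plain dicts, the internal tag
is a hypothetical "path_index" key, and every name is illustrative
rather than part of any specification.

```python
def rewrite_addresses(src, dst):
    """Action: replace the address pair so the packet takes another path."""
    def action(pkt):
        return {**pkt, "src": src, "dst": dst}
    return action


def tunnel_via(endpoint):
    """Action: encapsulate the packet towards a tunnel endpoint."""
    def action(pkt):
        return {"src": pkt["src"], "dst": endpoint, "inner": pkt}
    return action


def set_path_selector(value):
    """Action: write a path selector value into the packet."""
    def action(pkt):
        return {**pkt, "selector": value}
    return action


class GenericPathManager:
    """Hypothetical PM holding a path index -> action table."""

    def __init__(self):
        self.actions = {}  # path index -> action function

    def register(self, path_index, action):
        self.actions[path_index] = action

    def output(self, pkt):
        """Strip the internal tag and apply the matching action."""
        pkt = dict(pkt)                      # do not mutate the caller's copy
        index = pkt.pop("path_index", None)  # the tag never leaves the host
        if index is None:
            return pkt                       # untagged: send unchanged
        return self.actions[index](pkt)


gpm = GenericPathManager()
gpm.register(1, rewrite_addresses("A2", "B2"))
gpm.register(2, tunnel_via("T1"))
gpm.register(3, set_path_selector(7))
```

Because the MPS only ever sees the index, swapping one mechanism for
another changes the registered action, not the scheduler.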
The applicability of the architecture is not limited to the MPTCP
protocol. While we define in this document an MPTCP MPS (MPTCP
Multipath Scheduler), other Multipath Schedulers can be defined. For
example, if an appropriate socket interface is designed, applications
could behave as a Multipath Scheduler and decide where to send any
particular data. In this document we concentrate on the MPTCP case,
however.
A.2. PM/MPS interface
The minimal set of requirements for a Path Manager is as follows:
o Outgoing untagged packets: Any outgoing packet flowing through the
Path Manager is either tagged by the MPS with a path index or
untagged. If it is untagged, the packet is sent normally to the
Internet, as if no multipath support were present. Untagged
packets can be used to trigger a path discovery procedure, that
is, a Path Manager can listen to untagged packets and decide at
some point to find out whether any path other than the default one
is usable for the corresponding host pair. Note that any other
criteria could be used to decide when to start discovering
available paths. Note also that MPS scheduling will not be
possible until the Path Manager has notified the MPS of the
available paths. The PM is thus the first entity to come into
action.
o Outgoing tagged packets: The Path Manager maintains a table
mapping path indices to actions. The action is the operation that
allows a particular path to be used. Examples of possible actions
are route selection, interface selection or packet transformation.
When the PM sees a packet tagged with a path index, it looks up
its table to find the appropriate action for that packet. The tag
is purely local: it is removed before the packet is transmitted.
o Incoming packets: A Path Manager MUST ensure that each incoming
path is mapped unambiguously to exactly one outgoing path. Note
that this requirement implies that the same number of incoming and
outgoing paths must be established. Moreover, a PM MUST tag any
incoming path with the same Path Index as the one used for the
corresponding outgoing path. This is necessary for MPTCP to know
which outgoing path is acknowledged by an incoming packet.
o Module interface: A PM MUST be able to notify the MPS about the
number of available paths. Such notifications MUST contain the
path indices that are legal for use by the MPS. In case the PM
decides to stop providing service for one path, it MUST notify the
MPS about path removal. Additionally, a PM MAY provide
complementary path information when available, such as link
quality or preference level.
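The requirements above can be sketched from the Multipath Scheduler's
side: the MPS only schedules data on path indices the PM has
announced, sends untagged packets (the default path) before any
announcement, drops an index when the PM reports its removal, and
uses the Path Index that the PM attaches to each incoming packet to
tell which outgoing path an acknowledgement refers to. This minimal
Python illustration uses hypothetical names (`Scheduler`,
`on_paths_added`, `on_path_removed`, `pick_path`, `on_incoming_ack`),
and round-robin merely stands in for a real scheduling policy.

```python
class Scheduler:
    """Hypothetical MPS view of the notifications sent by the PM."""

    def __init__(self):
        self.paths = {}  # path index -> optional info supplied by the PM
        self.acked = {}  # path index -> bytes acknowledged on that path
        self._rr = 0     # round-robin cursor (placeholder policy)

    def on_paths_added(self, indices, info=None):
        """PM notification: these path indices are now legal to use.

        info optionally maps an index to extra data such as link quality.
        """
        for i in indices:
            self.paths[i] = (info or {}).get(i)

    def on_path_removed(self, index):
        """PM notification: the PM no longer serves this path index."""
        self.paths.pop(index, None)

    def pick_path(self):
        """Path index to tag the next packet with, or None for untagged.

        Before the PM has announced any path, only the default path can
        be used, so packets go out untagged.
        """
        if not self.paths:
            return None
        indices = sorted(self.paths)
        index = indices[self._rr % len(indices)]
        self._rr += 1
        return index

    def on_incoming_ack(self, path_index, nbytes):
        """Incoming packets carry the index of the matching outgoing path,
        so the acknowledgement is credited to the right subflow."""
        self.acked[path_index] = self.acked.get(path_index, 0) + nbytes


mps = Scheduler()
first = mps.pick_path()  # None: nothing announced yet
mps.on_paths_added([1, 2, 3], {2: {"quality": "good"}})
```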
Authors' Addresses
Alan Ford (editor)
Roke Manor Research
Old Salisbury Lane
Romsey, Hampshire SO51 0ZN
UK
Phone: +44 1794 833 465
Email: alan.ford@roke.co.uk
Costin Raiciu
University College London
Gower Street
London WC1E 6BT
UK
Email: c.raiciu@cs.ucl.ac.uk
Sebastien Barre
Universite catholique de Louvain
Pl. Ste Barbe, 2
Louvain-la-Neuve 1348
Belgium
Phone: +32 10 47 91 03
Email: sebastien.barre@uclouvain.be
Janardhan Iyengar
Franklin and Marshall College
Mathematics and Computer Science
PO Box 3003
Lancaster, PA 17604-3003
USA
Phone: 717-358-4774
Email: jiyengar@fandm.edu
Bryan Ford
Max Planck Institute for Software Systems
Saarbrucken,
Germany
Email: baford@mpi-sws.org