draft-ietf-mptcp-architecture-00

Internet Engineering Task Force                             A. Ford, Ed.
Internet-Draft                                       Roke Manor Research
Intended status: Informational                                 C. Raiciu
Expires: September 1, 2010                     University College London
                                                                S. Barre
                                                Universite catholique de
                                                                 Louvain
                                                              J. Iyengar
                                           Franklin and Marshall College
                                                       February 28, 2010


         Architectural Guidelines for Multipath TCP Development
                    draft-ietf-mptcp-architecture-00

Abstract

   Endpoints are often connected by multiple paths, but TCP restricts
   communications to a single path per transport connection.  Resource
   usage within the network would be more efficient were these multiple
   paths able to be used concurrently.  This should enhance user
   experience through improved resilience to network failure and higher
   throughput.

   This document outlines architectural guidelines for the development
   of a Multipath Transport Protocol, with references to how these
   architectural components come together in the Multipath TCP (MPTCP)
   protocol.  This document also lists certain high level design
   decisions that provide foundations for the MPTCP design, based upon
   these architectural requirements.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.



Ford, et al.            Expires September 1, 2010               [Page 1]


Internet-Draft             MPTCP Architecture              February 2010


   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 1, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
     1.1.  Requirements Language  . . . . . . . . . . . . . . . . . .  5
     1.2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . .  5
     1.3.  Reference Scenario . . . . . . . . . . . . . . . . . . . .  5
   2.  Goals  . . . . . . . . . . . . . . . . . . . . . . . . . . . .  5
     2.1.  Functional Goals . . . . . . . . . . . . . . . . . . . . .  5
     2.2.  Compatibility Goals  . . . . . . . . . . . . . . . . . . .  6
       2.2.1.  Application Compatibility  . . . . . . . . . . . . . .  6
       2.2.2.  Network Compatibility  . . . . . . . . . . . . . . . .  7
       2.2.3.  Compatibility with other network users . . . . . . . .  8
   3.  An Architectural Basis For MPTCP . . . . . . . . . . . . . . .  8
   4.  A Functional Decomposition of MPTCP  . . . . . . . . . . . . . 10
   5.  High-Level Design Decisions  . . . . . . . . . . . . . . . . . 11
     5.1.  Sequence Numbering . . . . . . . . . . . . . . . . . . . . 12
     5.2.  Reliability  . . . . . . . . . . . . . . . . . . . . . . . 13
     5.3.  Buffers  . . . . . . . . . . . . . . . . . . . . . . . . . 14
     5.4.  Signalling . . . . . . . . . . . . . . . . . . . . . . . . 14
     5.5.  Path Management  . . . . . . . . . . . . . . . . . . . . . 15
     5.6.  Connection Identification  . . . . . . . . . . . . . . . . 15
     5.7.  Network Layer Compatibility  . . . . . . . . . . . . . . . 16
     5.8.  Congestion Control . . . . . . . . . . . . . . . . . . . . 16
   6.  Summary  . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
   7.  Security Considerations  . . . . . . . . . . . . . . . . . . . 16
   8.  Interactions with Applications . . . . . . . . . . . . . . . . 17
   9.  Interactions with Middleboxes  . . . . . . . . . . . . . . . . 17
   10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 17
   11. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 18
   12. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 18
   13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18
     13.1. Normative References . . . . . . . . . . . . . . . . . . . 18
     13.2. Informative References . . . . . . . . . . . . . . . . . . 18
   Appendix A.  Implementation Architecture . . . . . . . . . . . . . 19
     A.1.  Functional Separation  . . . . . . . . . . . . . . . . . . 19
       A.1.1.  Application to default MPTCP protocol  . . . . . . . . 19
       A.1.2.  Generic architecture for MPTCP . . . . . . . . . . . . 22
     A.2.  PM/MPS interface . . . . . . . . . . . . . . . . . . . . . 23
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 24


1.  Introduction

   As the Internet evolves, demands on Internet resources are ever-
   increasing, but often these resources (in particular, bandwidth)
   cannot be fully utilised due to protocol constraints both on the end-
   systems and within the network.  If these resources could instead be
   used concurrently, end user experience could be greatly improved.
   Such enhancements would also reduce the necessary expenditure on
   network infrastructure which would otherwise be needed to create an
   equivalent improvement in user experience.

   By the application of resource pooling[2], these available resources
   can be 'pooled' such that they appear as a single logical resource to
   the user.  The purpose of a multipath transport, therefore, is to
   make use of multiple available paths, through resource pooling, to
   bring two key benefits:

   o  To increase the resilience of the connectivity by providing
      multiple paths, protecting end hosts from the failure of one.

   o  To increase the efficiency of the resource usage, and thus
      increase the network capacity available to end hosts.

   Multipath TCP (MPTCP)[3] is a set of extensions for TCP[4] that
   implements a multipath transport and achieves these goals by pooling
   multiple paths within a transport connection, transparent to the
   application.  While multihoming and multipath functions have been
   implemented in transport protocols previously, notably SCTP[5], MPTCP
   is distinct in recognizing application and network compatibility
   goals that we believe are important for deployability of a multipath
   transport; we discuss these goals in more detail later in Section 2.

   This document makes three contributions: (i) it describes goals for a
   multipath transport - goals that MPTCP is designed to meet; (ii) it
   lays out an architectural basis for MPTCP's design - a discussion
   that applies to other multipath transports as well; and (iii) it
   discusses and documents high-level design decisions made in MPTCP's
   development, and considers their implications.

   Companion documents to this architectural overview are those which
   provide details of the protocol extensions[3], congestion control
   algorithms[6], and application-level considerations[7].  Put
   together, these components specify a complete Multipath TCP design.
   We note that specific components are replaceable with other protocols
   in accordance with the layer and functional decompositions discussed
   in this document.

   Please note this document is a work-in-progress and covers several
   topics, some of which may be more appropriately moved to separate
   documents as this work evolves.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

1.2.  Terminology

   Path:  A sequence of links between a sender and a receiver, defined
      in this context by a source and destination address pair.

   Endpoint:  A host either initiating or terminating an MPTCP
      connection.

   Multipath TCP (MPTCP):  A modified version of the TCP [4] protocol
      that supports the simultaneous use of multiple paths between
      endpoints.

   Subflow:  A flow of TCP packets operating over an individual path,
      which forms part of a larger MPTCP connection.

   MPTCP Connection:  A set of one or more subflows combined to provide
      a single Multipath TCP service to an application at an endpoint.

1.3.  Reference Scenario

   TBD - would this be useful?

   Endpoints, routes.  Addresses/path selection mechanisms?


2.  Goals

   This section outlines primary goals that Multipath TCP aims to meet.
   These are broadly broken down into functional goals, which steer
   services and features that MPTCP must provide, and compatibility
   goals, which determine how MPTCP should appear to entities that
   interact with it.

2.1.  Functional Goals

   In providing the use of multiple paths, MPTCP has the following two
   functional goals.

   o  Improve Throughput: MPTCP MUST support the concurrent use of
      multiple paths.  To meet the minimum performance incentives for
      deployment, an MPTCP connection over multiple paths SHOULD achieve
      no lesser throughput than a single TCP connection over the best
      constituent path.

   o  Improve Resilience: MPTCP MUST support the use of multiple paths
      interchangeably for resilience purposes, by permitting packets to
      be sent and re-sent on any available path.  It follows that, in
      the worst case, the protocol MUST be no less resilient than legacy
      TCP.

   As distribution of traffic among available paths and responses to
   congestion are done in accordance with resource pooling
   principles[2], a secondary effect of meeting these goals is that
   widespread use of MPTCP over the Internet should optimize overall
   network utility by shifting load away from congested bottlenecks and
   by taking advantage of spare capacity wherever possible.

   Furthermore, MPTCP SHOULD feature automatic negotiation of its use.
   Because MPTCP requires support at both endpoints, a host that
   supports it must be able to detect reliably whether its peer does
   too, using MPTCP if so and otherwise falling back automatically to
   legacy TCP.

2.2.  Compatibility Goals

   In addition to the functional goals listed above, a Multipath TCP
   must meet a number of compatibility goals in order to support
   deployment in today's Internet.  These goals fall into the following
   categories:

2.2.1.  Application Compatibility

   Application compatibility refers to the appearance of MPTCP to the
   application both in terms of the API that can be used and the
   expected service model that is provided.

   MPTCP MUST follow the same service model as TCP [4]: in-order,
   reliable, and byte-oriented delivery.  Furthermore, an MPTCP
   connection SHOULD provide the application with no worse throughput
   than it would expect from running a single TCP connection over any
   one of its available paths.

   A multipath-capable equivalent of TCP SHOULD retain backward
   compatibility with existing TCP APIs, so that existing applications
   can use the newer transport merely by upgrading the operating systems
   of the end-hosts.  This does not preclude the use of an advanced API
   to permit multipath-aware applications to specify preferences, nor
   for users to configure their systems in a different way from the
   default, for example switching on or off the automatic use of MPTCP.

2.2.2.  Network Compatibility

   Traditional Internet architecture slots network devices in the
   network layer and lower layers of the OSI 7-layer stack, where the
   layers above the network layer - the transport layer and upper layers
   - are instantiated only at the end-hosts.  While this architecture,
   shown in Figure 1, was largely adhered to earlier, this layering no
   longer reflects the "ground truth" in the Internet with the
   proliferation of middleboxes[8].  Middleboxes routinely interpose on
   the transport layer; sometimes even completely terminating transport
   connections, thus leaving the application layer as the first real
   end-to-end layer, as shown in Figure 2.

   +-------------+                                       +-------------+
   | Application |<------------ end-to-end ------------->| Application |
   +-------------+                                       +-------------+
   |  Transport  |<------------ end-to-end ------------->|  Transport  |
   +-------------+   +-------------+   +-------------+   +-------------+
   |   Network   |<->|   Network   |<->|   Network   |<->|   Network   |
   +-------------+   +-------------+   +-------------+   +-------------+
      End Host           Router             Router          End Host

                Figure 1: Traditional Internet Architecture


   +-------------+                                       +-------------+
   | Application |<------------ end-to-end ------------->| Application |
   +-------------+                     +-------------+   +-------------+
   |  Transport  |<------------------->|  Transport  |<->|  Transport  |
   +-------------+   +-------------+   +-------------+   +-------------+
   |   Network   |<->|   Network   |<->|   Network   |<->|   Network   |
   +-------------+   +-------------+   +-------------+   +-------------+
                                          Firewall,
      End Host           Router         NAT, or Proxy      End Host

                        Figure 2: Internet Reality

   Middleboxes that interpose on the transport layer result in loss of
   "fate-sharing"[9], that is, they often hold "hard" state that, when
   lost or corrupted, results in loss or corruption of the end-to-end
   transport connection.

   MPTCP MUST remain backward compatible with the Internet as it exists
   today, including being able to traverse predominant middleboxes such
   as firewalls, NATs, and performance enhancing proxies[8].  This
   requirement comes from recognizing middleboxes as a significant
   deployment bottleneck for any transport that is not TCP, and
   constrains MPTCP to appear as TCP does on the wire and to use
   established TCP extensions where necessary.  To ensure end-to-endness
   of the transport, we further require MPTCP to preserve fate-sharing
   without making any assumptions about middlebox behavior.

2.2.3.  Compatibility with other network users

   As a corollary to both network and application compatibility, the
   architecture must enable new Multipath TCP flows to coexist
   gracefully with existing legacy TCP flows, competing for bandwidth
   neither unduly aggressively nor unduly timidly (unless low-precedence
   operation is specifically requested by the application, such as with
   LEDBAT).  The use of multiple paths MUST NOT unduly harm users using
   single path TCP at shared bottlenecks, beyond the impact that would
   occur from another single legacy TCP flow.


3.  An Architectural Basis For MPTCP

   We now present one possible transport architecture that we believe
   can effectively support MPTCP's goals.  The new Internet model
   described here is based on ideas proposed earlier in Tng ("Transport
   next-generation") [10].  While by no means the only possible
   architecture supporting multipath transport, Tng incorporates many
   lessons learned from previous transport research and development
   practice, and offers a strong starting point from which to consider
   the extant Internet architecture and its bearing on the design of any
   new Internet transports or transport extensions.

          +------------------+
          |    Application   |
          +------------------+  ^ Application-oriented transport
          |                  |  | functions (Semantic Layer)
          + - - Transport - -+ ----------------------------------
          |                  |  | Network-oriented transport
          +------------------+  v functions (Flow+Endpoint Layer)
          |      Network     |
          +------------------+
            Existing Layers             Tng Decomposition

              Figure 3: Decomposition of Transport Functions

   Tng loosely splits the transport layer into "application-oriented"
   and "network-oriented" layers, as shown in Figure 3.  The
   application-oriented "Semantic" layer implements functions driven
   primarily by concerns of supporting and protecting the application's
   end-to-end communication, while the network-oriented "Flow+Endpoint"
   layer implements functions such as endpoint identification (using
   port numbers) and congestion control.  These network-oriented
   functions, while traditionally located in the ostensibly "end-to-end"
   Transport layer, have proven in practice to be of great concern to
   network operators and the middleboxes they deploy in the network to
   enforce network usage policies[11] [12] or optimize communication
   performance[13].  Figure 4 shows how middleboxes interact with
   different layers in this decomposed model of the transport layer: the
   application-oriented layer operates end-to-end, while the network-
   oriented layer operates "segment-by-segment" and can be interposed
   upon by middleboxes.

   +-------------+                                       +-------------+
   | Application |<------------ end-to-end ------------->| Application |
   +-------------+                                       +-------------+
   |  Semantic   |<------------ end-to-end ------------->|  Semantic   |
   +-------------+   +-------------+   +-------------+   +-------------+
   |Flow+Endpoint|<->|Flow+Endpoint|<->|Flow+Endpoint|<->|Flow+Endpoint|
   +-------------+   +-------------+   +-------------+   +-------------+
   |   Network   |<->|   Network   |<->|   Network   |<->|   Network   |
   +-------------+   +-------------+   +-------------+   +-------------+
                        Firewall         Performance
      End Host           or NAT        Enhancing Proxy      End Host

              Figure 4: Middleboxes in the new Internet model

   MPTCP's architectural design follows Tng's decomposition as shown in
   Figure 5.  The MPTCP component, which provides application
   compatibility through the preservation of TCP-like semantics of
   global ordering of application data and reliability, is an
   instantiation of the "application-oriented" Semantic layer; whereas
   the legacy-TCP component, which provides network compatibility by
   appearing and behaving as a TCP flow in the network, is an
   instantiation of the "network-oriented" Flow+Endpoint layer.

        +--------------------------+    +-------------------------+
        |      Application         |    |      Application        |
        +--------------------------+    +-------------------------+
        |        Semantic          |    |         MPTCP           |
        |--------------------------|    + - - - - -  +  - - - - - +
        | Flow+Endpt | Flow+Endpt  |    |    TCP     |     TCP    |
        +--------------------------+    +-------------------------+
        |   Network  |   Network   |    |     IP     |     IP     |
        +--------------------------+    +-------------------------+

                      Figure 5: MPTCP mapping to Tng

   As a protocol extension to TCP, MPTCP thus explicitly acknowledges
   middleboxes in its design, and specifies a protocol that operates at
   two scales: the MPTCP component operates end-to-end, while it allows
   the TCP component to operate segment-by-segment.


4.  A Functional Decomposition of MPTCP

   Having laid out the goals to be met and the architectural basis for
   MPTCP, we now provide a functional decomposition of MPTCP's design.

   The MPTCP component relies upon (what appear to the network to be)
   standard TCP sessions, termed "subflows", to provide the underlying
   transport per path, and as such these retain the network
   compatibility desired.  MPTCP as described in [3] carries MPTCP-
   specific information in a TCP-compatible manner, although this
   mechanism is separate from the actual information being transferred
   so could evolve in future revisions.  Figure 6 illustrates the
   layered architecture.

                                   +-------------------------------+
                                   |           Application         |
      +---------------+            +-------------------------------+
      |  Application  |            |             MPTCP             |
      +---------------+            + - - - - - - - + - - - - - - - +
      |      TCP      |            | Subflow (TCP) | Subflow (TCP) |
      +---------------+            +-------------------------------+
      |      IP       |            |       IP      |      IP       |
      +---------------+            +-------------------------------+

      Figure 6: Comparison of Standard TCP and MPTCP Protocol Stacks

   Situated below the application, the MPTCP extension manages multiple
   TCP subflows below it and must implement the following functions:

   o  Path Management: This is the function to detect and use multiple
      paths between two endpoints.  In the case of the MPTCP design [3],
      this feature is implemented using multiple IP addresses on at
      least one of the endpoints.  Although this does not guarantee path
      diversity, and there may be shared bottlenecks, this is a simple
      mechanism that can be used with no additional features in the
      network.  The path management features of the MPTCP protocol are
      the mechanisms to signal alternative addresses to endpoints, and
      mechanisms to set up new subflows attached to an existing MPTCP
      connection.

   o  Packet Scheduling: This function breaks the bytestream received
      from the application into segments which are transmitted on one of
      the available lower subflows.  The MPTCP design makes use of a
      data sequence mapping, associating packets sent on different
      subflows to a connection-level sequence numbering, thus allowing
      packets sent on different subflows to be correctly re-ordered at
      the receiver.  The packet scheduler is dependent upon information
      about the availability of paths exposed by the path management
      component, and then makes use of the subflows to transmit these
      packets.

   o  Subflow (single-path TCP) Interface: A subflow component takes
      segments from the packet-scheduling component and transmits them
      over the specified path, ensuring detectable delivery to the
      endpoint.  Detection of delivery is necessary to allow the
      congestion control protocol to attribute packet delivery or loss
      to the right path.  Note that the packet scheduling component does
      not embed enough information in packets to allow this to happen:
      segments with the same connection-level sequence number can be
      transmitted over multiple paths, i.e. as retransmissions or just
      to increase redundancy.  MPTCP uses TCP underneath for network
      compatibility; TCP ensures in-order, reliable delivery.  TCP adds
      its own sequence numbers to the segments; these are used to detect
      and retransmit lost packets.

   o  Congestion Control: This function manages congestion control
      across the subflows.  As specified, this congestion control
      algorithm must ensure that an MPTCP connection does not unfairly
      take more bandwidth than a single path TCP flow would take at a
      shared bottleneck.  An algorithm to support this is specified in
      [6].

   These functions fit together as follows.  The Path Management
   function looks after the discovery (and if necessary, initialisation)
   of multiple paths between two endpoints.  The Packet Scheduler then
   receives data from the application, segments it, and performs the
   necessary operations on each segment (such as adding a data-level
   sequence number) before sending it to a subflow.  The subflow then
   adds its own sequence number and acks, and passes the segments to
   the network.  The receiving subflow re-orders data and passes it to
   the MPTCP component, which performs connection level re-ordering,
   removes the segment boundaries and sends the data to the
   application.  Finally, the congestion control component exists as
   part of the packet scheduling, in order to schedule which packets
   should be sent at what rate on which subflow.
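
   As a rough illustration of how these functions fit together, the
   following Python sketch splits a bytestream into segments, tags each
   with a data-level and a subflow-level sequence number, and
   reassembles the stream at the receiver.  It is a sketch under stated
   assumptions, not part of the MPTCP design: the names schedule and
   reassemble are hypothetical, round-robin selection stands in for a
   real packet scheduler, and congestion control, loss, and
   acknowledgements are omitted.

```python
from typing import Dict, List, Tuple

# (data_seq, subflow_seq, payload); the layout is illustrative only.
Segment = Tuple[int, int, bytes]


def schedule(stream: bytes, n_subflows: int, mss: int) -> Dict[int, List[Segment]]:
    """Split the bytestream into segments and spread them over subflows."""
    subflows: Dict[int, List[Segment]] = {i: [] for i in range(n_subflows)}
    next_subflow_seq = [0] * n_subflows
    for index, data_seq in enumerate(range(0, len(stream), mss)):
        payload = stream[data_seq:data_seq + mss]
        sf = index % n_subflows  # round-robin stands in for a real scheduler
        # Each subflow numbers its own bytes independently of data_seq.
        subflows[sf].append((data_seq, next_subflow_seq[sf], payload))
        next_subflow_seq[sf] += len(payload)
    return subflows


def reassemble(subflows: Dict[int, List[Segment]]) -> bytes:
    """Connection-level re-ordering: sort by data_seq, drop boundaries."""
    segments = [seg for queue in subflows.values() for seg in queue]
    segments.sort(key=lambda seg: seg[0])
    return b"".join(payload for _, _, payload in segments)


sent = schedule(b"abcdefghij", n_subflows=2, mss=3)
received = reassemble(sent)
```

   Note that each subflow's own sequence numbers are contiguous even
   though the data-level numbers it carries are not.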


5.  High-Level Design Decisions

   There is seemingly a wide range of choices when designing a multipath
   extension to TCP.  However, the goals as discussed earlier in this
   document constrain the possible solutions, leaving relatively little
   choice in many areas.  Here, we outline high-level design choices
   that draw from the architectural basis discussed earlier in
   Section 3, and their implications for the MPTCP design.

5.1.  Sequence Numbering

   MPTCP uses two levels of sequence spaces: a connection level sequence
   number, and another sequence number for each subflow.  This permits
   connection-level segmentation and reassembly, and retransmission of
   the same part of connection-level sequence space on different
   subflow-level sequence space.

   The alternative approach would be to use a single connection level
   sequence number, which gets sent on multiple subflows.  This has two
   problems: first, the individual subflows will appear to the network
   as TCP sessions with gaps in the sequence space; this in turn may
   upset certain middleboxes such as intrusion detection systems, or
   certain transparent proxies, and would go against the network
   compatibility goal.  Second, the sender cannot attribute packet
   losses or receptions to the correct path when the same packet is sent
   on multiple paths, in the case of retransmissions.

   The sender must be able to tell the receiver how to reorder the data,
   for delivery to the application.  The sender does so by telling the
   receiver how subflow-level data (carrying subflow sequence numbers)
   maps at connection level, which we refer to as Data Sequence Mapping.
   This mapping takes the form (data seq, subflow seq, length), i.e. for
   a given number of bytes (the length), the subflow sequence space
   beginning at the given sequence number maps to the connection-level
   sequence space (beginning at the given data seq number).
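
   As an illustration only, the following Python sketch shows how a
   receiver might apply one such mapping to translate a subflow-level
   sequence number into a connection-level one.  The names
   DataSequenceMapping and to_data_seq are hypothetical and are not
   taken from the protocol specification [3]; the example values are
   assumed, and sequence number wrap-around is ignored for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSequenceMapping:
    """One (data seq, subflow seq, length) mapping; names are illustrative."""
    data_seq: int      # first connection-level sequence number covered
    subflow_seq: int   # first subflow-level sequence number covered
    length: int        # number of bytes covered by the mapping

    def to_data_seq(self, subflow_seq: int) -> Optional[int]:
        """Translate a subflow sequence number, or return None if the
        number falls outside the range covered by this mapping."""
        offset = subflow_seq - self.subflow_seq
        if 0 <= offset < self.length:
            return self.data_seq + offset
        return None


# Assumed example: 100 bytes starting at subflow sequence number 5000
# carry connection-level bytes 1000 through 1099.
mapping = DataSequenceMapping(data_seq=1000, subflow_seq=5000, length=100)
```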

   This architecture does not mandate a mechanism for signalling such
   information, and it could conceivably have various sources.

   One option would be to use existing fields in the TCP segment (such
   as subflow seqno, length) and only add the data sequence number to
   each segment, for instance as a TCP option.  This is, however,
   vulnerable to middleboxes that resegment or assemble data, since
   there is no specified behaviour for coalescing TCP options.  If one
   signalled (data seqno, length), this would still be vulnerable to
   middleboxes that coalesce segments and do not correctly coalesce the
   options.  Because of these potential issues, the current
   specification of MPTCP mandates that the full mapping should be sent
   to the other end.

   To reduce the overhead, it would be permissible for the mapping to be
   sent periodically and cover more than a single segment.  It could
   also be excluded entirely in the case of a connection before more
   than one subflow is used, where the data-level and subflow-level
   sequence space is the same.

5.2.  Reliability

   Under normal behaviour, MPTCP can use the data sequence mapping and
   subflow ACKs to decide when a connection-level segment was received.
   This has certain implications for end-to-end semantics.  First, once
   a packet is acked at subflow level, it cannot be discarded from the
   re-order buffer at the connection level.  Secondly, unlike in
   standard TCP, a receiver cannot simply drop out-of-order segments if
   needed (for instance, due to memory pressure).

   Furthermore, it is possible to conceive of some cases where
   connection-level acknowledgements could improve robustness.  Consider
   a subflow traversing a transparent proxy: if the proxy acks a segment
   and then crashes, the sender will not retransmit the lost segment on
   another subflow, as it thinks the segment has been received.  The
   connection grinds to a halt despite having other working subflows,
   and the sender would be unable to determine the cause of the problem.
   Finally, as an optimisation, it may be feasible for a connection-
   level acknowledgement to be transmitted over the shortest RTT path,
   potentially reducing send buffer requirements (see Section 5.3).

   Therefore, to provide a fully robust multipath TCP solution, MPTCP
   SHOULD feature explicit connection-level acknowledgements.
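
   To make the proxy failure case above concrete, the following Python
   sketch keeps each segment in a sender-side buffer until a
   connection-level acknowledgement covers it, so that data acked only
   at subflow level remains available for retransmission on another
   subflow.  This is a minimal sketch under assumptions of our own: the
   SendBuffer class and its methods are hypothetical, and the
   connection-level acknowledgement is assumed to be cumulative.

```python
from typing import Dict


class SendBuffer:
    """Hypothetical sender buffer keyed by connection-level sequence number."""

    def __init__(self) -> None:
        self._unacked: Dict[int, bytes] = {}

    def sent(self, data_seq: int, payload: bytes) -> None:
        """Record a segment that has been handed to some subflow."""
        self._unacked[data_seq] = payload

    def on_subflow_ack(self, data_seq: int) -> None:
        """A subflow-level ACK may originate at a proxy rather than the
        receiver, so the data is deliberately kept in the buffer."""
        return None

    def on_data_ack(self, cumulative_ack: int) -> None:
        """Free every segment that ends at or before the cumulative
        connection-level acknowledgement (assumed semantics)."""
        done = [seq for seq, payload in self._unacked.items()
                if seq + len(payload) <= cumulative_ack]
        for seq in done:
            del self._unacked[seq]

    def retransmittable(self) -> Dict[int, bytes]:
        """Segments that could still be re-sent on another subflow."""
        return dict(self._unacked)
```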

   Regarding retransmissions, it MUST be possible for a packet to be
   retransmitted on a different subflow to that on which it was
   originally sent.  This is one of MPTCP's core goals, in order to
   maintain integrity during temporary or permanent subflow failure, and
   this is enabled by the dual sequence number space.

   The scheduling of retransmissions will have significant impact on
   MPTCP user experience.  The current MPTCP specification suggests that
   data outstanding on subflows that have timed out should be
   rescheduled for transmission on different subflows.  This behaviour
   aims to minimize disruption when a path breaks, and uses the first
   timeout as the indicator.  More conservative variants would wait for
   a second or third timeout for the same packet.

   When packet loss is detected and corrected with fast retransmit,
   retransmission on different subflows may still be desirable in
   certain cases, for instance to reduce the receive buffer
   requirements.  However, in all cases with retransmissions on
   different subflows, the lost packets SHOULD still be sent on the path
   that lost them.  This is currently believed to be necessary to
   maintain subflow integrity, as per the network compatibility goal.  By
   doing this, throughput will be wasted, and it is unclear at this
   point what the optimal retransmit strategy is.
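
   As a rough illustration of the retransmission policy discussed in
   this section, the Python sketch below reschedules data outstanding
   on a subflow once that subflow has seen a configurable number of
   timeouts.  The names used (Subflow, reschedule_after_timeout,
   timeout_threshold) and the choice of the lowest-RTT working subflow
   as the target are assumptions made for this example; they are not
   taken from the MPTCP specification.

```python
from dataclasses import dataclass, field


@dataclass
class Subflow:
    """Hypothetical per-subflow state; field names are illustrative only."""
    name: str
    rtt_ms: float
    timeouts: int = 0                                 # timeouts seen so far
    outstanding: list = field(default_factory=list)   # unacked data seq numbers


def reschedule_after_timeout(failed, subflows, timeout_threshold=1):
    """Record a timeout on `failed` and, once the threshold is reached,
    queue a copy of its outstanding data on another subflow.

    timeout_threshold=1 reacts to the first timeout; 2 or 3 gives the
    more conservative variants.  Returns the chosen target subflow, or
    None if nothing was rescheduled.
    """
    failed.timeouts += 1
    if failed.timeouts < timeout_threshold:
        return None
    candidates = [s for s in subflows if s is not failed and s.timeouts == 0]
    if not candidates:
        return None
    # Assumption: prefer the working subflow with the lowest RTT.
    target = min(candidates, key=lambda s: s.rtt_ms)
    # Copy, do not move: the data also stays outstanding on the original
    # subflow, which is still expected to retransmit it on its own path.
    target.outstanding.extend(failed.outstanding)
    return target


wifi = Subflow("wifi", rtt_ms=20.0, outstanding=[1000, 2460])
cellular = Subflow("3g", rtt_ms=120.0)
target = reschedule_after_timeout(wifi, [wifi, cellular], timeout_threshold=1)
print(target.name, target.outstanding)
```

   Setting timeout_threshold to 2 or 3 gives the more conservative
   variants mentioned above.  Because the data is copied rather than
   moved, the duplicate transmission is where throughput is wasted.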

5.3.  Buffers

   Receive Buffer: ideally, a subflow failing should not affect the
   throughput of other working subflows.  However, the receive buffer
   has limited size: if a subflow times out, the other subflows will
   quickly fill the receive buffer with out-of-order data, and will
   stall.  Hence, receive buffer sizing is important for both robustness
   and throughput.

   The smallest receive buffer we need to avoid stalling under any
   circumstances is max(RTO)*sum(BW).  This is, for most multipath
   connections, too expensive.  A more reasonable size is proportional
   to max(RTT)*sum(BW) which ensures subflows don't stall when fast
   retransmit works.  Also, depending on how the implementation behaves,
   an additional sum(RTT*BW) might be needed for the individual re-order
   buffers of the TCP subflows.

   Send Buffer: the smallest send buffer we need is sum(BDP) across all
   paths; this is to hold data until it's acked at subflow level.  If we
   didn't use a subflow level ack, and relied on a data-level ack, the
   send buffer would need to be as big as the receive buffer of the
   connection, max(RTT)*sum(BW).  In practice, the senders will be web
   servers and receivers will be desktops or mobile devices.  The send
   buffer size matters particularly for servers, which must be able to
   maintain a large number of ongoing connections.
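
   The sizing rules in this section can be worked through numerically.
   The following Python sketch computes the three quantities for a set
   of paths.  The Path type, the function names, and the example
   figures are hypothetical and were chosen only for illustration.

```python
from typing import NamedTuple


class Path(NamedTuple):
    """Illustrative path description; the values used below are made up."""
    rtt_s: float     # round-trip time, in seconds
    rto_s: float     # retransmission timeout, in seconds
    bw_Bps: float    # bandwidth, in bytes per second


def recv_buffer_no_stall(paths):
    """max(RTO) * sum(BW): avoids stalling even when a subflow times out."""
    return max(p.rto_s for p in paths) * sum(p.bw_Bps for p in paths)


def recv_buffer_fast_retransmit(paths):
    """max(RTT) * sum(BW): sufficient when fast retransmit repairs losses."""
    return max(p.rtt_s for p in paths) * sum(p.bw_Bps for p in paths)


def send_buffer_min(paths):
    """sum(BDP): holds data until it is acked at subflow level."""
    return sum(p.rtt_s * p.bw_Bps for p in paths)


paths = [
    Path(rtt_s=0.02, rto_s=1.0, bw_Bps=1_000_000),   # a short, fast path
    Path(rtt_s=0.2, rto_s=1.0, bw_Bps=250_000),      # a long, slow path
]
print(recv_buffer_no_stall(paths))
print(recv_buffer_fast_retransmit(paths))
print(send_buffer_min(paths))
```

   With these made-up figures, the stall-free receive buffer comes to
   roughly 1.25 MB, the fast-retransmit receive buffer to roughly
   250 KB, and the minimum send buffer to roughly 70 KB, which gives a
   feel for why the RTO-based bound is usually too expensive.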

5.4.  Signalling

   Since MPTCP will use regular TCP streams as its transport mechanism,
   an MPTCP connection will also begin as a single TCP stream.
   Nevertheless, it must signal to the peer that it supports MPTCP and
   wishes to use it on this connection.  As such, a TCP Option will be
   used to transmit this information, since this is the established
   mechanism for indicating additional functionality on a TCP session.

   On top of this, however, signalling is required during the operation
   of an MPTCP session, such as that for reassembly across multiple
   subflows, and for informing the other endpoint about other
   potentially available addresses.  The architecture does not mandate
   the format in which this signalling should be transmitted.

   The current MPTCP protocol proposal suggests the use of TCP options
   for this signalling; however, another approach would be to embed such
   information in the payload, and use type-length-value (TLV) encoding
   to separate signalling and payload data.
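
   To make the payload-embedding alternative more concrete, the Python
   sketch below frames signalling and payload data as TLV records in a
   single byte stream.  The record layout (a 1-byte type followed by a
   2-byte length in network byte order) and the type codes TYPE_DATA
   and TYPE_SIGNAL are assumptions made for this example; this document
   does not define a wire format.

```python
import struct

TYPE_DATA = 0      # hypothetical type code for application payload
TYPE_SIGNAL = 1    # hypothetical type code for MPTCP signalling

_HEADER = struct.Struct("!BH")   # 1-byte type, 2-byte length, network order


def encode_tlv(rec_type, value):
    """Prefix `value` (bytes) with its type and length."""
    return _HEADER.pack(rec_type, len(value)) + value


def decode_tlvs(stream):
    """Split a byte string into a list of (type, value) records.

    Raises ValueError if the stream ends in the middle of a record.
    """
    records = []
    offset = 0
    while offset < len(stream):
        if offset + _HEADER.size > len(stream):
            raise ValueError("truncated TLV header")
        rec_type, length = _HEADER.unpack_from(stream, offset)
        offset += _HEADER.size
        if offset + length > len(stream):
            raise ValueError("truncated TLV value")
        records.append((rec_type, stream[offset:offset + length]))
        offset += length
    return records


wire = encode_tlv(TYPE_SIGNAL, b"add-addr") + encode_tlv(TYPE_DATA, b"hello")
for rec_type, value in decode_tlvs(wire):
    print(rec_type, value)
```

   A receiver using such a scheme would pass TYPE_DATA records to the
   application and consume TYPE_SIGNAL records itself.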

5.5.  Path Management

   Currently, the network does not expose multiple paths between
   endpoints.  Multipath TCP will use multiple addresses at one or both
   endpoints to get different paths to the destination.  The hope is
   that these paths, whilst not necessarily entirely non-overlapping,
   will be sufficiently disjoint to allow multipath to achieve improved
   throughput and robustness.

   Multiple different (source, destination) address pairs will thus be
   used as path selectors.
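
   As a small illustration, the candidate path selectors for a pair of
   multi-addressed endpoints can be enumerated as follows.  The function
   name candidate_paths is hypothetical, and the address labels are
   placeholders.

```python
from itertools import product


def candidate_paths(local_addrs, remote_addrs):
    """Return every (source, destination) address pair as a path selector."""
    return list(product(local_addrs, remote_addrs))


paths = candidate_paths(["A1", "A2"], ["B1", "B2"])
print(paths)
# [('A1', 'B1'), ('A1', 'B2'), ('A2', 'B1'), ('A2', 'B2')]
```

   Not every pair will necessarily be usable or disjoint from the
   others; the list is only the set of candidates.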

   For increased chance of successfully setting up additional subflows
   (such as when one end is behind a firewall, NAT, or other restrictive
   middlebox), either endpoint should be able to add new subflows to an
   MPTCP connection.

   The modularity of path management will permit alternative mechanisms
   to be employed if appropriate in the future.

5.6.  Connection Identification

   Since an MPTCP connection may comprise several subflows, each with
   its own address pair, each MPTCP connection should have a connection
   identifier at each endpoint, which is locally unique within that
   endpoint.  In many ways, this is analogous to a port number in
   regular TCP.  The manifestation and purpose of such an identifier
   are out of the scope of this architecture document.

   Legacy applications will not, however, have access to this identifier
   and in such cases an MPTCP connection will be identified by the
   5-tuple of the first TCP subflow.  It is out of the scope of this
   document to define the behaviour of the MPTCP implementation if the
   first TCP subflow later fails.  If there are legacy applications that
   make assumptions about continued existence of the initial address
   pair, their behaviour could be disrupted by carrying on regardless.
   It is expected that this is a very small, possibly negligible, set of
   applications.  In the case of applications that have specifically
   asked to be bound to a particular address or interface, MPTCP will
   not be used.

   Since the requirements of applications are not clear at this stage,
   it is as yet unconfirmed what the best behaviour is.  The solution
   will be implementation-specific, and as such the behaviour is
   expected to be chosen by implementors once more research has been
   undertaken to determine its impact.

5.7.  Network Layer Compatibility

   MPTCP's modifications remain at the transport layer, although some
   knowledge of the underlying network layer is required.  MPTCP MUST
   work with IPv4 and IPv6 interchangeably, i.e. one MPTCP connection
   may operate over both IPv4 and IPv6 networks.

5.8.  Congestion Control

   As already documented in network-layer compatibility requirements,
   the congestion control algorithms used by an MPTCP implementation
   must not harm other legacy users on shared bottlenecks.  To achieve
   this, the congestion control algorithms in use on each subflow must
   be coupled in some way - a proposal for this is given in [6].


6.  Summary

   This document has provided a summary of the components that have been
   identified to provide a Multipath TCP solution, and described the
   high-level design decisions that have been used as a basis of the
   MPTCP specification.

   The suite of drafts that specifies a complete MPTCP implementation,
   on top of this architectural overview, is as follows:

   o  A specification of the MPTCP protocol [3], describing the on- and
      off-the-wire differences to regular TCP.

   o  A specification of a coupled congestion control algorithm [6],
      that can be applied to the above protocol while meeting the goals
      for such an algorithm as specified in this document.

   o  A document [7] that builds upon the application compatibility
      issues discussed in this document, explaining in more detail what
      if any changes an application may experience through the use of
      MPTCP.  This document also provides a proposed API through which
      an application can influence the behaviour of the MPTCP protocol,
      as specified in the above drafts.


7.  Security Considerations

   Please see [14] for a threat analysis of Multipath TCP.  The threats
   analysed in this companion document are addressed as appropriate in
   the protocol design [3].


8.  Interactions with Applications

   Interactions with applications, including, but not limited to,
   performance changes that may be expected, semantic changes, and new
   features that may be requested of an API, are presented in [7].


9.  Interactions with Middleboxes

   TBD

   This section will contain a list of issues that may arise with NATs,
   firewalls, proxies, intrusion detection systems, etc.

   This will be an overview only, to the level of suggested high-level
   solutions as presented in this document (e.g. dual-level sequence
   space), but protocol-specific solutions to these issues will be given
   in the companion documents.

   Example points include:

   o  NATs: change addresses

   o  NATs/Firewalls: drop options; split, coalesce packets; change
      sequence numbering?

   o  Firewalls: block incoming connection attempts; block unknown TCP
      options

   o  Proxies: PEPs can terminate TCP sessions before an endpoint

   o  Intrusion Detection: require ways to correlate subflows

   o  ...


10.  Acknowledgements

   Alan Ford, Costin Raiciu and Sebastien Barre are supported by Trilogy
   (http://www.trilogy-project.org), a research project (ICT-216372)
   partially funded by the European Community under its Seventh
   Framework Program.  The views expressed here are those of the
   author(s) only.  The European Commission is not liable for any use
   that may be made of the information in this document.


11.  Contributors

   The authors would like to acknowledge the contributions of Mark
   Handley and Bryan Ford to this document.


12.  IANA Considerations

   None.


13.  References

13.1.  Normative References

   [1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

13.2.  Informative References

   [2]   Wischik, D., Handley, M., and M. Bagnulo Braun, "The Resource
         Pooling Principle", ACM SIGCOMM CCR vol. 38 num. 5, pp. 47-52,
         October 2008,
         <http://ccr.sigcomm.org/online/files/p47-handleyA4.pdf>.

   [3]   Ford, A., Raiciu, C., and M. Handley, "TCP Extensions for
         Multipath Operation with Multiple Addresses",
         draft-ford-mptcp-multiaddressed-02 (work in progress),
         October 2009.

   [4]   Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
         September 1981.

   [5]   Stewart, R., "Stream Control Transmission Protocol", RFC 4960,
         September 2007.

   [6]   Raiciu, C., Handley, M., and D. Wischik, "Coupled Multipath-
         Aware Congestion Control", draft-raiciu-mptcp-congestion-00
         (work in progress), October 2009.

   [7]   Scharf, M. and A. Ford, "MPTCP Application Interface
         Considerations", draft-scharf-mptcp-api-00 (work in progress),
         October 2009.

   [8]   Carpenter, B. and S. Brim, "Middleboxes: Taxonomy and Issues",
         RFC 3234, February 2002.

   [9]   Carpenter, B., "Internet Transparency", RFC 2775,
         February 2000.

   [10]  Ford, B. and J. Iyengar, "Breaking Up the Transport Logjam",
         ACM HotNets, October 2008.

   [11]  Srisuresh, P. and K. Egevang, "Traditional IP Network Address
         Translator (Traditional NAT)", RFC 3022, January 2001.

   [12]  Freed, N., "Behavior of and Requirements for Internet
         Firewalls", RFC 2979, October 2000.

   [13]  Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
         Shelby, "Performance Enhancing Proxies Intended to Mitigate
         Link-Related Degradations", RFC 3135, June 2001.

   [14]  Bagnulo, M., "Threat Analysis for Multi-addressed/Multi-path
         TCP", draft-ietf-mptcp-threat-00 (work in progress),
         February 2010.

   [15]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
         Control", RFC 5681, September 2009.


Appendix A.  Implementation Architecture

   This section provides suggestions for an architecture to implement an
   extensible, modular multipath transport protocol.

A.1.  Functional Separation

   This section describes a generic view of the internal implementation
   of a Multipath TCP, through which the technical components specified
   in the companion documents can fit together.  It shows how an
   implementation could be built that permits extensibility between
   components without changing the external representation.

   We first show the functional decomposition of an MPTCP solution that
   is completely contained in the transport layer.  That solution is
   described in more detail in [3].  Then we generalize the approach to
   allow good extensibility of that solution.

A.1.1.  Application to default MPTCP protocol

   Although, in the default approach, MPTCP is fully contained in the
   transport layer, it can still be divided into two main modules.  One
   manages the scheduling of packets as well as congestion control.  The
   other one manages the control of paths.  The interface between the
   two relies on a Path Index.  As shown in Figure 7, the
   Path Manager announces to the Multipath Scheduler what paths can be
   used through path indices, and maintains the mapping between each
   index and the particular action that it must apply to use the path
   (an example of such a mapping is in Table 1).  In the case of the
   built-in Path Manager, the action is to replace an address/port pair
   with another one, in such a way that another path is used across the
   Internet to forward that packet.


            Control plane    <--     |     -->    Data plane
   +---------------------------------------------------------------+
   |                     Multipath Scheduler (MPS)                 |
   +---------------------------------------------------------------+
                ^                    |          |
                |                    |   [A1,B1,|pA1,pB1]
                |For conn_id         |          |
                |<A1,B1,pA1,pB1>     |   +-------------+
                |Paths 1->4 can be   |   | Data packet |<--Path idx:3
                |used.               |   +-------------+   attached
                |                    |          |          by MPS
                |                    |          V
   +--------------------------------------------\------------------+
   |                         Path Manager (PM)   \[A1,B1]->[A1,B2] |
   +--------------------------------------------------\------------+
      /                           \  |                 \
     /-----------------------------\ |   /"\    /"\    /"\   /"\
     | rewriting table:             ||   | |    | |    | |   | |
     | Subflow id  <-->  network_id ||   | |    | |    | |   | |
     |                              ||   | |    | |    | |   | |
     |    [see table below]         ||   | |    | |    | |   | |
     |                              ||   \./    \./    \./   \./
     +------------------------------+|  path1  path2  path3 path4


      Figure 7: Functional separation of MPTCP in the transport layer

   The Multipath Scheduler only deals with abstract paths, represented
   by numbers.  It only sees one address pair throughout the
   communication, which we call the connection identifier.  However, the
   Multipath Scheduler must be able to perform per-subflow congestion
   control, and thus to distinguish between the subflows.  This leads us
   to define a subflow identifier, which consists of the usual transport
   identifier extended with the path index:
   <addr_src,psrc,addr_dst,pdst,path_index>.  The following options,
   described in [3], are managed by the Multipath Scheduler.

   o  MULTIPATH CAPABLE (MPC): Tell the peer that we support MPTCP.
      Note that the MPC option also holds a token, which is necessary
      only if the built-in Path Manager is used.  In the next section we
      describe the generalized case, where the token can be ignored by
      the receiver if another path manager is used.

   o  DATA SEQUENCE NUMBER (DSN): Identify the position of a set of
      bytes in the meta-flow.

   o  DATA FIN (DFIN): Terminate a meta-flow.

   An implementation MUST use those options even if a Path Manager other
   than the default one is implemented.

   The Path Manager applies a particular technology to give the MPS the
   possibility to use several paths.  The built-in MPTCP Path Manager
   uses multiple IPv4 addresses as its means of influencing the
   forwarding of packets through the Internet.

   When the MPS starts a new connection, the PM chooses a token that
   will be used to identify the connection.  This is necessary to allow
   the PM to apply the correct path index to incoming packets.  An
   example mapping table is given below:

      +-----------------+---------------+---------+-----------------+
      |  connection id  |   subflow id  |  token  |    Network id   |
      +-----------------+---------------+---------+-----------------+
      | <A1,B1,pA1,pB1> | <conn_id,pi1> | token_1 | <A1,B1,pA1,pB1> |
      | <A1,B1,pA1,pB1> | <conn_id,pi2> | token_1 | <A2,B2,pA1,pB2> |
      | <A1,B1,pA1,pB1> | <conn_id,pi3> | token_1 | <A1,B2,pA1,pB2> |
      | <A1,B1,pA1,pB1> | <conn_id,pi4> | token_1 | <A2,B1,pA1,pB1> |
      | <A1,B1,pA1,pB3> | <conn_id,pi1> | token_2 | <A1,B1,pA1,pB3> |
      | <A1,B1,pA1,pB3> | <conn_id,pi2> | token_2 | <A2,B1,pA1,pB3> |
      +-----------------+---------------+---------+-----------------+

              Table 1: Example mapping table for built-in PM

   Table 1 shows an example where two connections are ongoing.  One is
   identified by token_1, the other one with token_2.  Since addresses
   are rewritten by the path manager, the attachment to the right
   connection is achieved thanks to the token, which is used at
   connection establishment and subflow establishment.  It is then
   remembered.  The first column holds the information that is exposed
   to the applications, while the last column shows the information that
   is actually written in packets that will fly through the network.  We
   note that, in addition to the addresses, ports can be rewritten,
   which contributes to supporting NATs.  The table also shows the role
   of the token, which is to attach various combinations of ports and
   addresses to a single connection.  The token is specific to the
   built-in path manager, and can be ignored if another path manager is
   used.  An implementation of the built-in path manager MUST implement
   the following options (defined in more detail in [3]):

   o  Add Address (ADDR): Announce a new address we own

   o  Remove Address (REMADDR): Withdraw a previously announced address

   o  Join Connection (JOIN): Attach a new subflow to the current
      connection

   Those options form the default MPTCP Path Manager, which is based on
   declaring IP addresses and carries control information in TCP
   options.  An implementation of Multipath TCP can use any Path
   Manager, but it MUST be able to fall back to the default PM in case
   the other end does not support the custom PM.  Alternative Path
   Managers may be specified in separate documents in the future.
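
   The rewriting table of the built-in Path Manager can be pictured as
   a pair of lookups.  The Python sketch below loads the rows of
   Table 1 and shows outgoing rewriting, incoming classification, and
   the attachment of a new subflow by token.  The function names, the
   use of tuples kept in the local endpoint's orientation, and the
   "next free index" rule in join() are assumptions made for this
   example only.

```python
# Connection ids and network ids are <addr_A, addr_B, port_A, port_B>
# tuples, always written from the local endpoint's point of view.
CONN_1 = ("A1", "B1", "pA1", "pB1")
CONN_2 = ("A1", "B1", "pA1", "pB3")

# (connection id, path index) -> network id, copied from Table 1.
REWRITE = {
    (CONN_1, 1): ("A1", "B1", "pA1", "pB1"),
    (CONN_1, 2): ("A2", "B2", "pA1", "pB2"),
    (CONN_1, 3): ("A1", "B2", "pA1", "pB2"),
    (CONN_1, 4): ("A2", "B1", "pA1", "pB1"),
    (CONN_2, 1): ("A1", "B1", "pA1", "pB3"),
    (CONN_2, 2): ("A2", "B1", "pA1", "pB3"),
}
TOKENS = {"token_1": CONN_1, "token_2": CONN_2}

# Reverse view, used to tag incoming packets with their path index.
INCOMING = {net_id: key for key, net_id in REWRITE.items()}


def rewrite_outgoing(conn_id, path_index):
    """Return the addresses and ports actually written into the packet."""
    return REWRITE[(conn_id, path_index)]


def classify_incoming(network_id):
    """Return (connection id, path index) for an incoming packet."""
    return INCOMING[network_id]


def join(token, network_id):
    """Attach a new subflow to the connection identified by `token`."""
    conn_id = TOKENS[token]
    # Assumption: hand out the next unused path index for this connection.
    path_index = 1 + max(pi for (c, pi) in REWRITE if c == conn_id)
    REWRITE[(conn_id, path_index)] = network_id
    INCOMING[network_id] = (conn_id, path_index)
    return path_index


print(rewrite_outgoing(CONN_1, 3))
print(classify_incoming(("A2", "B1", "pA1", "pB3")))
```

   The first call corresponds to the packet shown in Figure 7, which is
   tagged with path index 3 and rewritten from [A1,B1] to [A1,B2].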

A.1.2.  Generic architecture for MPTCP

   Now that the functional decomposition has been shown for MPTCP with
   the built-in Path Manager, we show how that architecture can be
   generalized to allow the implementation of other Path Managers for
   MPTCP.  A general overview of the architecture is provided in
   Figure 8.  The Multipath Scheduler (MPS) learns about the number of
   available paths through notifications received from the Path Manager
   (PM).  From the point of view of the Multipath Scheduler, a path is
   just a number, called a Path Index.  Notifications from the PM to the
   MPS MAY contain supporting information about the paths, if relevant,
   so that the MPS can make more intelligent decisions about where to
   route traffic.  When the Multipath Scheduler initiates a
   communication to a new host, it can only send the packets on the
   default path.  But since the Path Manager is layered below the MPS,
   it can detect that a new communication is happening, and tell the MPS
   about the other paths it knows about.


            Control plane    <--     |     -->    Data plane
   +---------------------------------------------------------------+
   |                     Multipath Scheduler (MPS)                 |
   +---------------------------------------------------------------+
                ^                    |          |
                |                    |   [A1,B1,|pA1,pB1]
                |                    |          |
                |Announcing new      |   +-------------+
                |paths. (referred    |   | Data packet |<--Path idx:3
                |to as path indices) |   +-------------+   attached
                |                    |          |          by MPS
                |                    |          V
   +--------------------------------------------\------------------+
   |                         Path Manager (PM)   \__________zzzzz  |
   +--------------------------------------------------------\------+
      /                         \    |                       \
     /---------------------------\   |   /"\       /"\       /"\
     | subflow_id        Action  |   |   | |       | |       | |
     |<A1,B1,pA1,pB1,1>  xxxxx   |   |   | |       | |       | |
     |<A1,B1,pA1,pB1,2>  yyyyy   |   |   \./       \./       \./
     |<A1,B1,pA1,pB1,3>  zzzzz   |   |  path1     path2     path3
     +---------------------------+

                 Figure 8: Overview of MPTCP architecture

   From then on, it is possible for the MPS to associate a Path Index
   with its packets, so that the Path Manager can map this Path Index to
   a particular action (see table in the lower left part of Figure 8).
   The particular action depends on the network mechanism used to select
   a path.  Examples are address rewriting, tunnelling or setting a path
   selector value inside the packet.  Note that the Path Index is not
   supposed to be written inside the packet, but instead associated with
   it, internally to the implementation.

   The applicability of the architecture is not limited to the MPTCP
   protocol.  While we define in this document an MPTCP MPS (MPTCP
   Multipath Scheduler), other Multipath Schedulers can be defined.  For
   example, if an appropriate socket interface is designed, applications
   could behave as a Multipath Scheduler and decide where to send any
   particular data.  In this document we concentrate on the MPTCP case,
   however.

A.2.  PM/MPS interface

   The minimal set of requirements for a Path Manager is as follows:

   o  Outgoing untagged packets: Any outgoing packet flowing through the
      Path Manager is either tagged or untagged (by the MPS) with a path
      index.  If it is untagged, the packet is sent normally to the
      Internet, as if no multi-path support were present.  Untagged
      packets can be used to trigger a path discovery procedure, that
      is, a Path Manager can listen to untagged packets and decide at
      some point to determine whether any path other than the default
      one is usable for the corresponding host pair.  Note that any
      other criteria could be used to decide when to start discovering
      available paths.  Note also that MPS scheduling will not be
      possible until the Path Manager has notified the available paths.
      The PM is thus the first entity coming into action.

   o  Outgoing tagged packets: The Path Manager maintains a table
      mapping path indices to actions.  The action is the operation that
      allows using a particular path.  Examples of possible actions are
      route selection, interface selection or packet transformation.
      When the PM sees a packet tagged with a path index, it looks up
      its table to find the appropriate action for that packet.  The tag
      is purely local.  It is removed before the packet is transmitted.

   o  Incoming packets: A Path Manager MUST ensure that each incoming
      path is mapped unambiguously to exactly one outgoing path.  Note
      that this requirement implies that the same number of incoming/
      outgoing paths must be established.  Moreover, a PM MUST tag any
      incoming path with the same Path Index as the one used for the
      corresponding outgoing path.  This is necessary for MPTCP to know
      what outgoing path is acknowledged by an incoming packet.

   o  Module interface: A PM MUST be able to notify the MPS about the
      number of available paths.  Such notifications MUST contain the
      path indices that are legal for use by the MPS.  In case the PM
      decides to stop providing service for one path, it MUST notify the
      MPS about path removal.  Additionally, a PM MAY provide
      complementary path information when available, such as link
      quality or preference level.
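
   The requirements above can be sketched as a small Python class.
   This is only one possible shape for such an interface: the class and
   method names, the representation of packets as dictionaries, and the
   use of a callback for notifications are all assumptions made for
   this example.

```python
class PathManager:
    """Toy Path Manager; all names here are hypothetical."""

    def __init__(self, notify_mps):
        self._notify_mps = notify_mps   # callback into the scheduler
        self._actions = {}              # path index -> callable(packet)
        self._incoming = {}             # incoming path key -> path index

    def add_path(self, path_index, action, incoming_key, info=None):
        # Each incoming path must map to exactly one outgoing path.
        if path_index in self._actions or incoming_key in self._incoming:
            raise ValueError("paths must map one-to-one")
        self._actions[path_index] = action
        self._incoming[incoming_key] = path_index
        self._notify_mps("added", path_index, info)

    def remove_path(self, path_index):
        del self._actions[path_index]
        self._incoming = {k: v for k, v in self._incoming.items()
                          if v != path_index}
        self._notify_mps("removed", path_index, None)

    def send(self, packet, path_index=None):
        """Untagged packets pass through unchanged; tagged packets have
        the action for their path index applied.  The tag is an argument
        here, so it never appears in the transmitted packet."""
        if path_index is None:
            return packet
        return self._actions[path_index](packet)

    def receive(self, packet, incoming_key):
        """Tag an incoming packet with the index of the matching
        outgoing path."""
        return packet, self._incoming[incoming_key]


events = []
pm = PathManager(lambda kind, index, info: events.append((kind, index, info)))
pm.add_path(3, lambda pkt: {**pkt, "dst": "B2"}, incoming_key=("B2", "A1"),
            info={"preference": 1})
tagged = pm.send({"src": "A1", "dst": "B1"}, path_index=3)
untagged = pm.send({"src": "A1", "dst": "B1"})
_, index = pm.receive({"src": "B2", "dst": "A1"}, ("B2", "A1"))
print(tagged, untagged, index, events)
```

   Here the action for path index 3 is a simple address rewrite; a
   different Path Manager could register an action that selects a route
   or an interface instead, without any change on the scheduler side.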


Authors' Addresses

   Alan Ford (editor)
   Roke Manor Research
   Old Salisbury Lane
   Romsey, Hampshire  SO51 0ZN
   UK

   Phone: +44 1794 833 465
   Email: alan.ford@roke.co.uk


   Costin Raiciu
   University College London
   Gower Street
   London  WC1E 6BT
   UK

   Email: c.raiciu@cs.ucl.ac.uk


   Sebastien Barre
   Universite catholique de Louvain
   Pl. Ste Barbe, 2
   Louvain-la-Neuve  1348
   Belgium

   Phone: +32 10 47 91 03
   Email: sebastien.barre@uclouvain.be


   Janardhan Iyengar
   Franklin and Marshall College
   Mathematics and Computer Science
   PO Box 3003
   Lancaster, PA  17604-3003
   USA

   Phone: 717-358-4774
   Email: jiyengar@fandm.edu

   This is an older version of an Internet-Draft that was ultimately
   published as RFC 6182.