Network Working Group                                       P. Sarolahti
Internet-Draft                                                 HIIT/ICSI
Intended status: Experimental                              March 4, 2010
Expires: September 5, 2010

               Multi-address Interface in the Socket API


   This document specifies a new address family to be used for sockets
   that are bound to more than one IP address, as motivated by the
   Multipath TCP work in the IETF.  The goal is to use the same set of
   function calls as traditionally, but by new address family make it
   possible for them to express multiple addresses to connect or bind
   to.  The document gives a high-level definition of the behavior of
   the traditional function calls, but a detailed specification of the
   API syntax is not in the scope of this document.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at

   The list of Internet-Draft Shadow Directories can be accessed at

   This Internet-Draft will expire on September 5, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Sarolahti               Expires September 5, 2010               [Page 1]

Internet-Draft                AF-Multipath                    March 2010

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   ( in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  4
   3.  The Multipath Address Family . . . . . . . . . . . . . . . . .  4
   4.  Behaviour with Different Networking Functions  . . . . . . . .  6
     4.1.  Bind . . . . . . . . . . . . . . . . . . . . . . . . . . .  6
     4.2.  Connect  . . . . . . . . . . . . . . . . . . . . . . . . .  6
     4.3.  Name resolution  . . . . . . . . . . . . . . . . . . . . .  7
     4.4.  Get Local Address / Get Remote Address . . . . . . . . . .  8
   5.  Discussion . . . . . . . . . . . . . . . . . . . . . . . . . .  8
     5.1.  Address Family or Protocol Family? . . . . . . . . . . . .  8
     5.2.  Comparison to Alternative Design Options . . . . . . . . .  9
     5.3.  Open Issues  . . . . . . . . . . . . . . . . . . . . . . . 10
   6.  Security Considerations  . . . . . . . . . . . . . . . . . . . 11
   7.  Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 11
   8.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 11
     8.1.  Normative References . . . . . . . . . . . . . . . . . . . 11
     8.2.  Informative References . . . . . . . . . . . . . . . . . . 11
   Appendix A.  Change log  . . . . . . . . . . . . . . . . . . . . . 12
   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 13

Sarolahti               Expires September 5, 2010               [Page 2]

Internet-Draft                AF-Multipath                    March 2010

1.  Introduction

   The socket API is designed as a generic protocol-independent
   interface that includes a compact set of operations for network
   applications to communicate to the network.  Despite the small number
   of basic operations, the socket API has proved to be enough powerful
   to carry out various different kinds of tasks, arguably because of
   the generic enough interface definitions.  In addition to Internet
   communication, the socket API is also used, for example, in the Unix
   domain sockets that operate on Unix filenames as communication
   identifiers, and various control tasks between the applications and
   the communication stack, for example in IPsec key management sockets
   [RFC2367], or so called netlink or routing sockets that interact with
   the routing table in the network stack.  The exact semantics of the
   socket API methods are not defined by the API itself, but depend on
   the parameters given when a new socket is created.

   The socket API is designed to expose the addresses used for the
   communication to the applications.  Consistent with the generic
   design of the API, the addresses are passed in a generic type-length
   encoded structure, that is interpreted based on the address family
   given in the beginning of the structure.  In practice, primarily two
   address families are currently used for Internet communication: IPv4
   (also known as AF_INET) or IPv6 (AF_INET6).  Perhaps confusingly,
   traditionally the same constants used to indicate the address family
   are often also used to indicate the protocol family for creating
   socket.  This dependency is discussed in more detail in Section 5.1.

   This document proposes a new way to use the socket API to better
   support protocols that use multiple IP addresses at either end of a
   connection.  The primary motivation for this specification is the
   ongoing work on Multipath TCP (e.g., [I-D.ietf-mptcp-architecture],
   []), but the same API can be used with
   any other protocol that runs multiple addresses on a single socket.
   One of the design goals in this specification is to enable support
   for multiple addresses in a socket without changing the binary
   function call API at the operation system interface, or the set of
   networking function calls available in the system.  This design also
   aims to maintain unchanged semantics with the previously familiar
   operations to the extent it is possible, while avoiding backwards
   compatibility issues by explicitly using a new address family.

   Using Multipath TCP with a traditional single-homed socket API can be
   problematic, as discussed in the API considerations document
   [I-D.scharf-mptcp-api].  The socket API was designed with an
   assumption that a socket is using just one address, with this address
   being explicitly visible to the applications.  When the API is used
   with a protocol that uses multiple addresses for communication,

Sarolahti               Expires September 5, 2010               [Page 3]

Internet-Draft                AF-Multipath                    March 2010

   defining the semantics of existing function calls that directly refer
   to one IP address becomes problematic, possibly making the existing
   applications behave defectively when using the legacy socket API with
   Multipath TCP.  While the motivation of Multipath TCP to operate on
   unmodified legacy APIs is well understandable, eventually a more
   expressive API is needed to better manage connections using multiple
   addresses at either end.

   This document specifies a new multipath-compatible address family to
   be used with the familiar socket operations, called AF_MULTIPATH.
   This address family is composed as a sequence of one or multiple
   elements that are each structured in the same way as one of the
   existing address families supported by the system, such as AF_INET or
   AF_INET6.  At the same time, this lets the application indicate if it
   supports the use of multiple addresses for the socket, for example
   using multipath TCP.  One advantage of the Multipath Address Family
   is that it supports using different address families, such as IPv4 or
   IPv6, in the same address set, thereby enabling dual-stack
   functionality between both IPv4 and IPv6 interfaces (although we note
   that IPv4 addresses can be expressed as a AF_INET6 structure).

   The AF_MULTIPATH address family could be used also with other
   protocols capable of multihoming, for example SCTP [RFC4960].  It may
   possibly be applicable also to shim-layer approaches to multihoming
   such as SHIM6 [RFC5533] or HIP [RFC5206], although these are based on
   a different philosophy of splitting locators (IP addresses) from the
   host identity.  Different API extensions for multihomed protocols
   have been specified (or are being worked on), for example one using a
   set of socket options [I-D.ietf-shim6-multihome-shim-api], and
   another extending the set of socket operations in the socket API
   [I-D.ietf-tsvwg-sctpsocket].  This document deliberately proposes a
   new approach as an alternative to these, and discusses the benefits
   and disadvantages of different approaches in Section 5.2

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   document are to be interpreted as described in [RFC2119].

3.  The Multipath Address Family

   Multipath address family (AF_MULTIPATH) is composed of a sequence of
   addresses, each expressed using one of the existing address family
   formats supported in the system.  A desirable behavior would be that
   in a system that supports the multipath address family, opening a

Sarolahti               Expires September 5, 2010               [Page 4]

Internet-Draft                AF-Multipath                    March 2010

   socket using one of the traditional single-address families should be
   taken as an indication that multiple addresses should not be used for
   that socket.  However, to allow migration period for legacy
   applications that are not converted to use the new address family,
   but would benefit from multipath communication, an additional option
   switch may be needed to control the behavior on traditional single-
   address families.

   The address family is structured according to the generic sockaddress
   structure as follows.  The fields are given in network byte order.

   o  Length (8 bits)

   o  Address family (=AF_MULTIPATH) (8 bits)

   o  Number of addresses (8 bits)

   o  Address 1

   o  Address 2

   o  ....

   o  Address N

   Each of the address records above takes the generic sockaddress
   format, i.e.:

   o  Length (8 bits)

   o  Address family (8 bits)

   o  Address (...i.e., the rest of sockaddr structure as defined by the
      address family...)

   In a dynamic multipath connection the set of address fields can
   change over time: new addresses may be added and earlier addresses
   can be removed.  This characteristic, that the socket address content
   can change during a connection, differs from traditional behavior.
   However, this just reflects the changed behavior of multipath TCP
   compared to traditional TCP, which uses the same pair of addresses
   through the connection life cycle.

   In today's systems the address family is commonly either AF_INET or
   AF_INET6, although also other address families can exist.  Depending
   on the address family, the address would typically be structured as
   sockaddr_in or sockaddr_in6 structure, with the length set
   appropriately.  Different address families can be combined in a

Sarolahti               Expires September 5, 2010               [Page 5]

Internet-Draft                AF-Multipath                    March 2010

   single AF_MULTIPATH record.

4.  Behaviour with Different Networking Functions

   This section defines the intended behavior of commonly used network
   operations when used with AF_MULTIPATH address family.  The section
   gives a high-level definition of the operations to be applied as
   appropriate in different application environments.

4.1.  Bind

   An AF_MULTIPATH socket can bind to several addresses using a single
   call.  It is possible to use wildcard ("Any") address in some of
   entries of the address set.  Technically, multiple "Any" addresses
   could allow binding several ports to the same socket, although it is
   unclear if there is any viable reason for doing so.  AF_MULTIPATH can
   also contain just one address entry, in which case the behavior is
   similar to traditional single-homed bind.  On return, the function
   call should indicate how many addresses were successfully bound, and
   use failure response to indicate that binding failed to all
   addresses.  "Get Local Address" operation (getsockname in Posix) can
   be used to investigate which addresses were successfully bound.

   Differing from its past use, bind can be called multiple times for
   the same socket, to allow the application dynamically change the set
   of local addresses.  When a subsequent bind call does not include an
   address that is currently in use, it indicates to the protocol that
   this address should not be used anymore in a connection.  When a
   subsequent bind call includes addresses that are not currently in
   use, it indicates that these addresses should be added to the
   connection.  The protocol implementation may change the set of used
   addresses dynamically without a trigger from application.  Before the
   bind call the set of currently used addresses can be obtained using
   the "Get Local Address" (getsockname) call as described in
   Section 4.4.

4.2.  Connect

   An AF_MULTIPATH socket can give multiple addresses to connect,
   assuming the addresses belong to the same host.  The underlying
   protocol may need to activate these connections one at a time, if the
   protocol logic does not permit connecting to multiple addresses
   simultaneously.  On return, the function call should indicate how
   many of the addresses were successfully connected, or an error code.
   It is expected that commonly this call is used together with name
   resolution, as described below.  "Get Remote Address" (getpeername)
   operation can be used to investigate which addresses were

Sarolahti               Expires September 5, 2010               [Page 6]

Internet-Draft                AF-Multipath                    March 2010

   successfully connected to.

   Similarly to bind operation, connect can be called multiple times for
   the same socket, to allow the application dynamically change the set
   of remote addresses.  When a subsequent connect call does not include
   an address that is currently in use, it indicates to the protocol
   that this address should not be used anymore in a connection.  When a
   subsequent bind call includes addresses that are not currently in
   use, it indicates that these addresses should be added to the
   connection (in practice triggering this from application can be
   unusual for remote addresses).  The protocol implementation may
   change the set of used addresses dynamically without a trigger from
   application.  Before the connect call the set of currently used
   addresses can be obtained using the "Get Remote Address"
   (getpeername) call as described in Section 4.4.

   An application may give multiple addresses that seem reachable, but
   belong to different hosts.  The underlying protocol that supports
   AF_MULTIPATH API MUST be able to detect such situation, and prevent
   connections to multiple hosts.  Often there are sufficient protocol
   mechanisms (such as connection tokens in multipath TCP) and other
   protocol state that cause such connections to fail automatically.

4.3.  Name resolution

   In a typical usage of a name resolver, multiple addresses may be
   returned from a name server, and a client cycles through the given
   addresses until connection is successfully established.  This is
   useful, for example, in dual-stack IPv4/IPv6 hosts.  A client may
   need to try connecting separately to IPv6 addresses and IPv4
   addresses, when it is not certain whether IPv6 is supported on the

   When an AF_MULTIPATH-enabled name resolver is called, it returns the
   available address records as separate entries in a single
   AF_MULTIPATH structure.  This would mean that the call returns a
   single AF_MULTIPATH host entry that may contain multiple addresses as
   specified in the AF_MULTIPATH format.  An application may directly
   place the returned AF_MULTIPATH structure as a parameter of a connect
   call, indicating that a multipath protocol should try these addresses
   as subflows of the multipath connection.  When a name resolver
   receives an AF_MULTIPATH-enabled call in a host that supports both
   IPv6 and IPv4, it is useful to invoke name server queries for both
   IPv6 and IPv4 addresses.  If available, records of both types are
   returned to the application, that can pass them to connect call, in
   attempt to invoke a multipath connection over both IPv4 and IPv6
   paths.  This may be useful feature supporting transition from IPv4 to

Sarolahti               Expires September 5, 2010               [Page 7]

Internet-Draft                AF-Multipath                    March 2010

   It is not uncommon that a DNS name is associated to multiple hosts,
   for example to perform DNS-based load balancing.  As discussed above
   with the connect call, the underlying protocol implementation should
   be able to prevent connect attempts to separate hosts.

4.4.  Get Local Address / Get Remote Address

   The basic operation of these calls happens as before: they return a
   sockaddress structure either at the local or the remote end.
   AF_MULTIPATH address family is returned if it has been used earlier
   with the same socket.  The set of local or remote addresses SHOULD be
   up-to-date with the currently active set in the protocol
   implementation.  When the underlying protocol is able to change the
   active address set during the connection, this implies that
   subsequent calls to these functions can return different sets of
   addresses.  Along with the current addresses, an application learns
   also about the connection identifier with this call.  The connection
   identifier returned with "Get Remote Address" MUST be the same as the
   identifier returned with "Get Local Address", since it only has local
   meaning, and therefore the other end of the connection typically gets
   entirely different connection identifier for the connection.

5.  Discussion

5.1.  Address Family or Protocol Family?

   The socket API defines separately the protocol family, that is used
   to define the semantics of the socket behavior when a socket is
   created, and the address family that is used to distinguish different
   socket address formats [RFC2553].  By convention, in many systems
   AF_INET is defined to have an equal value to PF_INET, and AF_INET6 is
   defined to have an equal value to PF_INET6, allowing the pairs of
   definitions to be used interchangeably in implementations.

   Defining separate protocol families for IPv6 and IPv4 has some
   unfortunate consequences: in principle, it needs to be decided at
   socket creation time whether to use IPv6 or IPv4.  In practice, IPv6
   defines a specific address range, IPv4-mapped addresses, to
   "virtually" represent IPv4 addresses in IPv6 address space.
   Implementations can use this feature to signal, that IPv4 should be
   used on the wire.  An alternative design choice could have been to
   use same protocol family for both IPv4 and IPv6, and distinguish use
   of IPv4 and IPv6 with the address family, that can be selected also
   after the socket is created.  After all, the transport protocol
   implementation, and its semantics, are the same in both cases.

   In case of AF_MULTIPATH, the format of the socket address structure

Sarolahti               Expires September 5, 2010               [Page 8]

Internet-Draft                AF-Multipath                    March 2010

   changes, while for example in the case of multipath TCP, the
   semantics of the socket calls are intended to remain unchanged.  This
   would suggest that when AF_MULTIPATH address family is used with
   socket address structures, the protocol family definition should
   remain at its typical value.  Here an argument can be made that
   AF_MULTIPATH changes the behavior of bind and connect calls, but on
   the other hand, it is correct to say that from the applications point
   of view the outcome of a successful completion -- choosing a local
   access point, or successfully establishing connection with the peer
   -- is exactly the same as with the traditional use of the API.  It is
   unclear at this point, whether the pairwise definitions of protocol
   family and address family has lead to dependencies in implementations
   that would hinder the implementation of the AF_MULTIPATH address
   family, or whether such dependencies would be difficult to be fixed.

5.2.  Comparison to Alternative Design Options

   Replacing the current socket API with a "connect-by-name" API has
   been proposed.  Different proposals have slightly different
   abstraction levels, but commonly in these APIs application passes a
   DNS name with the connect call.  The benefit of such API is that the
   application does not need to handle the network addresses, that
   arguably shouldn't be application's concern in most cases, and it
   thus avoids a separate name resolution step.  In a long term this
   seems a promising direction to take in API design, but involves
   inter-operation between the name resolver that is often implemented
   in user space in current systems, and may need changes in operating
   system kernel interface.  This proposal intentionally has taken the
   traditional, more short term approach, to expose the network
   addresses to applications.

   Additions to the set of calls in the socket API has been proposed,
   for example, alternative operations for multi-address bind and
   connect [I-D.ietf-tsvwg-sctpsocket].  In the beginning this document
   lauded the elegance of simple, generic socket API with a small basic
   set of operations, and addition of new purpose-specific operations
   would be a deviation from this design principle.  In addition adding
   operations to socket API would cause changes to the operating system
   kernel function interface, that could cause interoperability and
   maintenance issues.  One way to implement such extended operations
   would be through an user-space library that maps the operations to
   the existing socket calls in the kernel implementation.  In mapping
   the additional operations to existing kernel interface, such library
   might leverage a mechanism similar to what presented in this
   document, or use, for example socket options or ioctl calls.

   Socket options can be used tune parameters affecting the protocol
   behavior.  The extensibility of the socket option format can make it

Sarolahti               Expires September 5, 2010               [Page 9]

Internet-Draft                AF-Multipath                    March 2010

   appealing to use this interface for more significant run-time tasks,
   such as adding or deleting addresses in a multi-address session, as
   done in [I-D.ietf-shim6-multihome-shim-api].  In this approach the
   semantics of the traditional single-homed operations still need to be
   specified.  It can also be questioned whether it is appropriate to
   use socket options to trigger actions that can be seen to fall beyond
   the scope of the original meaning of "socket option".

   In summary, introducing a new address family as proposed in this
   document allows keeping the existing set of socket operations in the
   API, which the author believes to be a useful property, for example
   concerning the maintenance of the interfaces between the operation
   system and applications.  If an operating system does not support
   AF_MULTIPATH, it can gracefully return an appropriate error code to
   an application, that can then revert to use the traditional single-
   homed address family, if desired.  There are no backwards
   compatibility issues with old applications, because applications
   explicitly signal their support of this address family with the
   connect or bind calls.

   Presenting the multiple addresses associated with the connection
   using the socket addresses seems a natural and native way of
   expressing this fairly new kind of property to applications.  The
   recent discussion on this idea showed, however, that the use of more
   dynamic socket addresses can be perceived as unconventional and can
   raise doubts, for example regarding the possible assumptions about
   the persistence of the address structure in the implementations.  It
   is unclear how actual this concern is, given that the underlying
   protocol that dynamically maintains the set of addresses is a fairly
   new feature, compared to decades of past use of the end-to-end

5.3.  Open Issues

   Below are listed some potentially open issues that need to be taken
   in to account in follow-up discussion on this document.

   o  Are there any constraints on the generic format of the socket
      address structure, that would conflict with what proposed above?
      It appears that for example BSD and Linux use different formats
      for this structure, so that the Linux structure follows the "old
      BSD" convention, without a common length field (sa_len).  Are
      there reasons to follow one of these conventions, or would it be
      possible, for example, to specify a 16-bit length field that could
      be useful for large sets of IPv6 addresses?

   o  Is there a need to expose pairwise source-destination subflow
      associations, instead of just a group of source and destination

Sarolahti               Expires September 5, 2010              [Page 10]

Internet-Draft                AF-Multipath                    March 2010

      addresses?  Currently no such reason can be seen: the socket
      should represent one logical connection between a source host and
      a destination host, that in this case may have multiple IP
      addresses in use for the connection.

   o  Is a separate connection identifier field needed, as in the
      earlier version of this document?  Currently the author cannot
      identify such need.

6.  Security Considerations

   No additional security threats are known because of the multipath
   address family.  This document referred to the possibility that
   dynamic end host multihoming may enable new ways to maliciously
   transfer a connection to another host.  A multi-address interface may
   open this possibility to applications, but ultimately the multihomed
   protocol should have mechanisms to protect from such behavior.

7.  Acknowledgments

   The author is thankful to Alan Ford for pointing out specific
   technical issues to be addressed, and to the people who have
   participated the discussion on the MPTCP mailing list.

   The author is supported by Finland-ICSI Center for Novel Internet
   Architectures (FICNIA) and the Finnish Funding Agency for Technology
   and Innovation (TEKES).

8.  References

8.1.  Normative References

              Ford, A., Raiciu, C., Barre, S., and J. Iyengar,
              "Architectural Guidelines for Multipath TCP Development",
              draft-ietf-mptcp-architecture-00 (work in progress),
              March 2010.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2.  Informative References

              Ford, A., Raiciu, C., and M. Handley, "TCP Extensions for

Sarolahti               Expires September 5, 2010              [Page 11]

Internet-Draft                AF-Multipath                    March 2010

              Multipath Operation with Multiple Addresses",
              draft-ford-mptcp-multiaddressed-02 (work in progress),
              October 2009.

              Komu, M., Bagnulo, M., Slavov, K., and S. Sugimoto,
              "Socket Application Program Interface (API) for
              Multihoming Shim", draft-ietf-shim6-multihome-shim-api-13
              (work in progress), February 2010.

              Stewart, R., Poon, K., Tuexen, M., Yasevich, V., and P.
              Lei, "Sockets API Extensions for Stream Control
              Transmission Protocol (SCTP)",
              draft-ietf-tsvwg-sctpsocket-21 (work in progress),
              February 2010.

              Scharf, M. and A. Ford, "MPTCP Application Interface
              Considerations", draft-scharf-mptcp-api-00 (work in
              progress), October 2009.

   [RFC2367]  McDonald, D., Metz, C., and B. Phan, "PF_KEY Key
              Management API, Version 2", RFC 2367, July 1998.

   [RFC2553]  Gilligan, R., Thomson, S., Bound, J., and W. Stevens,
              "Basic Socket Interface Extensions for IPv6", RFC 2553,
              March 1999.

   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol",
              RFC 4960, September 2007.

   [RFC5206]  Nikander, P., Henderson, T., Vogt, C., and J. Arkko, "End-
              Host Mobility and Multihoming with the Host Identity
              Protocol", RFC 5206, April 2008.

   [RFC5533]  Nordmark, E. and M. Bagnulo, "Shim6: Level 3 Multihoming
              Shim Protocol for IPv6", RFC 5533, June 2009.

Appendix A.  Change log

   Changes from version -00 to -01

   o  Added more background to the Introduction

   o  Added section to discuss protocol family/address family issues,
      and comparison to related API designs

Sarolahti               Expires September 5, 2010              [Page 12]

Internet-Draft                AF-Multipath                    March 2010

   o  Removed the connection identifier from the address structure, and
      changed related descriptions related to socket calls.

   o  Added discussion about open issues in Section 3

Author's Address

   Pasi Sarolahti
   1947 Center Street (Suite 600)
   Berkeley, CA  94704

   Phone: +1 (510) 409 - 9972

Sarolahti               Expires September 5, 2010              [Page 13]