Internet Engineering Task Force                               J. Ubillos
Internet-Draft                             Swedish Institute of Computer
Intended status: Experimental                                    Science
Expires: March 13, 2011                                            M. Xu
                                                                 Z. Ming
                                                     Tsinghua University
                                                                 C. Vogt
                                                                Ericsson
                                                       September 9, 2010


                           Name Based Sockets
                  draft-ubillos-name-based-sockets-02

Abstract

   This memo defines name based sockets.  Name based sockets allow the
   application developer to refer to remote hosts (and to the local
   host itself) by name only, passing all IP address (locator)
   management on to the operating system.  Applications are thus
   relieved of re-implementing features such as multi-homing, mobility,
   NAT traversal, IPv6/IPv4 interoperability and address management in
   general.  The operating system can in turn re-use the same solutions
   for a whole set of guest applications.  Name based sockets aim to
   provide this set of features without adding new indirection layers,
   new delays or other dependencies, while maintaining transparent
   backwards compatibility.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 13, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.



Ubillos, et al.          Expires March 13, 2011                 [Page 1]


Internet-Draft             Name Based Sockets             September 2010


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Conventions  . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  3
   3.  Overview . . . . . . . . . . . . . . . . . . . . . . . . . . .  3
     3.1.  Introduction . . . . . . . . . . . . . . . . . . . . . . .  3
     3.2.  Motivation . . . . . . . . . . . . . . . . . . . . . . . .  3
     3.3.  Compatibility goals  . . . . . . . . . . . . . . . . . . .  4
       3.3.1.  Network compatibility  . . . . . . . . . . . . . . . .  4
       3.3.2.  Application compatibility  . . . . . . . . . . . . . .  4
     3.4.  Name based sockets overview  . . . . . . . . . . . . . . .  4
     3.5.  Protocol overview  . . . . . . . . . . . . . . . . . . . .  4
   4.  Initial name exchange  . . . . . . . . . . . . . . . . . . . .  5
     4.1.  Name format  . . . . . . . . . . . . . . . . . . . . . . .  6
   5.  API  . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  7
   6.  Mobility support . . . . . . . . . . . . . . . . . . . . . . .  8
     6.1.  Shim6  . . . . . . . . . . . . . . . . . . . . . . . . . .  8
       6.1.1.  Brief overview of changes  . . . . . . . . . . . . . .  9
       6.1.2.  Identity change  . . . . . . . . . . . . . . . . . . . 10
       6.1.3.  The hand-shake with name exchange  . . . . . . . . . . 10
       6.1.4.  Triggers of shim6  . . . . . . . . . . . . . . . . . . 10
       6.1.5.  Establishing Shim6 context . . . . . . . . . . . . . . 11
       6.1.6.  Problems for Shim6 to support mobility . . . . . . . . 11
       6.1.7.  Changes to REAP  . . . . . . . . . . . . . . . . . . . 13
   7.  Security Considerations  . . . . . . . . . . . . . . . . . . . 13
   8.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 13
   9.  Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 13
   10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13
     10.1. Normative References . . . . . . . . . . . . . . . . . . . 13
     10.2. Informative References . . . . . . . . . . . . . . . . . . 13
     10.3. URL References . . . . . . . . . . . . . . . . . . . . . . 14
   Appendix A.  Change Log  . . . . . . . . . . . . . . . . . . . . . 14
   Appendix B.  Open Issues . . . . . . . . . . . . . . . . . . . . . 14

1.  Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Terminology

   Locator - An IP address (v4 or v6) at which a host can be reached.

   Multi-homed host - A host which is reachable through multiple
   locators (on one or more interfaces).

   Name - A character string (at most 255 characters long) by which a
   host can be identified.  A name maps to zero or more locators.

3.  Overview

3.1.  Introduction

   Name based sockets provide a unified interface for application
   developers who simply wish to open a communication to a remote host
   by its name and have the operating system manage the locators.

   They do so by providing a new address family (AF_NAME) which allows
   the developer to use a name instead of an IP address.

3.2.  Motivation

   Network communication has long been based on the assumption that
   applications should deal with IP address (locator) management.  This
   rests on the legacy notion that an IP address does not change during
   a session and that a session is a communication between two given IP
   addresses (locators).  This situation has changed: locators change
   during a session, a host might have multiple locators, and hosts may
   be behind NATs or on different networks (IPv4/IPv6).  Today, this is
   mostly left to the individual applications to solve.  This adds
   complexity to the applications, making it a challenge in itself just
   to meet their networking demands.

   Name based sockets aim to fix this by pushing the locator management
   to the operating system, without introducing any new limitations or
   delays.

   Note that name based sockets do not aim to replace all socket()-based
   communication.  Some cases are excluded by obvious bootstrapping
   problems: a DHCP client or a DNS-querying client, for example, is
   better served by an interface that is not name oriented.

3.3.  Compatibility goals

3.3.1.  Network compatibility

3.3.2.  Application compatibility

3.4.  Name based sockets overview

   Name based sockets provide a new socket interface.  It is implemented
   using a new address family (AF_NAME).  This means that the sockets
   are only used by applications that explicitly request them.  As a
   result, applications that use name based sockets are aware of the
   set of features they are provided with, and hence know not to
   re-implement those features.  Current implementations are kernel
   modules; however, a user-space library implementation ought to work
   just as well.

   By using DNS as the ID/locator mapping structure, we do not introduce
   any new indirections.  Note that the responding host does not need
   to do any DNS resolution (explained below).  We start from the
   assumption that most application network calls start with an FQDN.

   The exchange of names is performed in-band, by piggy-backing the
   needed information on the first few packets exchanged.  The
   consequences are:

   o  No extra delays.

   o  No extra features until the name exchange has completed.

   o  Backwards compatibility: with a legacy peer, the name exchange
      simply never completes.

3.5.  Protocol overview

   Name based sockets work by performing a name exchange at the
   beginning of a communication.  This is done in a non-blocking way.
   In practice, the first few packets are exchanged in the normal
   legacy fashion, with extra information about the communicating
   hosts' names piggy-backed onto them.  If the name exchange is
   successful, the extra features provided by name based sockets are
   enabled.  If the exchange does not succeed, normal legacy
   communication continues unaffected.

   The name of each host is either its FQDN, its IP address in ip6.arpa
   format, or an arbitrary nonce.  It may or may not be authenticated.
   In the ordinary case, the name is not authenticated, so the receiver
   does not need to perform a reverse or forward lookup.  This avoids
   any additional "first-packet" delays.

   Once the name exchange has been performed successfully the complete
   feature set will be made available to the communication
   automatically.

   The expected API for a socket using AF_NAME is the same as for e.g.
   TCP (SOCK_STREAM).  This is also the case for SOCK_DGRAM and similar
   protocols: for all practical purposes, the functionality remains
   unchanged.  However, as state is created at both ends, a
   connection-oriented model is more intuitive.

4.  Initial name exchange

   When the sender sends the first packet to the receiver, it appends
   its own name as an IP option/IPv6 extension header.  It repeats this
   for a predefined amount of time or number of packets.  If the
   receiver supports name based sockets, it appends its own name in the
   same fashion for a predefined amount of time or number of packets.
   Should the receiver not be able to interpret the name, the
   option/extension header is ignored and the legacy communication
   proceeds as normal.

   This kind of name exchange has two consequences.  First and
   foremost, there are no extra delays on the initial packets.
   Secondly, the complete feature set provided by name based sockets
   is not available until a few packets have been exchanged.

   The exchange is depicted in Figure 1.  The first packet from the
   sender has the sender's name appended to it.  The next few packets
   sent also have the name appended, for a predefined number of packets
   (X in the figure) or until a reply including a name extension is
   received.  On the receiving end, should an incoming packet have a
   name extension, the receiver begins to append its own name to the
   packets it sends.  It does so for a predefined number of packets (X
   in the figure).

   .--------.                         .----------.
   | Sender |                         | Receiver |
   '--------'                         '----------'
       |                                   |
       |                                   |
       |       .-------------------.       |
       | ------|    Data packet    |-----> |  Packet received.
       |       |+ name piggybacked |       |  The first response
       |       '-------------------'       |  packet will have
       | ---------->                       |  the receiver name
       | ------->                          |  piggy backed to it.
       | ---->  .-----------------------.  |
       | ->     | name piggybacked      |  |
       |        | until reply or X pkts |  |
       |        '-----------------------'  |
       |                                   |
       |                                   |
       |                                   |
       |        .-------------------.      |
       | <------|    Data packet    |----- |
       |        |+ name piggybacked |      |
       |        '-------------------'      |
       |                       <---------- |
       |                          <------- |
       |       .------------------.  <---- |
       |       | name piggybacked |     <- |
       |       | until X pkts     |        |
       |       '------------------'        |
       |                                   |
       |                                   |

   Figure 1
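
   The piggy-backing rule described in this section can be sketched in
   C as follows.  This is a minimal illustration only, not part of the
   specification: the structure nbs_exchange, the function names and
   the constant NBS_MAX_NAME_PKTS (the "X" of Figure 1, for which this
   document defines no value) are assumptions made for the example.
   The time-based limit mentioned above is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical value for "X" in Figure 1; the draft does not fix it. */
#define NBS_MAX_NAME_PKTS 8

/* Hypothetical per-session state of the name exchange. */
struct nbs_exchange {
    bool initiator;       /* true for the sender of the first packet */
    bool peer_name_seen;  /* a packet carrying the peer's name arrived */
    unsigned names_sent;  /* packets sent so far with our name attached */
};

/* Record that an incoming packet carried a name extension. */
void nbs_on_name_received(struct nbs_exchange *ex)
{
    assert(ex != NULL);
    ex->peer_name_seen = true;
}

/* Decide whether the next outgoing packet should carry our name. */
bool nbs_should_piggyback(struct nbs_exchange *ex)
{
    assert(ex != NULL);
    if (ex->names_sent >= NBS_MAX_NAME_PKTS)
        return false;   /* X packets reached: stop appending */
    if (ex->initiator && ex->peer_name_seen)
        return false;   /* sender: a reply with a name was received */
    if (!ex->initiator && !ex->peer_name_seen)
        return false;   /* receiver: nothing to answer yet */
    ex->names_sent++;
    return true;
}
```

   With a legacy peer, peer_name_seen never becomes true on the sender
   side, so the sender simply stops appending its name after X packets
   and communication continues in the legacy fashion.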

4.1.  Name format

   Names can be provided in any of three ways.

   o  FQDN.  The Fully Qualified Domain Name of the host.  This will
      allow e.g. DNSSEC to provide authenticity of the name.

   o  ip6.arpa.  One of the host's interface addresses, used as a name.

   o  Nonce.  A single-use session identifier.
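
   As an illustration of the ip6.arpa form, the following C sketch
   derives such a name from a 16-byte IPv6 address.  It assumes the
   nibble-reversed notation used for IPv6 reverse DNS lookups; this
   document does not itself spell out the encoding.  The function name
   nbs_ip6_arpa_name and the macro IP6_ARPA_NAME_LEN are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32 nibbles, each followed by '.', then "ip6.arpa" and a NUL. */
#define IP6_ARPA_NAME_LEN (32 * 2 + 8 + 1)

/* Write the ip6.arpa form of a 16-byte IPv6 address into out. */
void nbs_ip6_arpa_name(const uint8_t addr[16], char out[IP6_ARPA_NAME_LEN])
{
    static const char hex[] = "0123456789abcdef";
    size_t pos = 0;

    assert(addr != NULL && out != NULL);
    /* Nibbles are emitted in reverse order, least significant first. */
    for (int i = 15; i >= 0; i--) {
        out[pos++] = hex[addr[i] & 0x0f];
        out[pos++] = '.';
        out[pos++] = hex[addr[i] >> 4];
        out[pos++] = '.';
    }
    memcpy(out + pos, "ip6.arpa", 9);  /* copies the terminating NUL too */
}
```

   For the address 2001:db8::1, the resulting name ends in
   "8.b.d.0.1.0.0.2.ip6.arpa".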









5.  API

   The API follows the connection-oriented model of e.g. TCP.  The
   available calls are:

   listen():  Prepare for incoming session

      fd = listen( local_name, peer_name, service, transport );

   open():  Initiate outgoing session

      fd = open( local_name, peer_name, service, transport );

   accept():  Receive incoming session

      accept( peer_name, fd );

   read():  Receive data

      data = read( fd );

   write():  Send data

      write( fd, data );

   close():  Close session

      close( fd );

   The parameters used in the calls above are:

   local_name  The local identifier.  Either an explicit name or a
      wildcard (*).  As a host may have multiple names, either a
      specific name to listen on must be chosen or a wildcard may be
      used.  In the latter case, on listening, any destination name is
      accepted; on sending, an arbitrary name (valid for the host) may
      automatically be chosen by the socket.

   peer_name  The remote identifier.  Either an explicit name or a
      wildcard (*).  In the accept() function, a wildcard may be
      given.  In this case, all incoming packets are accepted,
      independent of the sender's name.

   service  The service to be run.  In general this will correspond to
      the keyword field in the IANA Port Numbers registry, e.g. http,
      ftp and ssh.

   transport  Transport refers to the chosen transport protocol.  This
      could be e.g.

      *  TCP (SOCK_STREAM)

      *  UDP (SOCK_DGRAM)

      *  SCTP (SOCK_SCTP)

      *  DCCP (SOCK_DCCP)

      *  NORM ...

      *  And so on ...
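
   The wildcard semantics of local_name and peer_name can be sketched
   as a simple filter applied to an incoming session.  The function
   names below are hypothetical, and the byte-wise comparison is a
   simplification chosen for this example; this document does not
   define how names are compared (DNS names, for instance, are
   normally compared case-insensitively).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if name passes filter; the filter "*" accepts any name. */
bool nbs_name_matches(const char *filter, const char *name)
{
    assert(filter != NULL && name != NULL);
    if (strcmp(filter, "*") == 0)
        return true;
    return strcmp(filter, name) == 0;
}

/* Accept an incoming session only if both names pass their filters:
 * dst_name is checked against the listener's local_name, and
 * src_name against its peer_name. */
bool nbs_accept_session(const char *local_filter, const char *peer_filter,
                        const char *dst_name, const char *src_name)
{
    return nbs_name_matches(local_filter, dst_name) &&
           nbs_name_matches(peer_filter, src_name);
}
```

   A listener created with local_name "my.host.name" and peer_name "*"
   would thus accept sessions from any sender, but only those addressed
   to "my.host.name".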

   Example code snippet

   // Header files omitted

   int main (int argc, const char *argv[]) {
       int fd;
       char rec_buf[100];
       char snd_buf[100] = "hello";
       struct sockaddr_name name_sock;

       // Create a stream socket in the AF_NAME address family
       fd = socket(AF_NAME, SOCK_STREAM, 0);

       // Fill in the name and port; no IP address is involved
       memset(&name_sock, 0, sizeof(name_sock));
       name_sock.sname_family = AF_NAME;
       strcpy(name_sock.sname_addr.name, "my.host.name");
       name_sock.sname_port = htons(5000);

       bind(fd, (struct sockaddr *)&name_sock, sizeof(name_sock));
       connect(fd, (struct sockaddr *)&name_sock, sizeof(name_sock));

       // Send a buffer, then read the reply into the receive buffer
       write(fd, snd_buf, sizeof(snd_buf));
       read(fd, rec_buf, sizeof(rec_buf));

       close(fd);
       return 0;
   }


6.  Mobility support

6.1.  Shim6

   "...  The Shim6 protocol, a layer 3 shim for providing locator
   agility below the transport protocols, so that multihoming can be
   provided for IPv6 with failover and load-sharing properties, without
   assuming that a multihomed site will have a provider-independent IPv6
   address prefix announced in the global IPv6 routing table.  The hosts
   in a site that has multiple provider-allocated IPv6 address prefixes
   will use the Shim6 protocol specified in this document to set up
   state with peer hosts so that the state can later be used to failover
   to a different locator pair, should the original one stop working."
   RFC 5533 [RFC5533]

6.1.1.  Brief overview of changes

   To the upper layers, Shim6 provides a stable IP address-like
   identifier (ULID) to identify the remote host and make the IP
   addresses (locators) transparent to the application.  This way of
   providing a pseudo-address (ULID) does however invite confusion.  The
   ULID selected by Shim6 is actually an IP address which is available
   to the application when the connection is being established.  This
   address (ULID) may become invalid during the connection (RFC 5533,
   Section 1.5 [RFC5533]).  ULID invalidation is beyond the control of
   the individual hosts; it is controlled by the network.  This might
   cause confusion if applications continue to use ULIDs which are no
   longer valid.  Shim6's solution to this problem is to terminate the
   communication immediately whenever any ULID becomes invalid.  This is
   inappropriate in a mobile scenario, as connections are expected to be
   preserved while a host moves.  Moving between two distinct networks
   and thereby changing the complete locator set is the common scenario
   (e.g. entirely switching from one WiFi network to another).

   Name Based Sockets suggest using the name of a host as the
   identifier.  This solves the above problems, as a name is valid for
   as long as a host wishes it to be.  Also, as Name Based Sockets
   provide a new explicit interface (names rather than 'fake IP
   addresses'), applications that use it will be aware of the available
   features, and may make correct assessments of the underlying IP stack
   and its enhancements.

   This section describes a set of changes and improvements to shim6
   that are to be incorporated with the Name Based Sockets.

   Briefly, the changes are:

   o  The name is used as ULID rather than an IP address
      (Section 6.1.2).

   o  Node inter-reachability resilience, for when both nodes are
      simultaneously mobile (Section 6.1.6.1).

6.1.2.  Identity change

   Shim6 selects a locator (IP address) in the initial contact with the
   remote peer and uses this locator as an upper-layer identifier
   (ULID).  To support name based sockets (NBS), we use the name (or
   some structure related to the name) as ULID instead of an IP address.

   Because the end point identifier is no longer a locator but a name,
   the initial name exchange is performed by NBS, and Shim6 then uses
   this name to construct the ULID.

   Shim6 requires that any communication using a ULID MUST be terminated
   when the ULID becomes invalid.  Using names as ULIDs instead of IP
   addresses is more in line with the transport semantics.  Having names
   as ULIDs means that the session may still exist even if both
   communicating hosts' locator lists are empty at a given point in
   time.  This is particularly important when one or both peers are
   moving.

   Note that replacing a ULID with a name does not mean representing the
   ULID as a string or a string-like structure.  In order to minimize
   modifications to both the Shim6 protocol (where the ULID is a 128-bit
   IPv6 address) and its current implementations, we propose to
   represent the name as a 128-bit MD5 hash and to use this MD5 hash as
   the corresponding ULID.
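
   The name-to-ULID mapping proposed above can be sketched in C as
   follows.  The sketch computes a plain MD5 digest (as defined in RFC
   1321) over the bytes of the name and stores the 16 digest bytes as
   the ULID.  The function name nbs_name_to_ulid is hypothetical, and
   whether the name is normalized (e.g. lower-cased) before hashing is
   not specified by this document; the bytes are hashed as given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBS_NAME_MAX 255  /* maximum name length, see Section 2 */

static const uint32_t K[64] = {
    0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee,
    0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501,
    0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be,
    0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821,
    0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa,
    0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8,
    0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed,
    0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a,
    0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c,
    0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70,
    0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05,
    0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,
    0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039,
    0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1,
    0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1,
    0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391
};
/* Per-round left-rotation amounts, four per round. */
static const uint8_t S[16] = { 7, 12, 17, 22,  5,  9, 14, 20,
                               4, 11, 16, 23,  6, 10, 15, 21 };

static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Store the MD5 digest of name in ulid (16 bytes). */
void nbs_name_to_ulid(const char *name, uint8_t ulid[16])
{
    uint8_t msg[320] = {0};  /* 255 bytes plus padding fit in 5 blocks */
    uint32_t h[4] = { 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476 };

    assert(name != NULL && ulid != NULL);
    size_t len = strlen(name);
    assert(len <= NBS_NAME_MAX);

    /* Padding: 0x80, zeros, then the bit length (64-bit little-endian). */
    memcpy(msg, name, len);
    msg[len] = 0x80;
    size_t total = ((len + 8) / 64 + 1) * 64;
    uint64_t bits = (uint64_t)len * 8;
    for (int i = 0; i < 8; i++)
        msg[total - 8 + i] = (uint8_t)(bits >> (8 * i));

    for (size_t off = 0; off < total; off += 64) {
        uint32_t m[16], a = h[0], b = h[1], c = h[2], d = h[3];
        for (int i = 0; i < 16; i++) {
            const uint8_t *p = msg + off + 4 * i;
            m[i] = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                   (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
        }
        for (int i = 0; i < 64; i++) {
            uint32_t f;
            int g;
            if (i < 16)      { f = (b & c) | (~b & d); g = i; }
            else if (i < 32) { f = (d & b) | (~d & c); g = (5 * i + 1) % 16; }
            else if (i < 48) { f = b ^ c ^ d;          g = (3 * i + 5) % 16; }
            else             { f = c ^ (b | ~d);       g = (7 * i) % 16; }
            f += a + K[i] + m[g];
            a = d; d = c; c = b;
            b += rotl32(f, S[(i / 16) * 4 + i % 4]);
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d;
    }
    /* The digest is the four state words in little-endian byte order. */
    for (int i = 0; i < 16; i++)
        ulid[i] = (uint8_t)(h[i / 4] >> (8 * (i % 4)));
}
```

   Because the digest has the same size as an IPv6 address, the result
   can be carried wherever Shim6 expects a 128-bit ULID.  A production
   implementation would more likely call an existing MD5 library than
   carry its own copy of the algorithm.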

6.1.3.  The hand-shake with name exchange

   As described in RFC 5533 [RFC5533], Shim6 does not need to react
   immediately when connections start up.  The initial name exchange is
   performed by NBS and requires no help from Shim6.  The name
   exchanged by NBS is subsequently used as ULID by Shim6.  At some
   time during the communication, some heuristic may determine that it
   is appropriate to use Shim6 to support mobility/multi-homing, so the
   communicating hosts initiate a 4-way context-establishment exchange.
   As a result, each host obtains the locator list of the other.

   As an extension to Shim6, we do not change the operation sequence of
   the 4-way exchange; the order of I1, R1, I2, R2 is unchanged.  What
   changes is that the IP-based ULID is replaced by a name-based ULID,
   and that the handshake no longer requires ULID negotiation because
   this has already been done by NBS.

6.1.4.  Triggers of shim6

   It is not necessarily worth paying the overhead of setting up a shim
   context when e.g. only a small number of packets are exchanged
   between two hosts.  As a result, Shim6 functionality is not started
   immediately when a new communication is initiated.

   NBS uses some heuristic for determining when to perform a deferred
   context establishment.  This heuristic might be that more than 50
   packets have been sent or received, or that a timer expired while
   active packet exchange was in place (RFC 5533 [RFC5533]).  How the
   heuristic is designed is beyond the scope of this document.
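
   Purely as an illustration, one such heuristic could look like the C
   sketch below.  The structure nbs_flow_stats and the function name
   are hypothetical.  The threshold of 50 is the example value given
   above; counting sent and received packets together is an assumption
   of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBS_PKT_THRESHOLD 50  /* example value named in the text */

/* Hypothetical per-flow counters kept by the NBS layer. */
struct nbs_flow_stats {
    uint32_t pkts_sent;
    uint32_t pkts_received;
    bool timer_expired_while_active;  /* timer fired during exchange */
};

/* Return true once deferred context establishment should start. */
bool nbs_should_establish_context(const struct nbs_flow_stats *st)
{
    assert(st != NULL);
    /* Widen before adding so the sum cannot wrap around. */
    uint64_t total = (uint64_t)st->pkts_sent + st->pkts_received;
    if (total > NBS_PKT_THRESHOLD)
        return true;
    return st->timer_expired_while_active;
}
```

   A short-lived exchange of a few packets never crosses either
   condition and therefore never pays the cost of the 4-way exchange.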

6.1.5.  Establishing Shim6 context

   At a certain time during the connection, some heuristic on host A or
   B (or both) determines that it is appropriate to pay the Shim6
   overhead to improve host-to-host communication.  This makes Shim6
   initiate the 4-way context-establishment exchange (RFC 5533
   [RFC5533]).

   As a result, both A and B get a list of locators for each other.  In
   the case of name-based Shim6, the ULID is represented as an MD5 hash
   of the name rather than as an IP address.

6.1.6.  Problems for Shim6 to support mobility

   When only one host moves to a new network, a REAP Update is triggered
   to prevent the connection from being terminated.  Under normal
   circumstances, the connection is smoothly preserved during the REAP
   Update process.

   However, REAP by itself is not sufficient to support full mobility:
   when both hosts move simultaneously, neither of them receives the
   other's update message, which leads to a connection loss.  To deal
   with this problem, DNS should be involved to provide address
   information.

6.1.6.1.  DNS querying

   An effective solution for the mobility problem is to have a
   "stationary infrastructure" to provide address information for all
   mobile devices.  We propose to use DNS as the stationary
   infrastructure, as it already associates addresses with names and
   is sufficiently capable.  How DNS is integrated with name-based
   Shim6 is described in the following subsections.

6.1.6.2.  One peer moves

   In the case that only one host moves, the moving host starts a REAP
   Update process to re-establish the Shim6 context with the
   correspondent host.  At the same time, DNS should be updated by the
   moving host.  This procedure is the normal REAP [RFC5534] procedure
   with the added update to DNS.

   The following sequence illustrates the details:

   1.  Two hosts, A and B, are communicating using NBS and Shim6.

   2.  At a certain moment, A moves to a new network and changes its IP
       address (locator).

   3.  A updates the authoritative DNS with its new IP address.  In
       parallel, A starts the REAP Update process by sending B an Update
       Request.

   4.  A new operational locator pair is found by the REAP Update
       process.

   5.  The handover is completed, and the connection is preserved
       throughout.

6.1.6.3.  Both peers move

   When both hosts move simultaneously, neither host receives the
   other's REAP Update Request, so REAP fails to find a new operational
   locator pair.  Under such circumstances, both hosts need to query
   DNS for the correspondent host's addresses.  When the new addresses
   have been retrieved, both hosts initiate the REAP Update process as
   specified in RFC 5534 [RFC5534].

   The following sequence illustrates the details:

   1.  Two hosts, A and B, are communicating using NBS and Shim6.

   2.  At a certain moment, both A and B move simultaneously and both
       hosts change their respective IP addresses.

   3.  Both hosts update DNS with their new addresses and send a REAP
       Update Request to their correspondent peer.

   4.  Due to the concurrent move, the Update Requests are lost in both
       directions.

   5.  Both hosts experience an Update Timeout and query DNS for the
       correspondent host's locators using its name.

   6.  New addresses are returned by the respective DNS queries and REAP
       Update is now able to operate.  A and B re-invoke the REAP Update
       process using the new addresses.

   7.  A new operational locator pair is found by the REAP Update
       process.

   8.  The handover is completed, and the connection is preserved
       throughout.
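
   The two handover sequences in Sections 6.1.6.2 and 6.1.6.3 can be
   summarized as one small per-host state machine, sketched below in C.
   The state, event and action names are hypothetical and greatly
   simplified; they are not taken from RFC 5533 or RFC 5534, and the
   sketch only returns the action a host would take rather than
   performing any network operation.

```c
#include <assert.h>
#include <stddef.h>

enum ho_state  { HO_STABLE, HO_UPDATE_SENT, HO_DNS_QUERY };
enum ho_event  { EV_MOVED, EV_PAIR_FOUND, EV_UPDATE_TIMEOUT, EV_DNS_ANSWER };
enum ho_action { ACT_NONE, ACT_UPDATE_DNS_AND_SEND_UPDATE,
                 ACT_QUERY_DNS, ACT_RESEND_UPDATE };

/* Advance the handover state for one event; return the action to take. */
enum ho_action ho_step(enum ho_state *st, enum ho_event ev)
{
    assert(st != NULL);
    switch (*st) {
    case HO_STABLE:
        if (ev == EV_MOVED) {           /* local locator changed */
            *st = HO_UPDATE_SENT;
            return ACT_UPDATE_DNS_AND_SEND_UPDATE;
        }
        break;
    case HO_UPDATE_SENT:
        if (ev == EV_PAIR_FOUND) {      /* new operational locator pair */
            *st = HO_STABLE;
            return ACT_NONE;
        }
        if (ev == EV_UPDATE_TIMEOUT) {  /* the peer has moved as well */
            *st = HO_DNS_QUERY;
            return ACT_QUERY_DNS;
        }
        break;
    case HO_DNS_QUERY:
        if (ev == EV_DNS_ANSWER) {      /* peer's new locators known */
            *st = HO_UPDATE_SENT;
            return ACT_RESEND_UPDATE;
        }
        break;
    }
    return ACT_NONE;                    /* event ignored in this state */
}
```

   When only one peer moves, the machine goes from HO_STABLE to
   HO_UPDATE_SENT and back.  When both peers move, the Update Timeout
   takes it through HO_DNS_QUERY before the REAP Update is re-invoked
   with the addresses returned by DNS.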

6.1.7.  Changes to REAP

   We extend REAP by adding DNS querying to its path exploration.  For
   the sake of backwards compatibility, DNS querying can be implemented
   as a separate module that has no impact on the other parts of REAP
   as specified in RFC 5534.  The DNS functionality can be turned off
   for stationary hosts and turned on for mobile devices.

   In a mobile scenario, DNS querying and REAP path exploration may
   work together to provide stronger reliability.  A DNS query might be
   lost due to link failure, or time out due to high network delay.
   Under such circumstances, REAP path exploration is triggered by a
   SEND TIMEOUT and tries to find an available path.  This is
   meaningful when a host has multiple available interfaces (for
   instance Wi-Fi and 3G) and an address change on one interface does
   not lead to a change on the others.

7.  Security Considerations

8.  IANA Considerations

   Name based sockets require a new address family (AF_NAME) to be
   defined.

9.  Contributors

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2.  Informative References

   [RFC5533]  Nordmark, E. and M. Bagnulo, "Shim6: Level 3 Multihoming
              Shim Protocol for IPv6", RFC 5533, June 2009.

   [RFC5534]  Arkko, J. and I. van Beijnum, "Failure Detection and
              Locator Pair Exploration Protocol for IPv6 Multihoming",
              RFC 5534, June 2009.

10.3.  URL References

Appendix A.  Change Log

   Note to RFC Editor: if this document does not obsolete an existing
   RFC, please remove this appendix before publication as an RFC.

Appendix B.  Open Issues

   Note to RFC Editor: please remove this appendix before publication as
   an RFC.

Authors' Addresses

   Javier Ubillos
   Swedish Institute of Computer Science
   Kistagangen 16
   Kista
   Sweden

   Phone: +46767647588
   EMail: jav@sics.se


   Mingwei Xu
   Tsinghua University
   FIT Building  4-104, Tsinghua  University
   Beijing
   China

   EMail: xmw@cernet.edu.cn


   Zhongxing Ming
   Tsinghua University
   FIT Building  4-104, Tsinghua  University
   Beijing
   China

   EMail: mingzx@126.com


   Christian Vogt
   Ericsson
   200 Holger Way
   San Jose, CA  95134-1300
   USA

   EMail: christian.vogt@ericsson.com