Internet Engineering Task Force J. Ubillos
Internet-Draft Swedish Institute of Computer
Intended status: Experimental Science
Expires: March 13, 2011 M. Xu
Z. Ming
Tsinghua University
C. Vogt
Ericsson
September 9, 2010
Name Based Sockets
draft-ubillos-name-based-sockets-02
Abstract
This memo defines the name based sockets. Name based sockets allow
the application developer to refer to remote hosts (and it self) by
name only, passing on all IP (locator) management to the operating
system. Applications are thus relieved of re-implementing features
such as multi-homing, mobility, NAT traversal, IPv6/IPv4
interoperability and address management in general. The operating
system can in turn re-use the same solutions for a whole set of guest
applications. Name based sockets aims to provide a whole set of
features without adding new indirection layers, new delays or other
dependences while maintaining transparent backwards compatibility.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on March 13, 2011.
Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
Ubillos, et al. Expires March 13, 2011 [Page 1]
Internet-Draft Name Based Sockets September 2010
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Conventions . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 3
3.2. Motivation . . . . . . . . . . . . . . . . . . . . . . . . 3
3.3. Compatibility goals . . . . . . . . . . . . . . . . . . . 4
3.3.1. Network compatibility . . . . . . . . . . . . . . . . 4
3.3.2. Application compatibility . . . . . . . . . . . . . . 4
3.4. Name based sockets overview . . . . . . . . . . . . . . . 4
3.5. Protocol overview . . . . . . . . . . . . . . . . . . . . 4
4. Initial name exchange . . . . . . . . . . . . . . . . . . . . 5
4.1. Name format . . . . . . . . . . . . . . . . . . . . . . . 6
5. API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
6. Mobility support . . . . . . . . . . . . . . . . . . . . . . . 8
6.1. Shim6 . . . . . . . . . . . . . . . . . . . . . . . . . . 8
6.1.1. Brief overview of changes . . . . . . . . . . . . . . 9
6.1.2. Identity change . . . . . . . . . . . . . . . . . . . 10
6.1.3. The hand-shake with name exchange . . . . . . . . . . 10
6.1.4. Triggers of shim6 . . . . . . . . . . . . . . . . . . 10
6.1.5. Establishing Shim6 context . . . . . . . . . . . . . . 11
6.1.6. Problems for Shim6 to support mobility . . . . . . . . 11
6.1.7. Changes to REAP . . . . . . . . . . . . . . . . . . . 13
7. Security Considerations . . . . . . . . . . . . . . . . . . . 13
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13
9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 13
10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13
10.1. Normative References . . . . . . . . . . . . . . . . . . . 13
10.2. Informative References . . . . . . . . . . . . . . . . . . 13
10.3. URL References . . . . . . . . . . . . . . . . . . . . . . 14
Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 14
Appendix B. Open Issues . . . . . . . . . . . . . . . . . . . . . 14
Ubillos, et al. Expires March 13, 2011 [Page 2]
Internet-Draft Name Based Sockets September 2010
1. Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2. Terminology
Locator - An IP address (v4 or v6) on which a host can be reached.
Multi-home - A host which is reachable through multiple locators (on
one interface or more)
Name - A character string (max 255 chars long) on which a host can be
identified. A name maps to zero or more locators.
3. Overview
3.1. Introduction
Name based sockets provide a unified interface which caters to the
application developers who wish to simply open up a communication to
a remote host by its name and have the operating system perform the
management of locators.
It does so by providing a new address family (AF_NAME) which allows
the developer to use a name instead of an IP.
3.2. Motivation
Network communication has for very long been based on the assumption
that applications should deal with the IP (locator) management. This
is based on the legacy notion that an IP does not change during a
session and that a session is a communication between two given IPs
(locators). This situation has changed, locators change during a
session, and a device/host might have multiple locators, hosts may be
behind NATs or be on different networks (IPv4/IPv6). Today, this is
mostly left to the individual applications to solve. This adds
complexity to the applications making it a challenge in it self just
to meet the networking demands of applications.
Name based sockets aims to fix this by pushing the locator management
to the operating system, without introducing any new limitations or
delays.
Note that Name based sockets do not aim to replace all socket() based
communications. There are of course cases which are limited due to
obvious boot-strapping problems. E.g. a DHCP client, or an DNS-
Ubillos, et al. Expires March 13, 2011 [Page 3]
Internet-Draft Name Based Sockets September 2010
querying client would do better in not using a name oriented
architecture.
3.3. Compatibility goals
3.3.1. Network compatibility
3.3.2. Application compatibility
3.4. Name based sockets overview
Name based sockets provide a new socket interface. It is implemented
using a new address family (AF_NAME). This means that the sockets
are only used by applications who explicitly invoke it. The result
is that applications that do use name based sockets are very aware of
the set of features they are provided with, hence they know not to
re-implement it. Current implementations are implemented as kernel
modules, however a user-space library implementation ought to work
just as well.
By using DNS as the ID/Locator mapping structure, we do not introduce
any new indirections. Please note that the responding host does not
need to do any DNS resolution (explained below). We part from the
assumption that most application network calls start with a FQDN.
The exchange of names is performed in-band, by piggy-backing the
needed information on the first couple of packets exchanged. The
consequence is:
o No extra delays
o No extra features until the name exchange has ended.
o As a result of this, we also achieve backwards compatibility (a
name exchange which never completes)
3.5. Protocol overview
Name based sockets work by performing a name exchange in the
beginning of a communication. It does this in an non-blocking way.
In practice what happens is that the first few packets are exchanged
in a normal legacy fashion. However, to these packets, extra
information about the corresponding hosts names are piggy-backed. If
a name exchange is successful, the extra features provided by name
based sockets are enabled. If the exchange does not succeed, normal
legacy communication continues unaffected.
The name of each host is either its FQDN, its IP in IPv6.arpa format
Ubillos, et al. Expires March 13, 2011 [Page 4]
Internet-Draft Name Based Sockets September 2010
or an arbitrary nonce. It may or may not be authenticated. In the
ordinary case, the name is not authenticated, thus the receiver does
not need to perform a reverse or forward lookup, hence not adding any
further delays to the first packet(s). The motivation for this is to
avoid any additional "first-packet" delays.
Once the name exchange has been performed successfully the complete
feature set will be made available to the communication
automatically.
The expected API for a socket using AF_NAME is the same as for e.g.
TCP (SOCK_STREAM). This is also the case for SOCK_DGRAM and similar
protocols, for all practical purposes, the functionality remains
unchanged, however, as a state is created in both ends, a connection
oriented model is more intuitive.
4. Initial name exchange
When the sender sends the first packet to the receiver it appends its
own name as an IP-Option/IPv6-Extension header. It repeats this for
a predefined amount of time or packets. On the receiving end, if the
receiver supports name based sockets it appends its own name in the
same fashion for a predefined amount of time or packets. Should the
receiver not be able to interpret the name, the option/extension
header is ignored and the legacy communication precedes as normal.
This kind of name exchange has two consequences. First and foremost
that there are no extra delays on the initial packets. Secondly that
the complete feature set provided by name based sockets will not be
available until a few packets have been exchanged.
In figure 1 the exchange is depicted. The first packet from the
sender has its own name appended to it. The next few packets sent
will also have the name appended to it for a predefined number of
packets (X in the figure) or until a reply including a name extension
is received. On the receiving end, should an incoming packet have a
name extension, the receiver begins to append its own name to the
sent packets. It does so for a predefined number of packets (X in
the figure).
Ubillos, et al. Expires March 13, 2011 [Page 5]
Internet-Draft Name Based Sockets September 2010
.--------. .----------.
| Sender | | Receiver |
'--------' '----------'
| |
| |
| .-------------------. |
| ------| Data packet |-----> | Packet received.
| |+ name piggybacked | | The first response
| '-------------------' | packet will have
| ----------> | the receiver name
| -------> | piggy backed to it.
| ----> .-----------------------. |
| -> | name piggybacked | |
| | until reply or X pkts | |
| '-----------------------' |
| |
| |
| |
| .-------------------. |
| <------| Data packet |----- |
| |+ name piggybacked | |
| '-------------------' |
| <---------- |
| <------- |
| .------------------. <---- |
| | name piggybacked | <- |
| | until X pkts | |
| '------------------' |
| |
| |
Figure 1
4.1. Name format
Names can be provided in any of three ways.
o FQDN. The Fully Qualified Domain Name of the host. This will
allow e.g. DNSsec to provide authenticity of the name.
o ip6.arpa. Using one of the hosts interfaces addresses as a name.
o Nonce. A one-use only session identifier.
Ubillos, et al. Expires March 13, 2011 [Page 6]
Internet-Draft Name Based Sockets September 2010
5. API
The API follows the connection oriented model, e.g. TCP. The
available calls are:
listen(): Prep for incoming session
fd = listen( local_name, peer_name, service, transport );
open(): Initiate outgoing session
fd = open( local_name, peer_name, service, transport );
accept(): Receive incoming session
accept( peer_name, fd );
read(): Receive data
data = read( fd );
write(): Send data
write( fd, data );
close(): Close session
close( fd );
Note: In the above examples
local_name The local identifier. Either an explicit name or a
wildcard (*). As host may have multiple names, to choose which to
listen to, a name must be chosen or a wildcard may be used. In
the latter case, on listening, any destination name is accepted;
on send, an arbitrary name (valid for the host) may automatically
be chosen by the socket.
peer_name The remote identifier. Either an explicit name or a
wildcard (*). In the accept() function, a wildcard might be
inserted. In this case, all incoming packets will be accepted,
independent of sender name.
service The service to be run. In general this will correspond to
the keyword field in the IANA Port Numbers registry. E.g. http,
ftp and ssh.
Ubillos, et al. Expires March 13, 2011 [Page 7]
Internet-Draft Name Based Sockets September 2010
transport Transport refers to the chosen transport protocol. This
could be e.g.
* TCP (SOCK_STREAM)
* UDP (SOCK_DGRAM)
* SCTP (SOCK_SCTP)
* DCCP (SOCK_DCCP)
* NORM ...
* And so on ...
Example code snippet
// Header files omitted
int main (int argc, const char *argv[]) {
int fd;
char rec_buf[100], snd_buf[100];
struct sockaddr_name name_sock;
fd = socket( AF_NAME, SOCK_STREAM, 0);
name_sock.sname_family = AF_NAME;
strcpy(name_sock.sname_addr.name, "my.host.name");
name_sock->sname_port = htons(5000);
bind(fd, (struct sockaddr *)&name_sock, sizeof(name_sock));
connect(fd, (struct sockaddr *)&name_sock, sizeof(name_sock));
write(fd, snd_buf, sizeof(snd_buf));
read(fd, snd_buf, sizeof(snd_buf));
close(fd);
}
6. Mobility support
6.1. Shim6
"... The Shim6 protocol, a layer 3 shim for providing locator
agility below the transport protocols, so that multihoming can be
provided for IPv6 with failover and load-sharing properties, without
assuming that a multihomed site will have a provider-independent IPv6
address prefix announced in the global IPv6 routing table. The hosts
Ubillos, et al. Expires March 13, 2011 [Page 8]
Internet-Draft Name Based Sockets September 2010
in a site that has multiple provider- allocated IPv6 address prefixes
will use the Shim6 protocol specified in this document to set up
state with peer hosts so that the state can later be used to failover
to a different locator pair, should the original one stop working. "
RFC5533 [RFC5533]
6.1.1. Brief overview of changes
To the upper layers, shim6 provides a stable IP address-like
identifier (ULID) to identify the remote host and make the IP
addresses (locators) transparent to the application. This way of
providing a pseudo-address (ULID) does however invite confusion. The
ULID selected by Shim6 is actually an IP address which is available
for the application when the connection is being established. This
address (ULID) may become invalid during the connection RFC5533
Section 1.5 [RFC5533]. ULID invalidation is beyond the control of
the individual hosts, it is controlled by the network. This might
cause confusion if the applications continues to use the ULIDs which
are no longer valid. Shim6s solution to this problem to terminate
the communication immediately when ever any ULID becomes invalid.
This is definitely inappropriate in a mobile scenario as connections
are expected to be preserved during the mobile period. Moving
between two distinct networks, changing your complete locator set is
the common scenario (e.g. entirely switching from one WiFi to
another.)
Name Based Sockets suggest using the name of a host as the
identifier. This solves the above problems, as a name is valid for
as long as a host wishes it to be. Also, as Name Based Sockets
provide a new explicit interface (names rather than 'fake IP
addresses'), applications that use it will be aware of the available
features, and may make correct assessments of the underlying IP stack
and its enhancements.
This section describes a set of changes and improvements to shim6
that are to be incorporated with the Name Based Sockets.
Briefly, the changes are:
o Name is used as ULID rather than IP (xref target="IdentityChange"
/>).
o Node inter-reachability resilience, for when both nodes are
simultaneously mobile (Section 6.1.6.1).
Ubillos, et al. Expires March 13, 2011 [Page 9]
Internet-Draft Name Based Sockets September 2010
6.1.2. Identity change
Shim6 selects a locator (IP) in the initial contact with the remote
peer and uses this locator as an upper-layer identifier (ULID). To
support NBS, we use name (or some structure related to name) as ULID
instead of IP addresses.
Because the end point identifier is no longer a locator but a name,
the initial name exchange is performed by NBS and Shim6 will further
use this name to construct the ULID.
Shim6 requires that any communication using a ULID MUST be terminated
when the ULID becomes invalid. Using names as ULIDs instead of IP
address is more in line with the transport semantic. Having names as
ULIDs means that the session may still exist even if both
communicating hosts' locator lists are empty at a given point of
time. This is particularly important when one or both peer(s) are
moving.
Note that replacing a ULID with a name does not mean representing the
ULID as a string or a string-like structure. In order to make least
modification to both Shim6 protocol (where ULID is a 128-bit IPv6
address) and its current implementation, we propose to represent the
name as a 128-bit MD5-hash and use this MD5-hash as the corresponding
ULID.
6.1.3. The hand-shake with name exchange
As is described in RFC5533 [RFC5533], Shim6 does not need to react
immediately when connections start up. The initial name exchange is
performed by NBS and it requires no help of Shim6. The name
exchanged by NBS will be further used as ULID by Shim6. At some time
during the communication, some heuristic may determine that it is
appropriate to use shim6 to support mobility/multi-homing, so the
communicating hosts initiate a 4-way, context-establishment exchange.
As a result, both hosts get a locator list of each other.
As an extension to Shim6, we do not change the operation sequence of
the 4-way exchange, namely the order of I1, R1, I2, R2 will not be
changed. What is changed is that the IP-based ULID is replaced by a
name-based ULID and the hand shake no longer requires ULID
negotiation because it has already been done by NBS.
6.1.4. Triggers of shim6
It is not necessarily worth paying the overhead of setting up a shim
context when e.g. only a small number of packets are exchanged
between two hosts. As a result, Shim6 functionality will not be
Ubillos, et al. Expires March 13, 2011 [Page 10]
Internet-Draft Name Based Sockets September 2010
started immediately as a new communication is initiated.
NBS uses some heuristic for determining when to perform a deferred
context establishment. This heuristic might be that more than 50
packets have been sent or received, or that there was a timer
expiration while active packet exchange was in place RFC 5533
[RFC5533]. How the heuristic is designed is beyond the scope of this
document.
6.1.5. Establishing Shim6 context
At a certain time during the connection, some heuristic on host A or
B (or both) determine that it is appropriate to pay the Shim6
overhead to improve host-to-host communication. This makes the Shim6
initiate the 4-way, context-establishment exchange RFC 5533
[RFC5533].
As a result, both A and B get a list of locators for each other. In
the case of name-based Shim6, ULID is represented as a MD5-hash of
name rather than IP.
6.1.6. Problems for Shim6 to support mobility
When only one host moves to a new network, a REAP Update is triggered
to prevent connection from being terminated. Under normal
circumstances, connection will be smoothly preserved during the REAP
Update process.
However, REAP itself is not sufficient to support full mobility
functionality, as when both hosts move simultaneously, neither of
them will receive the update message, which will lead to a connection
loss. To deal with this problem, DNS should be involved to provide
address information.
6.1.6.1. DNS querying
An effective solution for the mobility problem is to have a
"stationary infrastructure" to provide address information for all
mobile devices. We propose to use DNS as the stationary
infrastructure as it associates addresses with names and has enough
capability. How DNS incorporates with name-based Shim6 is described
in the following part.
6.1.6.2. One peer moves
In the case that only one host moves, the moving host starts a REAP
Update process to re-establish Shim6 context with the correspondent
host. At the same time, DNS should be updated by the moving host.
Ubillos, et al. Expires March 13, 2011 [Page 11]
Internet-Draft Name Based Sockets September 2010
This procedure is the normal REAP [RFC5534] procedure with the added
update to DNS.
The following sequence illustrates the details:
1. Two hosts, namely A and B are communicating using NBS and Shim6.
2. At certain moment, A moves to a new network and changes its IP
address (locator).
3. A updates the authoritative DNS with its new IP address. In
parallel, A starts the REAP Update process by sending B an Update
Request and the REAP Update process is invoked.
4. New operational locator pair is found by REAP Update process.
5. Handover process is completed and connection is preserved during
the process.
6.1.6.3. Both peers move
When both hosts moves simultaneously, neither host will receive the
REAP Update Request, thus REAP will fail in finding the new
operational locator pair. Under such circumstances, both hosts need
to query DNS for the correspondent hosts addresses. When new address
is retrieved, both hosts initiate REAP Update process as specified in
RFC5534 [RFC5534].
The following sequence illustrates the details:
1. Two hosts, namely A and B are communicating using NBS and Shim6.
2. At certain moment, both A and B move simultaneously and both
hosts change their respective IP addresses.
3. Both hosts update DNS with their new addresses and send REAP
Update Request to their correspondent peer.
4. Due to concurrent move, Update Requests are lost for both
directions.
5. Both hosts experience an Update Timeout and query DNS for
correspondent hosts' locators using their names.
6. New addresses are returned by the respective DNS queries and REAP
Update is now able to operate. A and B re-invoke a REAP Update
process using the new addresses.
Ubillos, et al. Expires March 13, 2011 [Page 12]
Internet-Draft Name Based Sockets September 2010
7. New operational locator pair is found by REAP Update process.
8. Handover process is completed and connection is preserved during
the handover process.
6.1.7. Changes to REAP
We extend REAP by adding DNS Querying into its Path Exploration. For
the sake of backwards compatibility, DNS can be implemented as a
separate module and has no impact on the other part of REAP which is
specified in RFC 5534. The DNS functionality can be turned off for
stationary hosts and be turned on for mobile devices.
In the case of mobile scenario, DNS Query and REAP Path Exploration
may work together to provide stronger reliability. DNS query might
be lost due to link failure or timeout due to high network delay.
Under such circumstances, REAP Path Exploration will be triggered
because of a SEND TIMEOUT and tries to find an available path. This
is meaningful when a host has multiple available interfaces (for
instance Wi-Fi and 3G) and the address change for one interface does
not lead to the change for others.
7. Security Considerations
8. IANA Considerations
Name based sockets requires a new address family (AF_NAME) to be
defined.
9. Contributors
10. References
10.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
10.2. Informative References
[RFC5533] Nordmark, E. and M. Bagnulo, "Shim6: Level 3 Multihoming
Shim Protocol for IPv6", RFC 5533, June 2009.
[RFC5534] Arkko, J. and I. van Beijnum, "Failure Detection and
Locator Pair Exploration Protocol for IPv6 Multihoming",
RFC 5534, June 2009.
Ubillos, et al. Expires March 13, 2011 [Page 13]
Internet-Draft Name Based Sockets September 2010
10.3. URL References
Appendix A. Change Log
Note to RFC Editor: if this document does not obsolete an existing
RFC, please remove this appendix before publication as an RFC.
Appendix B. Open Issues
Note to RFC Editor: please remove this appendix before publication as
an RFC.
Authors' Addresses
Javier Ubillos
Swedish Institute of Computer Science
Kistagangen 16
Kista
Sweden
Phone: +46767647588
EMail: jav@sics.se
Mingwei Xu
Tsinghua University
FIT Building 4-104, Tsinghua University
Beijing
China
EMail: xmw@cernet.edu.cn
Zhongxing Ming
Tsinghua University
FIT Building 4-104, Tsinghua University
Beijing
China
EMail: mingzx@126.com
Ubillos, et al. Expires March 13, 2011 [Page 14]
Internet-Draft Name Based Sockets September 2010
Christian Vogt
Ericsson
200 Holger Way
San Jose, CA 95134-1300
USA
EMail: christian.vogt@ericsson.com
Ubillos, et al. Expires March 13, 2011 [Page 15]