Network Working Group R. R. Stewart
INTERNET-DRAFT Cisco
Q. Xie
L Yarroll
Motorola
J. Wood
K. Poon
Sun Microsystems
K. Fujita
NEC
expires in six months June 1, 2001
SCTP Sockets Mapping
<draft-ietf-tsvwg-sctpsocket-00.txt>
Status of This Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of [RFC2026]. Internet-Drafts are
working documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This document describes a mapping of the Stream Control Transmission
Protocol [SCTP] into a sockets API. The benefits of this mapping
include compatibility for TCP applications, access to new SCTP
features and a consolidated error and event notification scheme.
Table of Contents
1. Introduction............................................ 3
2. Conventions............................................. 4
2.1 Data Types............................................ 4
3. UDP-style Interface..................................... 4
3.1 Basic Operation....................................... 4
3.1.1 socket() - UDP Style Syntax...................... 5
3.1.2 bind() - UDP Style Syntax........................ 5
3.1.3 sendmsg() and recvmsg() - UDP Style Syntax....... 6
3.1.4 close() - UDP Style Syntax....................... 7
3.2 Implicit Association Setup............................ 8
3.3 Non-blocking mode..................................... 8
4. TCP-style Interface..................................... 9
4.1 Basic Operation....................................... 9
4.1.1 socket() - TCP Style Syntax........................10
4.1.2 bind() - TCP Style Syntax..........................10
4.1.3 listen() - TCP Style Syntax........................11
Stewart et.al. [Page 1]
Internet Draft SCTP Sockets Mapping June 2001
4.1.4 accept() - TCP Style Syntax........................11
4.1.5 connect() - TCP Style Syntax.......................12
4.1.6 close() - TCP Style Syntax.........................12
4.1.7 shutdown() - TCP Style Syntax......................12
4.1.8 sendmsg() and recvmsg() - TCP Style Syntax.........13
5. Data Structures..........................................13
5.1 The msghdr and cmsghdr Structures......................13
5.2 SCTP msg_control Structures............................14
5.2.1 SCTP Initiation Structure (SCTP_INIT)...............15
5.2.2 SCTP Header Information Structure (SCTP_SNDRCV).....16
5.3 SCTP Events and Notifications..........................18
5.3.1 SCTP Notification Structure.........................18
5.3.1.1 SCTP_ASSOC_CHANGE................................19
5.3.1.2 SCTP_PEER_ADDR_CHANGE............................21
5.3.1.3 SCTP_REMOTE_ERROR................................22
5.3.1.4 SCTP_SEND_FAILED.................................23
5.3.1.5 SCTP_SHUTDOWN_EVENT..............................24
5.4 Ancillary Data Considerations and Semantics...........25
5.4.1 Multiple Items and Ordering........................25
5.4.2 Accessing and Manipulating Ancillary Data..........25
5.4.3 Control Message Buffer Sizing......................26
6. Common Operations for Both Styles.......................27
6.1 send(), recv(), sendto(), recvfrom()..................27
6.2 setsockopt(), getsockopt()............................28
6.3 read() and write()....................................28
7. Socket Options..........................................28
7.1 Read / Write Options..................................29
7.1.1 Retransmission Timeout Parameters (SCTP_RTOINFO)...29
7.1.2 Association Retransmission Parameter
(SCTP_ASSOCRTXINFO)................................29
7.1.3 Initialization Parameters (SCTP_INITMSG)...........30
7.1.4 SO_LINGER..........................................30
7.1.5 SO_NODELAY.........................................31
7.1.6 SO_RCVBUF..........................................31
7.1.7 SO_SNDBUF..........................................31
7.1.8 Automatic Close of associations (SCTP_AUTOCLOSE)...31
7.2 Read-Only Options.....................................31
7.2.1 Association Status (SCTP_STATUS)...................31
7.3. Ancillary Data Interest Options.....................32
8. New Interface...........................................33
8.1 sctp_bindx()..........................................33
8.2 Branched-off Association, sctp_peeloff()..............34
8.3 sctp_getpaddrs()......................................35
8.4 sctp_freepaddrs().....................................35
8.5 sctp_opt_info().......................................35
8.5.1 Peer Address Parameters............................36
8.5.2 Peer Address Information...........................37
9. Security Considerations.................................37
10. Authors' Addresses....................................38
11. References............................................38
Appendix A: TCP-style Code Example.........................39
Appendix B: UDP-style Code Example.........................43
1. Introduction
The sockets API has provided a standard mapping of the Internet
Protocol suite to many operating systems. Both TCP [TCP] and UDP
[UDP] have benefited from this standard representation and access
method across many diverse platforms. SCTP is a new protocol that
provides many of the characteristics of TCP but also incorporates
semantics more akin to UDP. This document defines a method to map
the existing sockets API for use with SCTP, providing both a base
for access to new features and compatibility so that most existing
TCP applications can be migrated to SCTP with few (if any) changes.
There are three basic design objectives:
1) Maintain consistency with existing sockets APIs:
We define a sockets mapping for SCTP that is consistent with other
sockets API protocol mappings (for instance, UDP, TCP, IPv4, and
IPv6).
2) Support a UDP-style interface
This set of semantics is similar to that defined for connectionless
protocols, such as UDP. It is more efficient than a TCP-like
connection-oriented interface in terms of exploiting the new features
of SCTP.
Note that SCTP is connection-oriented in nature, and it does not
support broadcast or multicast communications, as UDP does.
3) Support a TCP-style interface
This interface supports the same basic semantics as sockets for
connection-oriented protocols, such as TCP.
The purpose of defining this interface is to allow existing
applications built on connection-oriented protocols to be ported to
SCTP with very little effort, and to let developers familiar with
those semantics adapt easily to SCTP.
Extensions will be added to this mapping to provide mechanisms to
exploit new features of SCTP.
Goals 2 and 3 are not compatible, so in this document we define two
modes of mapping, namely the UDP-style mapping and the TCP-style
mapping. These two modes share some common data structures and
operations, but will require the use of two different programming
models.
A mechanism is defined to convert a UDP-style SCTP socket into a
TCP-style socket.
Some of the SCTP mechanisms cannot be adequately mapped to the
existing socket interface. In some cases, it is more desirable to
have a new interface instead of using existing socket calls. This
document also describes those new interfaces.
2. Conventions
2.1 Data Types
Whenever possible, data types from Draft 6.6 (March 1997) of POSIX
1003.1g are used: uintN_t means an unsigned integer of exactly N
bits (e.g., uint16_t). We also assume the argument data types from
1003.1g when possible (e.g., the final argument to setsockopt() is a
size_t value). Whenever buffer sizes are specified, the POSIX
1003.1 size_t data type is used.
3. UDP-style Interface
The UDP-style interface has the following characteristics:
A) Outbound association setup is implicit.
B) Messages are delivered in complete messages (with one notable
exception).
C) New inbound associations are accepted automatically.
3.1 Basic Operation
A typical server in this model uses the following socket calls in
sequence to prepare an endpoint for servicing requests:
1. socket()
2. bind()
3. setsockopt()
4. recvmsg()
5. sendmsg()
6. close()
A typical client uses the following calls in sequence to set up an
association with a server to request services:
1. socket()
2. sendmsg()
3. recvmsg()
4. close()
In this model, by default, all the associations connected to the
endpoint are represented with a single socket.
If the server or client wishes to branch an existing association off
to a separate socket, it must call sctp_peeloff(), specifying one of
the transport addresses of the association as a parameter. The
sctp_peeloff() call returns a new socket, which can then be used with
the recv() and send() functions for message passing. See Section 8.2
for more on branched-off associations.
Once an association is branched off to a separate socket, it becomes
completely separated from the original socket. All subsequent
control and data operations to that association must be done through
the new socket. For example, the close operation on the original
socket will not terminate any associations that have been branched
off to a different socket.
We will discuss the UDP-style socket calls in more detail in the
following subsections.
3.1.1 socket() - UDP Style Syntax
Applications use socket() to create a socket descriptor to represent
an SCTP endpoint.
The syntax is,
sd = socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);
or,
sd = socket(PF_INET6, SOCK_SEQPACKET, IPPROTO_SCTP);
Here, SOCK_SEQPACKET indicates the creation of a UDP-style socket.
The first form creates an endpoint which can use only IPv4
addresses, while the second form creates an endpoint which can use
both IPv6 and mapped IPv4 addresses.
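As a minimal sketch of the two forms above, the helper below creates
a UDP-style endpoint for either address family. The name
open_udp_style_sctp() is hypothetical and not part of this API. The
fallback definition of IPPROTO_SCTP assumes 132, the protocol number
assigned to SCTP, for systems whose headers do not define it. On a
host without SCTP support, socket() is expected to fail and the
helper returns -1.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

#ifndef IPPROTO_SCTP
#define IPPROTO_SCTP 132   /* assumed fallback: SCTP protocol number */
#endif

/* Hypothetical helper: family is PF_INET or PF_INET6.
 * Returns a socket descriptor, or -1 with errno set by socket(). */
int open_udp_style_sctp(int family)
{
    int sd = socket(family, SOCK_SEQPACKET, IPPROTO_SCTP);

    if (sd < 0)
        perror("socket(SOCK_SEQPACKET, IPPROTO_SCTP)");
    return sd;
}
```

A caller would typically check the result and fall back or exit when
the descriptor is negative, and close() the descriptor when done.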
3.1.2 bind() - UDP Style Syntax
Applications use bind() to specify which local address the SCTP
endpoint should associate itself with as the primary address.
An SCTP endpoint can be associated with multiple addresses. To do
this, sctp_bindx() is introduced in section 8.1 to help applications
do the job of associating multiple addresses.
These addresses associated with a socket are the eligible transport
addresses for the endpoint to send and receive data. The endpoint
will also present these addresses to its peers during the
association initialization process, see [SCTP].
After calling bind() or sctp_bindx(), if the endpoint wishes to
accept new associations on the socket, it must enable the
SCTP_ASSOC_CHANGE socket option (see section 5.3.1.1). The SCTP
endpoint will then accept all SCTP INIT requests, passing the
COMMUNICATION_UP notification to the endpoint upon reception of a
valid association (i.e., the receipt of a valid COOKIE ECHO).
The syntax of bind() is,
ret = bind(int sd, struct sockaddr *addr, int addrlen);
sd - the socket descriptor returned by socket().
addr - the address structure (struct sockaddr_in or struct
sockaddr_in6 [RFC 2553]),
addrlen - the size of the address structure.
If sd is an IPv4 socket, the address passed must be an IPv4 address.
If the sd is an IPv6 socket, the address passed can either be an
IPv4 or an IPv6 address.
Applications cannot call bind() multiple times to associate multiple
addresses to an endpoint. After the first call to bind(), all
subsequent calls will return an error.
If addr is specified as INADDR_ANY for an IPv4 or IPv6 socket, or as
IN6ADDR_ANY for an IPv6 socket (normally used by server
applications), the operating system will associate the endpoint
with the optimal subset of available local interfaces.
If bind() or sctp_bindx() is not called prior to a sendmsg() call
that initiates a new association, the system picks an ephemeral port
and chooses an address set equivalent to binding with INADDR_ANY for
an IPv4 socket or IN6ADDR_ANY for an IPv6 socket. One of those
addresses will be the primary address for the association. This
automatically enables the multihoming capability of SCTP.
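The wildcard bind described above can be sketched as follows. The
helper name bind_any_v4() is hypothetical. It only fills in a struct
sockaddr_in and calls bind(), so it works on any IPv4 socket
descriptor, including one returned by socket() as in Section 3.1.1.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical helper: bind sd to the IPv4 wildcard address on the
 * given port (host byte order; 0 lets the system pick an ephemeral
 * port). Returns 0 on success or -1 with errno set by bind(). */
int bind_any_v4(int sd, uint16_t port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    return bind(sd, (struct sockaddr *)&addr, sizeof(addr));
}
```

Because a second bind() on the same socket returns an error, an
application that needs several specific addresses would use
sctp_bindx() (Section 8.1) rather than calling this helper again.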
3.1.3 sendmsg() and recvmsg() - UDP Style Syntax
An application uses sendmsg() and recvmsg() calls to transmit data to
and receive data from its peer.
ssize_t sendmsg(int socket, const struct msghdr *message,
int flags);
ssize_t recvmsg(int socket, struct msghdr *message,
int flags);
socket - the socket descriptor of the endpoint.
message - pointer to the msghdr structure which contains a single
user message and possibly some ancillary data.
See Section 5 for a complete description of the data
structures.
flags - No new flags are defined for SCTP at this level. See
Section 5 for SCTP-specific flags used in the msghdr
structure.
As we will see in Section 5, along with the user data, the ancillary
data field is used to carry the sctp_sndrcvinfo and/or the
sctp_initmsg structures to perform various SCTP functions including
specifying options for sending each user message. Those options,
depending on whether sending or receiving, include stream number,
stream sequence number, TOS, various flags, context and payload
protocol Id, etc.
When sending user data with sendmsg(), the msg_name field in msghdr
structure will be filled with one of the transport addresses of the
intended receiver. If there is no association existing between the
sender and the intended receiver, the sender's SCTP stack will set
up a new association and then send the user data (see Section 3.2
for more on implicit association setup).
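A minimal sketch of the sending side follows. The helper name
send_one_message() is hypothetical. It places one user message in a
single iovec and passes the intended receiver in msg_name. No
ancillary data is attached, so default sending options apply. Passing
a NULL address with a zero length corresponds to the branched-off
case, where msg_name is not used.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: send one user message to the peer named by
 * `to`. On a UDP-style SCTP socket this is the call that would
 * trigger implicit association setup if no association with that
 * peer exists yet. Returns the sendmsg() result. */
ssize_t send_one_message(int sd, const void *data, size_t len,
                         const struct sockaddr *to, socklen_t tolen)
{
    struct iovec iov;
    struct msghdr msg;

    iov.iov_base = (void *)data;   /* sendmsg() does not modify it */
    iov.iov_len = len;

    memset(&msg, 0, sizeof(msg));
    msg.msg_name = (void *)to;     /* intended receiver, may be NULL */
    msg.msg_namelen = tolen;
    msg.msg_iov = &iov;            /* one user message */
    msg.msg_iovlen = 1;

    return sendmsg(sd, &msg, 0);
}
```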
If a peer sends a SHUTDOWN, an SCTP_SHUTDOWN_EVENT notification will
be delivered if that notification has been enabled, and no more data
can be sent to that association. Any attempt to send more data will
cause sendmsg() to return with an ESHUTDOWN error. Note that the
socket is still open for reading at this point so it is possible to
retrieve notifications.
When receiving a user message with recvmsg(), the msg_name field in
msghdr structure will be populated with the source transport address
of the user data. The caller of recvmsg() can use this address
information to determine to which association the received user
message belongs.
If all data in a single message has been delivered, MSG_EOR will be
set in the msg_flags field of the msghdr structure (see section
5.1).
If the application does not provide enough buffer space to
completely receive a data message, MSG_EOR will not be set in
msg_flags. Successive reads will consume more of the same message
until the entire message has been delivered, at which point MSG_EOR
will be set.
If the SCTP stack is running low on buffers, it may partially
deliver a message. In this case, MSG_EOR will not be set, and more
calls to recvmsg() will be necessary to completely consume the
message. Only one message at a time can be partially delivered.
Note, if the socket is a branched-off socket that only represents
one association (see Section 3.1), the msg_name field is not used
when sending data (i.e., ignored by the SCTP stack).
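The MSG_EOR rules above suggest a receive loop along the following
lines. This is a sketch, not a definitive implementation: the helper
name recv_whole_message() and its error choices (EMSGSIZE when the
buffer fills before MSG_EOR is seen, EIO when a read returns no data
and no MSG_EOR) are assumptions made for illustration.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: keep calling recvmsg() until MSG_EOR marks the
 * end of the current message. Returns the total message length, or
 * -1 with errno set. If `from` is not NULL it receives the source
 * transport address reported in msg_name. */
ssize_t recv_whole_message(int sd, void *buf, size_t cap,
                           struct sockaddr_storage *from,
                           socklen_t *fromlen)
{
    size_t used = 0;

    while (used < cap) {
        struct iovec iov;
        struct msghdr msg;
        ssize_t n;

        iov.iov_base = (char *)buf + used;
        iov.iov_len = cap - used;

        memset(&msg, 0, sizeof(msg));
        msg.msg_name = from;
        msg.msg_namelen = from ? sizeof(*from) : 0;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        n = recvmsg(sd, &msg, 0);
        if (n < 0)
            return -1;               /* errno set by recvmsg() */
        used += (size_t)n;
        if (fromlen)
            *fromlen = msg.msg_namelen;
        if (msg.msg_flags & MSG_EOR)
            return (ssize_t)used;    /* whole message delivered */
        if (n == 0) {
            errno = EIO;             /* no data and no end of record */
            return -1;
        }
    }
    errno = EMSGSIZE;                /* buffer full before MSG_EOR */
    return -1;
}
```

Since only one message at a time can be partially delivered, the
successive reads in the loop all belong to the same message.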
3.1.4 close() - UDP Style Syntax
Applications use close() to perform graceful shutdown (as described
in Section 10.1 of [SCTP]) on ALL the associations currently
represented by a UDP-style socket.
The syntax is
ret = close(int sd);
sd - the socket descriptor of the associations to be closed.
To gracefully shut down a specific association represented by the
UDP-style socket, an application should use the sendmsg() call,
passing no user data, but including the appropriate flag in the
ancillary data (see Section 5.2.2).
If sd in the close() call is a branched-off socket representing only
one association, the shutdown is performed on that association only.
3.2 Implicit Association Setup
Once all bind() calls are complete on a UDP-style socket, the
application can begin sending and receiving data using the
sendmsg()/recvmsg() or sendto()/recvfrom() calls, without going
through any explicit association setup procedures (i.e., no
connect() calls required).
Whenever sendmsg() or sendto() is called and the SCTP stack at the
sender finds that there is no association existing between the
sender and the intended receiver (identified by the address passed
either in the msg_name field of msghdr structure in the sendmsg()
call or the dest_addr field in the sendto() call), the SCTP stack
will automatically set up an association to the intended receiver.
Upon successful association setup, a COMMUNICATION_UP notification
will be dispatched to the socket at both the sender and receiver
sides. This notification can be read by the recvmsg() system call
(see Section 3.1.3).
Note, if the SCTP stack at the sender side supports bundling, the
first user message may be bundled with the COOKIE ECHO message
[SCTP].
When the SCTP stack sets up a new association implicitly, it first
consults the sctp_initmsg structure, which is passed along within
the ancillary data in the sendmsg() call (see Section 5.2.1 for
details of the data structures), for any special options to be used
on the new association.
If this information is not present in the sendmsg() call, or if the
implicit association setup is triggered by a sendto() call, the
default association initialization parameters will be used. These
default association parameters may be set with respective
setsockopt() calls or be left to the system defaults.
Implicit association setup cannot be initiated by send()/recv()
calls.
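The following sketch shows one way the initialization parameters
could be attached to a sendmsg() call as ancillary data. Several
names are assumptions made so the example is self-contained:
struct example_sctp_initmsg mirrors the sctp_initmsg layout given in
Section 5.2.1, EXAMPLE_SCTP_INIT is a placeholder for the SCTP_INIT
cmsg_type (a real program would take both from the system's SCTP
header), and attach_init_cmsg() is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef IPPROTO_SCTP
#define IPPROTO_SCTP 132      /* assumed fallback: SCTP protocol number */
#endif

/* ASSUMPTION: placeholder for the SCTP_INIT cmsg_type value. */
#define EXAMPLE_SCTP_INIT 0

/* Local mirror of the sctp_initmsg layout from Section 5.2.1. */
struct example_sctp_initmsg {
    uint16_t sinit_num_ostreams;
    uint16_t sinit_max_instreams;
    uint16_t sinit_max_attempts;
    uint16_t sinit_max_init_timeo;
};

/* Hypothetical helper: attach one SCTP_INIT item to msg, using the
 * caller's (suitably aligned) control buffer. Fields left at 0 ask
 * for the endpoint defaults. Returns 0, or -1 if cbuf is too small. */
int attach_init_cmsg(struct msghdr *msg, void *cbuf, size_t cbuflen,
                     uint16_t ostreams, uint16_t instreams)
{
    struct example_sctp_initmsg init;
    struct cmsghdr *cmsg;

    if (cbuflen < CMSG_SPACE(sizeof(init)))
        return -1;

    memset(cbuf, 0, cbuflen);
    msg->msg_control = cbuf;
    msg->msg_controllen = CMSG_SPACE(sizeof(init));

    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = IPPROTO_SCTP;
    cmsg->cmsg_type = EXAMPLE_SCTP_INIT;
    cmsg->cmsg_len = CMSG_LEN(sizeof(init));

    memset(&init, 0, sizeof(init));
    init.sinit_num_ostreams = ostreams;
    init.sinit_max_instreams = instreams;
    memcpy(CMSG_DATA(cmsg), &init, sizeof(init));
    return 0;
}
```

The caller would fill in msg_name and msg_iov as usual before calling
sendmsg(). The stack consults the attached item only when the call
sets up a new association.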
3.3 Non-blocking mode
Some SCTP users might want to avoid blocking when calling a socket
interface function.
A user who wants to avoid blocking must call select() before calling
sendmsg()/sendto() or recvmsg()/recvfrom() and check whether the
socket is writable or readable, respectively. If the socket is not
writable or readable, the user should not call sendmsg()/sendto() or
recvmsg()/recvfrom().
Once all bind() calls are complete on a UDP-style socket, the
application must set the non-blocking option with fcntl() (for
example, O_NONBLOCK). After that, the sendmsg() function returns
immediately, and the success or failure of the data message (and
possible SCTP_INITMSG parameters) is reported by an
SCTP_ASSOC_CHANGE notification with COMMUNICATION_UP or
CANT_START_ASSOC. If user data was sent and failed (due to a
CANT_START_ASSOC), the sender will also receive an SCTP_SEND_FAILED
event. These events can be received by calling recvmsg(). The
server-side user is also notified of an association-up event by
receiving an SCTP_ASSOC_CHANGE with COMMUNICATION_UP via recvmsg(),
possibly together with the first data message.
When the user wants to gracefully shut down the association, the
user must call sendmsg() and send SHUTDOWN. The function returns
immediately, and the success of the SHUTDOWN is reported by an
SCTP_ASSOC_CHANGE notification with SHUTDOWN_COMPLETE, received by
calling recvmsg().
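The fcntl() and select() steps described in this section can be
sketched with two small helpers. Both names, set_nonblocking() and
wait_readable(), are hypothetical. They use only standard descriptor
calls and are not specific to SCTP.

```c
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to the descriptor's status
 * flags. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: wait up to timeout_ms for fd to become
 * readable. Returns 1 if readable, 0 on timeout, -1 on error.
 * fd must be smaller than FD_SETSIZE. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int rc;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return (rc > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0;
}
```

An application would call recvmsg() only after wait_readable()
returns 1, and would then find either user data or one of the
notifications described above. A matching check on the write set
would guard sendmsg().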
4. TCP-style Interface
The goal of this model is to follow as closely as possible the
current practice of using the sockets interface for a connection
oriented protocol, such as TCP. This model enables existing
applications using connection oriented protocols to be ported to
SCTP with very little effort.
Note that some new SCTP features and some new SCTP socket options
can only be utilized through the use of sendmsg() and recvmsg()
calls, see Section 4.1.8.
4.1 Basic Operation
A typical server in TCP-style model uses the following system call
sequence to prepare an SCTP endpoint for servicing requests:
1. socket()
2. bind()
3. listen()
4. accept()
The accept() call blocks until a new association is set up. It
returns with a new socket descriptor. The server then uses the new
socket descriptor to communicate with the client, using recv() and
send() calls to get requests and send back responses.
Then it calls
5. close()
to terminate the association.
A typical client uses the following system call sequence to set up an
association with a server to request services:
1. socket()
2. connect()
After returning from connect(), the client uses send() and recv()
calls to send out requests and receive responses from the server.
The client calls
3. close()
to terminate this association when done.
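The server and client call sequences above can be sketched as
follows. The helper names tcp_style_listener() and
tcp_style_connect() are hypothetical, and the fallback definition of
IPPROTO_SCTP assumes protocol number 132. Both helpers return -1 on a
host without SCTP support.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

#ifndef IPPROTO_SCTP
#define IPPROTO_SCTP 132      /* assumed fallback: SCTP protocol number */
#endif

/* Close sd while preserving the errno of the call that failed. */
static int fail_and_close(int sd)
{
    int saved = errno;

    close(sd);
    errno = saved;
    return -1;
}

/* Server steps 1-3: socket(), bind(), listen(). The caller then
 * loops on accept(). Returns the listening descriptor or -1. */
int tcp_style_listener(uint16_t port, int backlog)
{
    struct sockaddr_in addr;
    int sd = socket(PF_INET, SOCK_STREAM, IPPROTO_SCTP);

    if (sd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return fail_and_close(sd);
    if (listen(sd, backlog) < 0)
        return fail_and_close(sd);
    return sd;
}

/* Client steps 1-2: socket(), connect(). Returns a descriptor that
 * represents the new association, or -1. */
int tcp_style_connect(const struct sockaddr_in *server)
{
    int sd = socket(PF_INET, SOCK_STREAM, IPPROTO_SCTP);

    if (sd < 0)
        return -1;
    if (connect(sd, (const struct sockaddr *)server,
                sizeof(*server)) < 0)
        return fail_and_close(sd);
    return sd;
}
```

After these helpers return, both sides exchange data with send() and
recv() and finish with close(), as in the sequences listed above.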
4.1.1 socket() - TCP Style Syntax
Applications call socket() to create a socket descriptor to
represent an SCTP endpoint.
The syntax is:
sd = socket(PF_INET, SOCK_STREAM, IPPROTO_SCTP);
or,
sd = socket(PF_INET6, SOCK_STREAM, IPPROTO_SCTP);
Here, SOCK_STREAM indicates the creation of a TCP-style socket.
The first form creates an endpoint which can use only IPv4
addresses, while the second form creates an endpoint which can use
both IPv6 and mapped IPv4 addresses.
4.1.2 bind() - TCP Style Syntax
Applications use bind() to pass the primary address associated with
an SCTP endpoint to the system. An SCTP endpoint can be associated
with multiple addresses. To do this, sctp_bindx() is introduced in
section 8.1 to help applications do the job of associating multiple
addresses.
These addresses associated with a socket are the eligible transport
addresses for the endpoint to send and receive data. The endpoint
will also present these addresses to its peers during the
association initialization process, see [SCTP].
The syntax is:
ret = bind(int sd, struct sockaddr *addr, int addrlen);
sd - the socket descriptor returned by socket() call.
addr - the address structure (either struct sockaddr_in or struct
sockaddr_in6 defined in [RFC 2553]).
addrlen - the size of the address structure.
If sd is an IPv4 socket, the address passed must be an IPv4 address.
If sd is an IPv6 socket, the address passed can be either an IPv4 or
an IPv6 address.
Applications cannot call bind() multiple times to associate multiple
addresses to the endpoint. After the first call to bind(), all
subsequent calls will return an error.
If addr is specified as INADDR_ANY for an IPv4 or IPv6 socket, or as
IN6ADDR_ANY for an IPv6 socket (normally used by server
applications), the operating system will associate the endpoint with
an optimal address set of the available interfaces.
The completion of this bind() process does not ready the SCTP
endpoint to accept inbound SCTP association requests. Until a
listen() system call, described below, is performed on the socket,
the SCTP endpoint will promptly reject an inbound SCTP INIT request
with an SCTP ABORT.
4.1.3 listen() - TCP Style Syntax
Applications use listen() to ready the SCTP endpoint for accepting
inbound associations.
The syntax is:
ret = listen(int sd, int backlog);
sd - the socket descriptor of the SCTP endpoint.
backlog - this specifies the max number of outstanding associations
allowed in the socket's accept queue. These are the
associations that have finished the four-way initiation
handshake (see Section 5 of [SCTP]) and are in the
ESTABLISHED state.
4.1.4 accept() - TCP Style Syntax
Applications use the accept() call to remove an established SCTP
association from the accept queue of the endpoint. A new socket
descriptor will be returned from accept() to represent the newly
formed association.
The syntax is:
new_sd = accept(int sd, struct sockaddr *addr, socklen_t *addrlen);
new_sd - the socket descriptor for the newly formed association.
sd - the listening socket descriptor.
addr - on return, will contain the primary address of the peer
endpoint.
addrlen - on return, will contain the size of addr.
4.1.5 connect() - TCP Style Syntax
Applications use connect() to initiate an association to a peer.
The syntax is
ret = connect(int sd, const struct sockaddr *addr, int addrlen);
sd - the socket descriptor of the endpoint.
addr - the peer's address.
addrlen - the size of the address.
This operation corresponds to the ASSOCIATE primitive described in
section 10.1 of [SCTP].
By default, the new association created has only one outbound
stream. The SCTP_INITMSG option described in Section 7.1.3 should be
used before connecting to change the number of outbound streams.
If bind() or sctp_bindx() is not called prior to the connect()
call, the system picks an ephemeral port and chooses an address set
equivalent to binding with INADDR_ANY for an IPv4 socket or
IN6ADDR_ANY for an IPv6 socket. One of those addresses will be the
primary address for the association. This automatically enables the
multihoming capability of SCTP.
Note that SCTP allows data exchange, similar to T/TCP [RFC1644],
during the association setup phase. If an application wants to do
this, it cannot use the connect() call. Instead, it should use
sendto() or sendmsg() to initiate an association. If it uses sendto()
and wants to change the initialization behavior, it needs to use the
SCTP_INITMSG socket option before calling sendto(). Alternatively, it
can use sendmsg() with SCTP_INIT ancillary data to initiate an
association without calling setsockopt().
SCTP does not support half-close semantics. This means that, unlike
T/TCP, MSG_EOF should not be set in the flags parameter when calling
sendto() or sendmsg() to initiate a connection. MSG_EOF is not an
acceptable flag with an SCTP socket.
4.1.6 close() - TCP Style Syntax
Applications use close() to gracefully close down an association.
The syntax is:
ret = close(int sd);
sd - the socket descriptor of the association to be closed.
After an application calls close() on a socket descriptor, no
further socket operations will succeed on that descriptor.
4.1.7 shutdown() - TCP Style Syntax
The socket call shutdown() does not have any meaning with an SCTP
socket because SCTP does not have half-close semantics. Calling
shutdown() on an SCTP socket will return an error.
To perform the ABORT operation described in [SCTP] section 10.1, an
application can use the socket option SO_LINGER. It is described in
section 7.1.4.
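As a sketch of the SO_LINGER approach, the helper below enables
lingering with a zero linger time. It assumes that, as with TCP, this
setting makes a later close() abortive instead of graceful. The
section describing SO_LINGER is authoritative for the exact SCTP
semantics. The helper name request_abort_on_close() is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: arrange for a later close(sd) to abort the
 * association rather than shut it down gracefully (assumed
 * semantics: linger enabled with a zero timeout).
 * Returns the setsockopt() result. */
int request_abort_on_close(int sd)
{
    struct linger lin;

    lin.l_onoff = 1;    /* enable lingering */
    lin.l_linger = 0;   /* zero timeout: abort on close() */

    return setsockopt(sd, SOL_SOCKET, SO_LINGER, &lin, sizeof(lin));
}
```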
4.1.8 sendmsg() and recvmsg() - TCP Style Syntax
With a TCP-style socket, the application can also use sendmsg() and
recvmsg() to transmit data to and receive data from its peer. The
semantics are similar to those used in the UDP-style model (section
3.1.3), with the following differences:
1) When sending, the msg_name field in the msghdr is not used to
specify the intended receiver, rather it is used to indicate a
different peer address if the sender does not want to send the
message over the primary address of the receiver. If the transport
address given is not part of the current association, the data will
not be sent and a SCTP_SEND_FAILED event will be delivered to the
application if send failure events are enabled.
When receiving, if a message is not received from the primary
address, the SCTP stack will fill in the msg_name field on return so
that the application can retrieve the source address information of
the received message.
2) An application must use close() to gracefully shut down an
association, or use the SO_LINGER option with close() to abort an
association. It must not use the MSG_ABORT or MSG_EOF flag in
sendmsg(). The system returns an error if an application tries to
do so.
5. Data Structures
We discuss in this section important data structures which are
specific to SCTP and are used with sendmsg() and recvmsg() calls to
control SCTP endpoint operations and to access ancillary
information.
5.1 The msghdr and cmsghdr Structures
The msghdr structure used in the sendmsg() and recvmsg() calls, as
well as the ancillary data carried in the structure, is the key for
the application to set and get various control information from the
SCTP endpoint.
The msghdr and the related cmsghdr structures are defined and
discussed in detail in [RFC2292]. Here we cite their definitions
from [RFC2292].
The msghdr structure:
struct msghdr {
    void *msg_name;           /* ptr to socket address structure */
    socklen_t msg_namelen;    /* size of socket address structure */
    struct iovec *msg_iov;    /* scatter/gather array */
    size_t msg_iovlen;        /* # elements in msg_iov */
    void *msg_control;        /* ancillary data */
    socklen_t msg_controllen; /* ancillary data buffer length */
    int msg_flags;            /* flags on received message */
};
The cmsghdr structure:
struct cmsghdr {
    socklen_t cmsg_len;  /* #bytes, including this header */
    int cmsg_level;      /* originating protocol */
    int cmsg_type;       /* protocol-specific type */
    /* followed by unsigned char cmsg_data[]; */
};
In the msghdr structure, the usage of msg_name has been discussed in
previous sections (see Sections 3.1.3 and 4.1.8).
The scatter/gather buffers, or I/O vectors (pointed to by the
msg_iov field) are treated as a single SCTP data chunk, rather than
multiple chunks, for both sendmsg() and recvmsg().
The msg_flags are not used when sending a message with sendmsg().
If a notification has arrived, recvmsg() will return the
notification with the MSG_NOTIFICATION flag set in msg_flags. If the
MSG_NOTIFICATION flag is not set, recvmsg() will return data. See
section 5.3 for more information about notifications.
If all portions of a data frame or notification have been read,
recvmsg() will return with MSG_EOR set in msg_flags.
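The msg_flags rules above can be summarized by a small classifier
that an application might apply after each recvmsg() call. This is an
illustrative sketch: the enum and the function classify_rx() are
hypothetical, and the fallback value for MSG_NOTIFICATION is an
assumption used only when the system headers do not define the flag.
A real program would take it from the system's SCTP header.

```c
#include <sys/types.h>
#include <sys/socket.h>

#ifndef MSG_NOTIFICATION
#define MSG_NOTIFICATION 0x8000   /* ASSUMPTION: placeholder value */
#endif

/* Hypothetical classification of what recvmsg() just returned. */
enum rx_kind {
    RX_DATA_PARTIAL,      /* user data, more of the message to come */
    RX_DATA_COMPLETE,     /* user data, message fully read */
    RX_NOTIFY_PARTIAL,    /* notification, more to come */
    RX_NOTIFY_COMPLETE    /* notification, fully read */
};

/* Inspect msg_flags as filled in by recvmsg(). */
enum rx_kind classify_rx(int msg_flags)
{
    int complete = (msg_flags & MSG_EOR) != 0;

    if (msg_flags & MSG_NOTIFICATION)
        return complete ? RX_NOTIFY_COMPLETE : RX_NOTIFY_PARTIAL;
    return complete ? RX_DATA_COMPLETE : RX_DATA_PARTIAL;
}
```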
5.2 SCTP msg_control Structures
A key element of all SCTP-specific socket extensions is the use of
ancillary data to specify and access SCTP-specific data via the
struct msghdr's msg_control member used in sendmsg() and recvmsg().
Fine-grained control over initialization and sending parameters is
handled with ancillary data.
Each ancillary data item is preceded by a struct cmsghdr (see
Section 5.1), which defines the function and purpose of the data
contained in the cmsg_data[] member.
There are two kinds of ancillary data: initialization data and
header information (SNDRCV). Initialization data (UDP-style only)
sets protocol parameters for new associations. Section 5.2.1
provides more details. Header information can set or report
parameters on individual messages in a stream. See section 5.2.2
for how to use SNDRCV ancillary data.
By default on a TCP-style socket, SCTP will pass no ancillary data;
on a UDP-style socket, SCTP will only pass SCTP_SNDRCV information.
Specific ancillary data items can be enabled with socket options
defined for SCTP; see section 7.3. Note in particular that for
UDP-style sockets, new associations will not be accepted by
default. See section 5.2.1 for more information.
Note that all ancillary types are fixed length; see section 5.4 for
further discussion on this. These data structures use struct
sockaddr_storage (defined in [RFC2553]) as a portable, fixed length
address format.
Other protocols may also provide ancillary data to the socket layer
consumer. These ancillary data items from other protocols may
intermingle with SCTP data. For example, the IPv6 socket API
definitions ([RFC2292] and [RFC2553]) define a number of ancillary
data items. If a socket API consumer enables delivery of both SCTP
and IPv6 ancillary data, they both may appear in the same
msg_control buffer in any order. An application may thus need to
handle other types of ancillary data besides that passed by SCTP.
The sockets application must provide a buffer large enough to
accommodate all ancillary data provided via recvmsg(). If the buffer
is not large enough, the ancillary data will be truncated and the
msghdr's msg_flags will include MSG_CTRUNC.
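Because items from several protocols may share one msg_control
buffer, a receiver would typically walk the buffer with the standard
CMSG_FIRSTHDR() and CMSG_NXTHDR() macros and skip the levels it does
not handle. The sketch below only counts the SCTP items. The function
name count_sctp_cmsgs() is hypothetical, and the fallback definition
of IPPROTO_SCTP assumes protocol number 132.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef IPPROTO_SCTP
#define IPPROTO_SCTP 132      /* assumed fallback: SCTP protocol number */
#endif

/* Hypothetical helper: count the ancillary data items at level
 * IPPROTO_SCTP in a msghdr filled in by recvmsg(). Returns -1 if the
 * control buffer was too small and the data was truncated. */
int count_sctp_cmsgs(struct msghdr *msg)
{
    struct cmsghdr *c;
    int n = 0;

    if (msg->msg_flags & MSG_CTRUNC)
        return -1;                  /* caller should enlarge buffer */

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level != IPPROTO_SCTP)
            continue;               /* item from another protocol */
        n++;                        /* a real handler would switch on
                                       c->cmsg_type here */
    }
    return n;
}
```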
5.2.1 SCTP Initiation Structure (SCTP_INIT)
This cmsghdr structure provides information for initializing new
SCTP associations with sendmsg(). The SCTP_INITMSG socket option
uses this same data structure. This structure is not used for
recvmsg().
cmsg_level    cmsg_type     cmsg_data[]
------------  ------------  ----------------------
IPPROTO_SCTP  SCTP_INIT     struct sctp_initmsg
Here is the definition of the sctp_initmsg structure:
struct sctp_initmsg {
    uint16_t sinit_num_ostreams;
    uint16_t sinit_max_instreams;
    uint16_t sinit_max_attempts;
    uint16_t sinit_max_init_timeo;
};
sinit_num_ostreams: 16 bits (unsigned integer)
This is an integer number representing the number of streams that
the application wishes to be able to send to. This number is
confirmed in the COMMUNICATION_UP notification and must be verified
since it is a negotiated number with the remote endpoint. The
default value of 0 indicates to use the endpoint default value.
sinit_max_instreams: 16 bits (unsigned integer)
This value represents the maximum number of inbound streams the
application is prepared to support. This value is bounded by the
actual implementation. In other words the user MAY be able to
support more streams than the Operating System. In such a case, the
Operating System limit overrides the value requested by the
user. The default value of 0 indicates to use the endpoint's default
value.
sinit_max_attempts: 16 bits (unsigned integer)
This integer specifies how many attempts the SCTP endpoint should
make at resending the INIT. This value overrides the system SCTP
'Max.Init.Retransmits' value. The default value of 0 indicates to
use the endpoint's default value. This is normally set to the
system's default 'Max.Init.Retransmits' value.
sinit_max_init_timeo: 16 bits (unsigned integer)
This value represents the largest Time-Out or RTO value to use in
attempting an INIT. Normally the 'RTO.Max' is used to limit the
doubling of the RTO upon timeout. For the INIT message this value
MAY override 'RTO.Max'. This value MUST NOT influence 'RTO.Max'
during data transmission and is only used to bound the initial setup
time. A default value of 0 indicates to use the endpoint's default
value. This is normally set to the system's 'RTO.Max' value (60
seconds).
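As a rough sketch of how an application might attach an SCTP_INIT item to a sendmsg() call, consider the helper below. Several things here are assumptions made for illustration: the helper name attach_init_cmsg(), the placeholder constant EXAMPLE_SCTP_INIT (the real SCTP_INIT value comes from the implementation's SCTP header, which is not included here), and the local copy of struct sctp_initmsg.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef IPPROTO_SCTP
#define IPPROTO_SCTP 132      /* IANA protocol number for SCTP */
#endif

/* Placeholder: a real program takes SCTP_INIT from its SCTP header. */
#define EXAMPLE_SCTP_INIT 0

/* Local copy of the structure defined in this section. */
struct sctp_initmsg {
    uint16_t sinit_num_ostreams;
    uint16_t sinit_max_instreams;
    uint16_t sinit_max_attempts;
    uint16_t sinit_max_init_timeo;
};

/* Fill ctrl with a single SCTP_INIT item and attach it to msg.
 * ctrl must be suitably aligned for struct cmsghdr and hold at least
 * CMSG_SPACE(sizeof(struct sctp_initmsg)) bytes.
 * Returns 0 on success, -1 if the buffer is too small. */
static int attach_init_cmsg(struct msghdr *msg, void *ctrl, size_t ctrl_len,
                            uint16_t ostreams, uint16_t instreams)
{
    struct sctp_initmsg init;
    struct cmsghdr *cmsg;

    if (ctrl_len < CMSG_SPACE(sizeof(init)))
        return -1;

    memset(ctrl, 0, ctrl_len);
    msg->msg_control = ctrl;
    msg->msg_controllen = CMSG_SPACE(sizeof(init));

    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = IPPROTO_SCTP;
    cmsg->cmsg_type = EXAMPLE_SCTP_INIT;
    cmsg->cmsg_len = CMSG_LEN(sizeof(init));

    /* Fields left at 0 request the endpoint's default value. */
    memset(&init, 0, sizeof(init));
    init.sinit_num_ostreams = ostreams;
    init.sinit_max_instreams = instreams;
    memcpy(CMSG_DATA(cmsg), &init, sizeof(init));
    return 0;
}
```

The structure is copied into place with memcpy() rather than written through a cast pointer, which avoids relying on the alignment of CMSG_DATA().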
5.2.2 SCTP Header Information Structure (SCTP_SNDRCV)
This cmsghdr structure specifies SCTP options for sendmsg() and
describes SCTP header information about a received message through
recvmsg().
cmsg_level    cmsg_type     cmsg_data[]
------------  ------------  ----------------------
IPPROTO_SCTP  SCTP_SNDRCV   struct sctp_sndrcvinfo
Here is the definition of sctp_sndrcvinfo:
struct sctp_sndrcvinfo {
    uint16_t sinfo_stream;
    uint16_t sinfo_ssn;
    uint16_t sinfo_flags;
    uint32_t sinfo_ppid;
    uint32_t sinfo_context;
    uint8_t sinfo_dscp;
    sctp_assoc_t sinfo_assoc_id;
};
sinfo_stream: 16 bits (unsigned integer)
For recvmsg() the SCTP stack places the message's stream number in
this value. For sendmsg() this value holds the stream number that
the application wishes to send this message to. If a sender
specifies an invalid stream number an error indication is returned
and the call fails.
sinfo_ssn: 16 bits (unsigned integer)
For recvmsg() this value contains the stream sequence number that
the remote endpoint placed in the DATA chunk. For fragmented
messages this is the same number for all deliveries of the message
(if more than one recvmsg() is needed to read the message). The
sendmsg() call will ignore this parameter.
sinfo_ppid: 32 bits (unsigned integer)
This value in sendmsg() is an opaque unsigned value that is passed
to the remote end in each user message. In recvmsg() this value is
the same information that was passed by the upper layer in the peer
application. Please note that byte order issues are NOT accounted
for and this information is passed opaquely by the SCTP stack from
one end to the other.
sinfo_context: 32 bits (unsigned integer)
This value is an opaque 32 bit context datum that is used in the
sendmsg() function. This value is passed back to the upper layer if
an error occurs on the send of a message and is retrieved with each
unsent message. (Note: if an endpoint has done multiple sends, all of
which fail, multiple different sinfo_context values will be
returned, one with each user data message.)
sinfo_flags: 16 bits (unsigned integer)
This field may contain any of the following flags and is composed of
a bitwise OR of these values.
recvmsg() flags:
MSG_UNORDERED - This flag is present when the message was sent
                non-ordered.
sendmsg() flags:
MSG_UNORDERED - This flag requests the un-ordered delivery of the
                message. If this flag is clear the datagram is
                considered an ordered send.
MSG_ADDR_OVER - This flag, in the UDP model, requests the SCTP
                stack to override the primary destination address
                with the address found with the sendto/sendmsg
                call.
MSG_ABORT     - Setting this flag causes the specified association
                to abort by sending an ABORT message to the peer
                (UDP-style only).
MSG_EOF       - Setting this flag invokes the SCTP graceful shutdown
                procedures which assure that all data enqueued by
                both endpoints are successfully transmitted before
                closing the association (UDP-style only).
sinfo_dscp: 8 bits (unsigned integer)
This field is available to change the DSCP value in the outbound IP
packet (hence it is used only from sendmsg()). The default value of
this field is 0. Note that only 6 bits of this byte are used; the
upper 2 bits are not part of the DS field. Any setting within these
upper 2 bits is ignored.
sinfo_assoc_id: sizeof (sctp_assoc_t)
The association handle field, sinfo_assoc_id, holds the identifier
for the association announced in the COMMUNICATION_UP notification.
All notifications for a given association have the same identifier.
An sctp_sndrcvinfo item always corresponds to the data in msg_iov.
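On the receive side, locating the SCTP_SNDRCV item among the ancillary data might look like the sketch below. The function name find_sndrcv(), the placeholder EXAMPLE_SCTP_SNDRCV, the assumed 32-bit width of sctp_assoc_t, and the local copy of struct sctp_sndrcvinfo are all illustrative assumptions; a real program takes these definitions from its SCTP implementation's header.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef IPPROTO_SCTP
#define IPPROTO_SCTP 132      /* IANA protocol number for SCTP */
#endif

/* Placeholder: a real program takes SCTP_SNDRCV from its SCTP header. */
#define EXAMPLE_SCTP_SNDRCV 1

typedef uint32_t sctp_assoc_t;   /* assumed width; implementation-defined */

/* Local copy of the structure defined in this section. */
struct sctp_sndrcvinfo {
    uint16_t     sinfo_stream;
    uint16_t     sinfo_ssn;
    uint16_t     sinfo_flags;
    uint32_t     sinfo_ppid;
    uint32_t     sinfo_context;
    uint8_t      sinfo_dscp;
    sctp_assoc_t sinfo_assoc_id;
};

/* Walk the ancillary data of a msghdr filled in by recvmsg() and copy
 * the SCTP_SNDRCV item, if present, into *out.
 * Returns 1 if an item was found, 0 otherwise. */
static int find_sndrcv(struct msghdr *msg, struct sctp_sndrcvinfo *out)
{
    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == IPPROTO_SCTP &&
            cmsg->cmsg_type == EXAMPLE_SCTP_SNDRCV &&
            cmsg->cmsg_len >= CMSG_LEN(sizeof(*out))) {
            memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
            return 1;
        }
    }
    return 0;
}
```

Because items from other protocols may be interleaved in any order, the loop checks both cmsg_level and cmsg_type and skips anything it does not recognize.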
5.3 SCTP Events and Notifications
An SCTP application may need to understand and process events and
errors that happen on the SCTP stack. These events include network
status changes, association startups, remote operational errors and
undeliverable messages. All of these can be essential for the
application.
When an SCTP application layer does a recvmsg() the message read is
normally a data message from a peer endpoint. If the application
wishes to have the SCTP stack deliver notifications of non-data
events, it sets the appropriate socket option for the notifications
it wants. See section 7.3 for these socket options. When a
notification arrives, recvmsg() returns the notification in the
application-supplied data buffer via msg_iov, and sets
MSG_NOTIFICATION in msg_flags.
This section details the notification structures. Every
notification structure carries some common fields which provide
general information.
A recvmsg() call will return only one notification at a time. Just
as when reading normal data, it may return part of a notification if
the msg_iov buffer is not large enough. If a single read is not
sufficient, msg_flags will have MSG_EOR clear. The user MUST finish
reading the notification before subsequent data can arrive.
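The flag handling described above can be illustrated with two small helpers. This is only a sketch: classify_read() and read_complete() are hypothetical names, and the fallback value for MSG_NOTIFICATION is a placeholder used when no SCTP header supplies the real one.

```c
#include <sys/socket.h>

/* Placeholder: MSG_NOTIFICATION normally comes from the SCTP header of
 * the implementation; this fallback value is for illustration only. */
#ifndef MSG_NOTIFICATION
#define MSG_NOTIFICATION 0x8000
#endif

enum read_kind { READ_DATA, READ_NOTIFICATION };

/* Decide whether recvmsg() returned peer data or a notification,
 * based on the msg_flags it stored in the msghdr. */
static enum read_kind classify_read(int msg_flags)
{
    return (msg_flags & MSG_NOTIFICATION) ? READ_NOTIFICATION : READ_DATA;
}

/* Returns 1 if the message or notification was read completely,
 * 0 if another recvmsg() call is needed to obtain the remainder. */
static int read_complete(int msg_flags)
{
    return (msg_flags & MSG_EOR) != 0;
}
```

A receive loop would keep appending to its buffer while read_complete() returns 0, and only then hand the buffer to either the data path or the notification path.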
5.3.1 SCTP Notification Structure
The notification structure is defined as the union of all
notification types.
union sctp_notification {
    uint16_t sn_type; /* Notification type. */
    struct sctp_assoc_change;
    struct sctp_paddr_change;
    struct sctp_remote_error;
    struct sctp_send_failed;
    struct sctp_shutdown_event;
};
sn_type: sizeof (uint16_t)
The following table describes the SCTP notification and event types
for the field sn_type.
sn_type                Description
---------------------  ------------------------------------------
SCTP_ASSOC_CHANGE      This tag indicates that an association has
                       either been opened or closed. Refer to
                       5.3.1.1 for details.
SCTP_PEER_ADDR_CHANGE  This tag indicates that an address that is
                       part of an existing association has
                       experienced a change of state (e.g. a
                       failure or return to service of the
                       reachability of an endpoint via a specific
                       transport address). Please see 5.3.1.2 for
                       data structure details.
SCTP_REMOTE_ERROR      The attached error message is an
                       Operational Error received from the remote
                       peer. It includes the complete TLV sent by
                       the remote endpoint. See section 5.3.1.3
                       for the detailed format.
SCTP_SEND_FAILED       The attached datagram could not be sent to
                       the remote endpoint. This structure
                       includes the original sctp_sndrcvinfo that
                       was used in sending this message; see
                       section 5.3.1.4.
SCTP_SHUTDOWN_EVENT    The peer has sent a SHUTDOWN. No further
                       data should be sent on this socket.
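Since every notification begins with a 16-bit type field, an application can dispatch on it before interpreting the rest of the buffer. The sketch below assumes placeholder numeric values for the sn_type constants (the real values come from the implementation's SCTP header) and uses a hypothetical helper name, notification_name().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder values for illustration only; real programs must use
 * the constants provided by their SCTP implementation. */
enum {
    SCTP_ASSOC_CHANGE = 1,
    SCTP_PEER_ADDR_CHANGE,
    SCTP_REMOTE_ERROR,
    SCTP_SEND_FAILED,
    SCTP_SHUTDOWN_EVENT
};

/* Inspect the leading sn_type field of a notification returned in
 * msg_iov and return a printable name for it. */
static const char *notification_name(const void *buf, size_t len)
{
    uint16_t sn_type;

    if (len < sizeof(sn_type))
        return "truncated";
    memcpy(&sn_type, buf, sizeof(sn_type));

    switch (sn_type) {
    case SCTP_ASSOC_CHANGE:     return "SCTP_ASSOC_CHANGE";
    case SCTP_PEER_ADDR_CHANGE: return "SCTP_PEER_ADDR_CHANGE";
    case SCTP_REMOTE_ERROR:     return "SCTP_REMOTE_ERROR";
    case SCTP_SEND_FAILED:      return "SCTP_SEND_FAILED";
    case SCTP_SHUTDOWN_EVENT:   return "SCTP_SHUTDOWN_EVENT";
    default:                    return "unknown";
    }
}
```

In a real handler each case would go on to interpret the buffer as the matching structure from the following subsections.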
5.3.1.1 SCTP_ASSOC_CHANGE
Communication notifications inform the ULP that an SCTP association
has either begun or ended. The identifier for the new association
resides in the sctp_notification structure in the cmsg_data
ancillary data. The notification information has the following
format:
struct sctp_assoc_change {
    uint16_t sac_type;
    uint16_t sac_flags;
    uint32_t sac_length;
    sctp_assoc_t sac_assoc_id;
    uint16_t sac_state;
    uint16_t sac_error;
    uint16_t sac_outbound_streams;
    uint16_t sac_inbound_streams;
};
sac_type:
It should be SCTP_ASSOC_CHANGE.
sac_flags: 16 bits (unsigned integer)
Currently unused.
sac_length: sizeof (uint32_t)
This field is the total length of the notification data, including
the notification header.
sac_assoc_id: sizeof (sctp_assoc_t)
The association id field holds the identifier for the association.
All notifications for a given association have the same association
identifier. For a TCP-style socket, this field is ignored.
sac_state: 16 bits (unsigned integer)
This field holds one of a number of values that communicate the
event that happened to the association. They include:
Event Name          Description
------------------  ---------------------------------------------
COMMUNICATION_UP    A new association is now ready and data may
                    be exchanged with this peer.
COMMUNICATION_LOST  The association has failed. The association
                    is now in the closed state. If SEND FAILED
                    notifications are turned on, a
                    COMMUNICATION_LOST is followed by a series of
                    SCTP_SEND_FAILED events, one for each
                    outstanding message.
RESTART             SCTP has detected that the peer has restarted.
SHUTDOWN_COMPLETE   The association has gracefully closed.
CANT_START_ASSOC    The association failed to set up. If
                    non-blocking mode is set and data was sent (in
                    the UDP-style model), a CANT_START_ASSOC is
                    followed by a series of SCTP_SEND_FAILED
                    events, one for each outstanding message.
sac_error: 16 bits (unsigned integer)
If the state was reached due to an error condition (e.g.
COMMUNICATION_LOST) any relevant error information is available in
this field. This corresponds to the protocol error codes defined in
[SCTP].
sac_outbound_streams: 16 bits (unsigned integer)
sac_inbound_streams: 16 bits (unsigned integer)
The maximum number of streams allowed in each direction is
available in sac_outbound_streams and sac_inbound_streams.
An application must enable this notification with setsockopt (see
section 7.3) before any new associations will be accepted on a
UDP-style socket. This is the mechanism by which a server (or peer
application that wishes to accept new associations) instructs the
SCTP stack to accept new associations on a socket. Clients (i.e.
applications on which only active opens are made) can leave this
ancillary data item off; they will then be assured that the only
associations on the socket will be ones they actively initiated.
Server or peer to peer sockets, on the other hand, will always
accept new associations, so a well-written application using server
UDP-style sockets must be prepared to handle new associations from
unwanted peers.
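Because the number of outbound streams is negotiated with the peer, an application should check sac_outbound_streams before choosing a stream number for sendmsg(). The following sketch shows one way to do that; the helper name stream_is_usable(), the placeholder value of COMMUNICATION_UP, and the assumed 32-bit sctp_assoc_t are illustrative assumptions.

```c
#include <stdint.h>

typedef uint32_t sctp_assoc_t;   /* assumed width; implementation-defined */

/* Placeholder value; the real constant comes from the SCTP header. */
#define COMMUNICATION_UP 1

/* Local copy of the structure defined in this section. */
struct sctp_assoc_change {
    uint16_t     sac_type;
    uint16_t     sac_flags;
    uint32_t     sac_length;
    sctp_assoc_t sac_assoc_id;
    uint16_t     sac_state;
    uint16_t     sac_error;
    uint16_t     sac_outbound_streams;
    uint16_t     sac_inbound_streams;
};

/* Returns 1 if the association is up and the given stream number is
 * within the negotiated outbound range (0 .. sac_outbound_streams-1),
 * 0 otherwise. */
static int stream_is_usable(const struct sctp_assoc_change *sac,
                            uint16_t stream)
{
    if (sac->sac_state != COMMUNICATION_UP)
        return 0;
    return stream < sac->sac_outbound_streams;
}
```

An application that requested more streams in sinit_num_ostreams than the peer granted would use a check like this to avoid the error returned for an invalid sinfo_stream.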
5.3.1.2 SCTP_PEER_ADDR_CHANGE
When a destination address on a multi-homed peer encounters a change,
an interface details event is sent. The information has the
following structure:
struct sctp_paddr_change {
    uint16_t spc_type;
    uint16_t spc_flags;
    uint32_t spc_length;
    sctp_assoc_t spc_assoc_id;
    struct sockaddr_storage spc_aaddr;
    int spc_state;
    int spc_error;
};
spc_type:
It should be SCTP_PEER_ADDR_CHANGE.
spc_flags: 16 bits (unsigned integer)
Currently unused.
spc_length: sizeof (uint32_t)
This field is the total length of the notification data, including
the notification header.
spc_assoc_id: sizeof (sctp_assoc_t)
The association id field holds the identifier for the association.
All notifications for a given association have the same association
identifier. For a TCP-style socket, this field is ignored.
spc_aaddr: sizeof (struct sockaddr_storage)
The affected address field holds the remote peer's address that is
encountering the change of state.
spc_state: 32 bits (signed integer)
This field holds one of a number of values that communicate the
event that happened to the address. They include:
Event Name           Description
-------------------  ----------------------------------------
ADDRESS_AVAILABLE    This address is now reachable.
ADDRESS_UNREACHABLE  The address specified can no longer be
                     reached. Any data sent to this address is
                     rerouted to an alternate until this
                     address becomes reachable.
ADDRESS_REMOVED      The address is no longer part of the
                     association.
ADDRESS_ADDED        The address is now part of the association.
ADDRESS_MADE_PRIM    This address has now been made to be the
                     primary destination address.
spc_error: 32 bits (signed integer)
If the state was reached due to any error condition (e.g.
ADDRESS_UNREACHABLE) any relevant error information is available in
this field.
5.3.1.3 SCTP_REMOTE_ERROR
A remote peer may send an Operational Error message to its peer.
This message indicates a variety of error conditions on an
association. The entire error TLV as it appears on the wire is
included in an SCTP_REMOTE_ERROR event. Please refer to the SCTP
specification [SCTP] and any extensions for a list of possible
error formats. SCTP error TLVs have the format:
struct sctp_remote_error {
    uint16_t sre_type;
    uint16_t sre_flags;
    uint32_t sre_length;
    sctp_assoc_t sre_assoc_id;
    uint16_t sre_error;
    uint16_t sre_len;
    uint8_t sre_data[0];
};
sre_type:
It should be SCTP_REMOTE_ERROR.
sre_flags: 16 bits (unsigned integer)
Currently unused.
sre_length: sizeof (uint32_t)
This field is the total length of the notification data, including
the notification header.
sre_assoc_id: sizeof (sctp_assoc_t)
The association id field holds the identifier for the association.
All notifications for a given association have the same association
identifier. For a TCP-style socket, this field is ignored.
sre_error: 16 bits (unsigned integer)
This value represents one of the Operational Error causes defined in
the SCTP specification, in network byte order.
sre_len: 16 bits (unsigned integer)
This value represents the length of the operational error payload
plus the size of sre_error and sre_len, in network byte order.
sre_data: variable
This contains the payload of the operational error as defined in the
SCTP specification [SCTP] section 3.3.10.
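Since sre_error and sre_len arrive in network byte order, an application has to convert them before use. The sketch below shows the conversion and the payload-length arithmetic implied by the sre_len description; the helper names remote_error_cause() and remote_error_payload_len() are hypothetical.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Convert the sre_error field (network byte order) to host order. */
static uint16_t remote_error_cause(uint16_t sre_error_net)
{
    return ntohs(sre_error_net);
}

/* Compute the number of payload bytes in sre_data from the sre_len
 * field (network byte order). sre_len counts the payload plus the
 * sizes of sre_error and sre_len themselves (2 + 2 bytes).
 * Returns -1 if the length is too small to be valid. */
static int remote_error_payload_len(uint16_t sre_len_net)
{
    uint16_t total = ntohs(sre_len_net);
    uint16_t header = 2 * sizeof(uint16_t);

    if (total < header)
        return -1;
    return total - header;
}
```

For example, an sre_len of 12 on the wire corresponds to 8 bytes of cause-specific data in sre_data.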
5.3.1.4 SCTP_SEND_FAILED
If SCTP cannot deliver a message it may return the message as a
notification.
struct sctp_send_failed {
    uint16_t ssf_type;
    uint16_t ssf_flags;
    uint32_t ssf_length;
    sctp_assoc_t ssf_assoc_id;
    uint32_t ssf_error;
    struct sctp_sndrcvinfo ssf_info;
    uint8_t ssf_data[0];
};
ssf_type:
It should be SCTP_SEND_FAILED.
ssf_flags: 16 bits (unsigned integer)
The flag value will take one of the following values:
SCTP_DATA_INQUEUE - When this flag is indicated, the data was never
                    attempted to be sent, i.e. it was never
                    assigned a TSN and sent onto the wire.
SCTP_DATA_INTMIT  - When this flag is indicated, the data WAS
                    assigned a TSN and sent at least once but was
                    never acknowledged.
ssf_length: sizeof (uint32_t)
This field is the total length of the notification data, including
the notification header.
ssf_assoc_id: sizeof (sctp_assoc_t)
The association id field, ssf_assoc_id, holds the identifier for the
association. All notifications for a given association have the
same association identifier. For a TCP-style socket, this field is
ignored.
ssf_error: 32 bits (unsigned integer)
This value represents the reason why the send failed.
ssf_info: sizeof (struct sctp_sndrcvinfo)
The original send information associated with the unsent message.
ssf_data: variable
The unsent message.
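One plausible way to recover the unsent message and match it to the original send is sketched below. It assumes that ssf_length covers the fixed part of the notification plus the returned data, as the ssf_length description suggests. The helper names unsent_data_len() and failed_context(), the assumed 32-bit sctp_assoc_t, and the use of a C99 flexible array member in place of ssf_data[0] are illustrative choices rather than part of this specification.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t sctp_assoc_t;   /* assumed width; implementation-defined */

/* Local copies of the structures defined in this document. */
struct sctp_sndrcvinfo {
    uint16_t     sinfo_stream;
    uint16_t     sinfo_ssn;
    uint16_t     sinfo_flags;
    uint32_t     sinfo_ppid;
    uint32_t     sinfo_context;
    uint8_t      sinfo_dscp;
    sctp_assoc_t sinfo_assoc_id;
};

struct sctp_send_failed {
    uint16_t               ssf_type;
    uint16_t               ssf_flags;
    uint32_t               ssf_length;
    sctp_assoc_t           ssf_assoc_id;
    uint32_t               ssf_error;
    struct sctp_sndrcvinfo ssf_info;
    uint8_t                ssf_data[];   /* C99 flexible array member */
};

/* Number of bytes of the original message carried in ssf_data, or -1
 * if ssf_length is smaller than the fixed part of the notification. */
static long unsent_data_len(const struct sctp_send_failed *ssf)
{
    size_t fixed = offsetof(struct sctp_send_failed, ssf_data);

    if (ssf->ssf_length < fixed)
        return -1;
    return (long)(ssf->ssf_length - fixed);
}

/* The opaque context the application supplied in sinfo_context when
 * it originally sent the message. */
static uint32_t failed_context(const struct sctp_send_failed *ssf)
{
    return ssf->ssf_info.sinfo_context;
}
```

The context value lets the application look up its own bookkeeping for the failed message without comparing payload bytes.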
5.3.1.5 SCTP_SHUTDOWN_EVENT
When a peer sends a SHUTDOWN, SCTP delivers this notification to
inform the application that it should cease sending data.
struct sctp_shutdown_event {
    uint16_t sse_type;
    uint16_t sse_flags;
    uint32_t sse_length;
    sctp_assoc_t sse_assoc_id;
};
sse_type:
It should be SCTP_SHUTDOWN_EVENT.
sse_flags: 16 bits (unsigned integer)
Currently unused.
sse_length: sizeof (uint32_t)
This field is the total length of the notification data, including
the notification header.
sse_assoc_id: sizeof (sctp_assoc_t)
The association id field holds the identifier for the association.
All notifications for a given association have the same association
identifier. For a TCP-style socket, this field is ignored.
5.4 Ancillary Data Considerations and Semantics
Programming with ancillary socket data contains some subtleties and
pitfalls, which are discussed below.
5.4.1 Multiple Items and Ordering
Multiple ancillary data items may be included in any call to
sendmsg() or recvmsg(); these may include multiple SCTP or non-SCTP
items, or both.
The ordering of ancillary data items (either by SCTP or another
protocol) is not significant and is implementation-dependent, so
applications must not depend on any ordering.
SCTP_SNDRCV items must always correspond to the data in the msghdr's
msg_iov member. There can be only a single SCTP_SNDRCV info for
each sendmsg() or recvmsg() call.
5.4.2 Accessing and Manipulating Ancillary Data
Applications can infer the presence of data or ancillary data by
examining the msg_iovlen and msg_controllen msghdr members,
respectively.
Implementations may have different padding requirements for
ancillary data, so portable applications should make use of the
macros CMSG_FIRSTHDR, CMSG_NXTHDR, CMSG_DATA, CMSG_SPACE, and
CMSG_LEN. See [RFC2292] and your SCTP implementation's documentation
for more information. Following is an example, from [RFC2292],
demonstrating the use of these macros to access ancillary data:
struct msghdr msg;
struct cmsghdr *cmsgptr;
/* fill in msg */
/* call recvmsg() */
for (cmsgptr = CMSG_FIRSTHDR(&msg); cmsgptr != NULL;
     cmsgptr = CMSG_NXTHDR(&msg, cmsgptr)) {
    if (cmsgptr->cmsg_level == ... && cmsgptr->cmsg_type == ... ) {
        u_char *ptr;
        ptr = CMSG_DATA(cmsgptr);
        /* process data pointed to by ptr */
    }
}
5.4.3 Control Message Buffer Sizing
The information conveyed via SCTP_SNDRCV events will often be
fundamental to the correct and sane operation of the sockets
application. This is particularly true of the UDP semantics, but
also of the TCP semantics. For example, if an application needs to
send and receive data on different SCTP streams, SCTP_SNDRCV events
are indispensable.
Given that some ancillary data is critical, and that multiple
ancillary data items may appear in any order, applications should be
carefully written to always provide a large enough buffer to contain
all possible ancillary data that can be presented by recvmsg(). If
the buffer is too small, and crucial data is truncated, it may pose
a fatal error condition.
Thus it is essential that applications be able to deterministically
calculate the maximum required buffer size to pass to recvmsg(). One
constraint imposed on this specification that makes this possible is
that all ancillary data definitions are of a fixed length. One way
to calculate the maximum required buffer size might be to take the
sum of the sizes of all enabled ancillary data item structures, as
calculated by CMSG_SPACE. For example, if we enabled
SCTP_SNDRCV and IPV6_RECVPKTINFO [RFC2292], we would calculate
and allocate the buffer size as follows:
size_t total;
void *buf;
total = CMSG_SPACE(sizeof (struct sctp_sndrcvinfo)) +
        CMSG_SPACE(sizeof (struct in6_pktinfo));
buf = malloc(total);
We could then use this buffer for msg_control on each call to
recvmsg() and be assured that we would not lose any ancillary data
to truncation.
6. Common Operations for Both Styles
6.1 send(), recv(), sendto(), recvfrom()
Applications can use send() and sendto() to transmit data to the
peer of an SCTP endpoint. recv() and recvfrom() can be used to
receive data from the peer.
The syntax is:
ssize_t send(int sd, const void *msg, size_t len, int flags);
ssize_t sendto(int sd, const void *msg, size_t len, int flags,
               const struct sockaddr *to, int tolen);
ssize_t recv(int sd, void *buf, size_t len, int flags);
ssize_t recvfrom(int sd, void *buf, size_t len, int flags,
                 struct sockaddr *from, int *fromlen);
sd - the socket descriptor of an SCTP endpoint.
msg - the message to be sent.
len - the size of the message or the size of buffer.
to - one of the peer addresses of the association to be
used to send the message.
tolen - the size of the address.
buf - the buffer to store a received message.
from - the buffer to store the peer address used to send the
received message.
fromlen - the size of the from address
flags - (described below).
These calls give access to only basic SCTP protocol features. If
either peer in the association uses multiple streams, or sends
unordered data these calls will usually be inadequate, and may
deliver the data in unpredictable ways.
SCTP has the concept of multiple streams in one association. The
above calls do not allow the caller to specify on which stream a
message should be sent. The system uses stream 0 as the default
stream for send() and sendto(). recv() and recvfrom() return data
from any stream, but the caller cannot distinguish the different
streams. This may result in data seeming to arrive out of
order. Similarly, if a data chunk is sent unordered, recv() and
recvfrom() provide no indication.
SCTP is message based. The msg buffer above in send() and sendto()
is considered to be a single message. This means that if the caller
wants to send a message which is composed of several buffers, the
caller needs to combine them before calling send() or sendto().
Alternately, the caller can use sendmsg() to do that without
combining them. recv() and recvfrom() cannot distinguish message
boundaries.
In receiving, if the buffer supplied is not large enough to hold a
complete message, the receive call acts like a stream socket and
returns as much data as will fit in the buffer.
Note, the send and recv calls, when used in the UDP-style model, may
only be used with "peeled off" or high bandwidth socket descriptors
(see Section 8.2).
6.2 setsockopt(), getsockopt()
Applications use setsockopt() and getsockopt() to set or retrieve
socket options. Socket options are used to change the default
behavior of sockets calls. They are described in Section 7.
The syntax is:
ret = getsockopt(int sd, int level, int optname, void *optval,
                 size_t *optlen);
ret = setsockopt(int sd, int level, int optname, const void *optval,
                 size_t optlen);
sd - the socket descriptor.
level - set to IPPROTO_SCTP for all SCTP options.
optname - the option name.
optval - the buffer to store the value of the option.
optlen - the size of the buffer (or the length of the option
returned).
6.3 read() and write()
Applications can use write() and read() to send and receive data to
and from a peer. They have the same semantics as send() and recv()
except that the flags parameter cannot be used.
Note, these calls, when used in the UDP-style model, may only be
used with high bandwidth socket descriptors (see Section 8.2).
7. Socket Options
The following sub-sections describe various SCTP level socket
options that are common to both models. SCTP associations can be
multihomed. Therefore, certain option parameters include a
sockaddr_storage structure to select which peer address the option
should be applied to.
For the datagram model, an sctp_assoc_t structure (association ID)
is used to identify the association instance that the operation
affects. So it must be set when using this model.
For the connection oriented model and high bandwidth datagram
sockets (see section 8.2) this association ID parameter is ignored.
In the cases noted below where the parameter is ignored, an
application can pass to the system a corresponding option structure
similar to those described below but without the association ID
parameter, which should be the last field of the option structure.
This can make the option setting/getting operation more efficient.
If an application does this, it should also specify an appropriate
optlen value (i.e. sizeof (option parameter) - sizeof
(sctp_assoc_t)).
Note that socket or IP level options are set or retrieved per socket.
This means that for the datagram model, those options will be applied
to all associations belonging to the socket. And for the TCP-style
model, those options will be applied to all peer addresses of the
association controlled by the socket. Applications should be very
careful in setting those options.
7.1 Read / Write Options
7.1.1 Retransmission Timeout Parameters (SCTP_RTOINFO)
The protocol parameters used to initialize and bound retransmission
timeout (RTO) are tunable. See [SCTP] for more information on how
these parameters are used in RTO calculation. The peer address
parameter is ignored for TCP style socket.
The following structure is used to access and modify these
parameters:
struct sctp_rtoinfo {
    uint32_t srto_initial;
    uint32_t srto_max;
    uint32_t srto_min;
    sctp_assoc_t srto_assoc_id;
};
srto_initial - This contains the initial RTO value.
srto_max and srto_min - These contain the maximum and minimum bounds
for all RTOs.
srto_assoc_id - (UDP style socket) This is filled in by the
application, and identifies the association for this query.
All parameters are time values, in milliseconds. A value of 0, when
modifying the parameters, indicates that the current value should
not be changed.
To access or modify these parameters, the application should call
getsockopt() or setsockopt() respectively with the option name
SCTP_RTOINFO.
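The rule that a value of 0 leaves a parameter unchanged makes it easy to adjust a single bound. The sketch below prepares such a request; the helper name rto_max_only(), the assumed 32-bit sctp_assoc_t, and the local copy of struct sctp_rtoinfo are illustrative assumptions.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t sctp_assoc_t;   /* assumed width; implementation-defined */

/* Local copy of the structure defined in this section. */
struct sctp_rtoinfo {
    uint32_t     srto_initial;
    uint32_t     srto_max;
    uint32_t     srto_min;
    sctp_assoc_t srto_assoc_id;
};

/* Build a request that changes only the maximum RTO (in milliseconds)
 * for one association. Fields left at 0 ask the stack to keep their
 * current values. */
static struct sctp_rtoinfo rto_max_only(sctp_assoc_t assoc_id,
                                        uint32_t max_ms)
{
    struct sctp_rtoinfo rto;

    memset(&rto, 0, sizeof(rto));
    rto.srto_max = max_ms;
    rto.srto_assoc_id = assoc_id;
    return rto;
}
```

On a stack that implements this option, the result would then be passed to setsockopt() with level IPPROTO_SCTP and option name SCTP_RTOINFO.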
7.1.2 Association Retransmission Parameter (SCTP_ASSOCRTXINFO)
The protocol parameter used to set the number of retransmissions
sent before an association is considered unreachable is tunable.
See [SCTP] for more information on how this parameter is used. The
peer address parameter is ignored for TCP style socket.
The following structure is used to access and modify this
parameter:
struct sctp_assocparams {
    uint16_t sasoc_asocmaxrxt;
    sctp_assoc_t sasoc_assoc_id;
};
sasoc_asocmaxrxt - This contains the maximum retransmission attempts
to make for the association.
sasoc_assoc_id - (UDP style socket) This is filled in by the
application, and identifies the association for this query.
To access or modify these parameters, the application should call
getsockopt() or setsockopt() respectively with the option name
SCTP_ASSOCRTXINFO.
The maximum number of retransmissions before an address is
considered unreachable is also tunable, but is address-specific, so
it is covered in a separate option. If an application attempts to
set the value of the association maximum retransmission parameter to
more than the sum of all maximum retransmission parameters,
setsockopt() shall return an error. The reason for this, from
[SCTP] section 8.2:
Note: When configuring the SCTP endpoint, the user should avoid
having the value of 'Association.Max.Retrans' larger than the
summation of the 'Path.Max.Retrans' of all the destination addresses
for the remote endpoint. Otherwise, all the destination addresses
may become inactive while the endpoint still considers the peer
endpoint reachable.
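The constraint quoted above amounts to a simple comparison against the sum of the per-path limits. The following sketch shows the check an application (or an implementation) might apply before calling setsockopt(); the function name asocmaxrxt_allowed() and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the proposed association maximum retransmission value
 * does not exceed the sum of the per-address maximum retransmission
 * values of all peer destination addresses, 0 otherwise. */
static int asocmaxrxt_allowed(uint16_t asocmaxrxt,
                              const uint16_t *path_max_retrans,
                              size_t n_paths)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < n_paths; i++)
        sum += path_max_retrans[i];
    return asocmaxrxt <= sum;
}
```

With two peer addresses that each allow 5 retransmissions, a value of 10 passes the check and a value of 11 does not.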
7.1.3 Initialization Parameters (SCTP_INITMSG)
Applications can specify protocol parameters for the default
association initialization. The structure used to access and modify
these parameters is defined in section 5.2.1. The option name
argument to setsockopt() and getsockopt() is SCTP_INITMSG.
Setting initialization parameters is effective only on an
unconnected socket (for the datagram model, only future associations
are affected by the change). This option is inherited by sockets
derived from a listener socket.
7.1.4 SO_LINGER
An application using the TCP-style socket can use this option to
perform the SCTP ABORT primitive. The linger option structure is:
struct linger {
    int l_onoff;  /* option on/off */
    int l_linger; /* linger time */
};
To enable the option, set l_onoff to 1. If the l_linger value is
set to 0, calling close() is the same as the ABORT primitive. If
the value is set to a negative value, the setsockopt() call will
return an error. If the value is set to a positive value
linger_time, the close() can be blocked for at most linger_time ms.
If the graceful shutdown phase does not finish during this period,
close() will return but the graceful shutdown phase continues in the
system.
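As an illustration of the abortive case, the sketch below fills in struct linger so that a later close() behaves like the ABORT primitive. It uses only the standard SOL_SOCKET/SO_LINGER interface from <sys/socket.h>; the helper names abortive_linger() and set_abort_on_close() are hypothetical.

```c
#include <sys/socket.h>

/* Build the linger setting that makes close() behave like ABORT:
 * the option is enabled and the linger time is 0. */
static struct linger abortive_linger(void)
{
    struct linger lg;

    lg.l_onoff = 1;
    lg.l_linger = 0;
    return lg;
}

/* Apply the abortive setting to a socket descriptor.
 * Returns the setsockopt() result: 0 on success, -1 on error. */
static int set_abort_on_close(int sd)
{
    struct linger lg = abortive_linger();

    return setsockopt(sd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

After set_abort_on_close() succeeds on a TCP-style SCTP socket, calling close() on that descriptor would abort the association instead of starting a graceful shutdown.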
7.1.5 SCTP_NODELAY
Turn off any Nagle-like algorithm. This means that packets are
generally sent as soon as possible and no unnecessary delays are
introduced, at the cost of more packets in the network. Expects an
integer boolean flag.
7.1.6 SO_RCVBUF
Sets receive buffer size. For SCTP TCP-style sockets, this controls
the receiver window size. For UDP-style sockets, this controls the
receiver window size for all associations bound to the socket
descriptor used in the setsockopt() or getsockopt() call. The option
applies to each association's window size separately. Expects an
integer.
7.1.7 SO_SNDBUF
Sets send buffer size. For SCTP TCP-style sockets, this controls the
amount of data SCTP may have waiting in internal buffers to be
sent. This option therefore bounds the maximum size of data that can
be sent in a single send call. For UDP-style sockets, the effect is
the same, except that it applies to all associations bound to the
socket descriptor used in the setsockopt() or getsockopt() call. The
option applies to each association's window size separately. Expects
an integer.
7.1.8 Automatic Close of associations (SCTP_AUTOCLOSE)
This socket option is applicable to the UDP-style socket only. When
set it will cause associations that are idle for more than the
specified number of seconds to automatically close. An association
being idle is defined as an association that has NOT sent or received
user data. The special value of '0' indicates that no automatic
close of any associations should be performed. The option expects
an integer defining the number of seconds of idle time before
an association is closed.
7.2 Read-Only Options
7.2.1 Association Status (SCTP_STATUS)
Applications can retrieve current status information about an
association, including association state, peer receiver window size,
number of unacked data chunks, and number of data chunks pending
receipt. This information is read-only. The following structure is
used to access this information:
struct sctp_status {
    int32_t sstat_state;
    uint32_t sstat_rwnd;
    uint16_t sstat_unackdata;
    uint16_t sstat_penddata;
    struct sctp_paddrinfo sstat_primary;
    sctp_assoc_t sstat_assoc_id;
};
sstat_state - This contains the association's current state, one
of the following values:
SCTP_CLOSED
SCTP_BOUND
SCTP_LISTEN
SCTP_COOKIE_WAIT
SCTP_COOKIE_ECHOED
SCTP_ESTABLISHED
SCTP_SHUTDOWN_PENDING
SCTP_SHUTDOWN_SENT
SCTP_SHUTDOWN_RECEIVED
SCTP_SHUTDOWN_ACK_SENT
sstat_rwnd - This contains the association peer's current
receiver window size.
sstat_unackdata - This is the number of unacked data chunks.
sstat_penddata - This is the number of data chunks pending receipt.
sstat_primary - This is information on the current primary peer
address.
sstat_assoc_id - (UDP-style socket) This holds an identifier for the
association. All notifications for a given association
have the same association identifier.
To access these status values, the application calls getsockopt()
with the option name SCTP_STATUS. The sstat_assoc_id parameter is
ignored for TCP-style sockets.
7.3. Ancillary Data Interest Options
Applications can receive notifications of certain SCTP events and
per-message information as ancillary data with recvmsg().
The following optional information is available to the application:
1. SCTP_RECVDATAIOEVNT: Per-message information (i.e. stream number,
TSN, SSN, etc. described in section 5.2.2)
2. SCTP_RECVASSOCEVNT: (described in section 5.3.1.1)
3. SCTP_RECVPADDREVNT: (described in section 5.3.1.2)
4. SCTP_RECVPEERERR: (described in section 5.3.1.3)
5. SCTP_RECVSENDFAILEVNT: (described in section 5.3.1.4)
6. SCTP_RECVSHUTDOWNEVNT: (described in section 5.3.1.5)
To receive any ancillary data, the application first registers its
interest by calling setsockopt() to turn on the corresponding flag:
int on = 1;
setsockopt(fd, IPPROTO_SCTP, SCTP_RECVDATAIOEVNT, &on, sizeof(on));
setsockopt(fd, IPPROTO_SCTP, SCTP_RECVPADDREVNT, &on, sizeof(on));
setsockopt(fd, IPPROTO_SCTP, SCTP_RECVSENDFAILEVNT, &on, sizeof(on));
setsockopt(fd, IPPROTO_SCTP, SCTP_RECVPEERERR, &on, sizeof(on));
setsockopt(fd, IPPROTO_SCTP, SCTP_RECVSHUTDOWNEVNT, &on, sizeof(on));
Note that for UDP-style SCTP sockets, the caller of recvmsg()
receives ancillary data for ALL associations bound to the file
descriptor. For TCP-style SCTP sockets, the caller receives
ancillary data for only the single association bound to the file
descriptor.
By default, a TCP-style socket has all options off.
By default, a UDP-style socket has SCTP_RECVDATAIOEVNT on and all
other options off.
The format of the data structures for each ancillary data item is
given in section 5.2.
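Once an option is enabled, the application has to locate the matching
item in the control buffer filled in by recvmsg(). The sketch below walks
that buffer with the standard CMSG_FIRSTHDR(), CMSG_NXTHDR() and
CMSG_DATA() macros. The helper name find_ancillary() is illustrative and
is not defined by this document; the level and type to search for are
whatever section 5.2 specifies for the item of interest.

```c
#include <stddef.h>
#include <sys/socket.h>

/*
 * Return a pointer to the payload of the first ancillary data item in
 * msg that matches level and type, or NULL if no such item is present.
 * msg must already have been filled in by a successful recvmsg().
 */
static void *
find_ancillary(struct msghdr *msg, int level, int type)
{
    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
        cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == level && cmsg->cmsg_type == type)
            return (CMSG_DATA(cmsg));
    }
    return (NULL);
}
```

Using the macros rather than pointer arithmetic on struct cmsghdr keeps
the code independent of the padding a platform inserts between the header
and its payload.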
8. New Interfaces
Depending on the system, the following interfaces can be implemented
as system calls or library functions.
8.1 sctp_bindx()
The syntax of sctp_bindx() is,
int sctp_bindx(int sd, struct sockaddr_storage *addrs, int addrcnt,
int flags);
If sd is an IPv4 socket, the addresses passed must be IPv4
addresses. If sd is an IPv6 socket, the addresses passed can
be either IPv4 or IPv6 addresses.
A single address may be specified as INADDR_ANY or IN6ADDR_ANY, see
section 3.1.2 for this usage.
addrs is a pointer to an array of one or more socket addresses.
Each address is contained in a struct sockaddr_storage, so each
address is a fixed length. The caller specifies the number of
addresses in the array with addrcnt.
On success, sctp_bindx() returns 0. On failure, sctp_bindx() returns
-1, and sets errno to the appropriate error code.
For SCTP, the port given in each socket address must be the same, or
sctp_bindx() will fail, setting errno to EINVAL.
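A caller can build the address array and check the common-port rule
before calling sctp_bindx(). The sketch below uses only standard socket
address structures; the helper names fill_ipv4(), entry_port() and
ports_match() are hypothetical and exist only for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Store an IPv4 address given as text, plus a port, into *ss.
 * Returns 0 on success, -1 if text is not a valid IPv4 address.
 */
static int
fill_ipv4(struct sockaddr_storage *ss, const char *text,
    unsigned short port)
{
    struct sockaddr_in *sin = (struct sockaddr_in *)ss;

    memset(ss, 0, sizeof (*ss));
    sin->sin_family = AF_INET;
    sin->sin_port = htons(port);
    return (inet_pton(AF_INET, text, &sin->sin_addr) == 1 ? 0 : -1);
}

/* Extract the port, in network byte order, from an array entry. */
static in_port_t
entry_port(const struct sockaddr_storage *ss)
{
    if (ss->ss_family == AF_INET6)
        return (((const struct sockaddr_in6 *)ss)->sin6_port);
    return (((const struct sockaddr_in *)ss)->sin_port);
}

/* Return 1 if all addrcnt entries carry the same port, 0 otherwise. */
static int
ports_match(const struct sockaddr_storage *addrs, int addrcnt)
{
    int i;

    for (i = 1; i < addrcnt; i++) {
        if (entry_port(&addrs[i]) != entry_port(&addrs[0]))
            return (0);
    }
    return (1);
}
```

If ports_match() returns 1, the array and its entry count can be passed
as the addrs and addrcnt arguments of sctp_bindx().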
The flags parameter is formed from the bitwise OR of zero or more of
the following currently defined flags:
SCTP_BINDX_ADD_ADDR
SCTP_BINDX_REM_ADDR
SCTP_BINDX_ADD_ADDR directs SCTP to add the given addresses to the
association, and SCTP_BINDX_REM_ADDR directs SCTP to remove the given
addresses from the association. The two flags are mutually
exclusive; if both are given, sctp_bindx() will fail with EINVAL. A
caller may not remove all addresses from an association;
sctp_bindx() will reject such an attempt with EINVAL.
An application can use sctp_bindx(SCTP_BINDX_ADD_ADDR) to associate
additional addresses with an endpoint after calling bind(). It can
also use sctp_bindx(SCTP_BINDX_REM_ADDR) to remove some of the
addresses a listening socket is associated with, so that no newly
accepted association will be associated with those addresses.
Adding and removing addresses from a connected association is
optional functionality. Implementations that do not support this
functionality should return EOPNOTSUPP.
8.2 Branched-off Association
After an association is established on a UDP-style socket, the
application may wish to branch off the association into a separate
socket/file descriptor.
This is particularly desirable when, for instance, the application
wishes to have a number of sporadic message senders/receivers remain
under the original UDP-style socket but branch off those
associations carrying high volume data traffic into their own
separate socket descriptors.
The application uses the sctp_peeloff() call to branch off an
association into a separate socket. (Note that the semantics are
somewhat changed from the traditional TCP-style accept() call.)
The syntax is:
new_sd = sctp_peeloff(int sd, sctp_assoc_t *assoc_id, int *addrlen)
new_sd - the new socket descriptor representing the branched-off
association.
sd - the original UDP-style socket descriptor returned from the
socket() system call (see Section 3.1.1).
assoc_id - the specified identifier of the association that is to be
branched off to a separate file descriptor (Note, in a
traditional TCP-style accept() call, this would be an out
parameter, but for the UDP-style call, this is an in
parameter).
addrlen - an integer pointer to the size of the sockaddr structure
addr (in a traditional TCP-style call, this would be an out
parameter, but for the UDP-style call this is an in
parameter).
8.3 sctp_getpaddrs()
sctp_getpaddrs() returns all peer addresses in an association. The
syntax is,
int sctp_getpaddrs(int sd, sctp_assoc_t id,
struct sockaddr_storage **addrs);
On return, addrs will point to a dynamically allocated array of
struct sockaddr_storage entries, one for each peer address. The
caller should use sctp_freepaddrs() to free the memory. addrs must
not be NULL.
If sd is an IPv4 socket, the addresses returned will all be IPv4
addresses. If sd is an IPv6 socket, the addresses returned can be a
mix of IPv4 and IPv6 addresses.
For UDP-style sockets, id specifies the association to query. For
TCP-style sockets, id is ignored.
On success, sctp_getpaddrs() returns the number of peer addresses in
the association. If there is no association on this socket,
sctp_getpaddrs() returns 0, and the value of *addrs is undefined. If
an error occurs, sctp_getpaddrs() returns -1, and the value of
*addrs is undefined.
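Because the returned array may mix address families on an IPv6 socket,
code that prints the entries needs to check ss_family for each one. The
following sketch converts a single entry to text with inet_ntop(); the
helper name addr_to_text() is illustrative only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Convert one array entry to presentation form in buf.
 * Returns buf on success, or NULL if the address family is not
 * AF_INET or AF_INET6 or if buf is too small.
 */
static const char *
addr_to_text(const struct sockaddr_storage *ss, char *buf, socklen_t len)
{
    switch (ss->ss_family) {
    case AF_INET:
        return (inet_ntop(AF_INET,
            &((const struct sockaddr_in *)ss)->sin_addr, buf, len));
    case AF_INET6:
        return (inet_ntop(AF_INET6,
            &((const struct sockaddr_in6 *)ss)->sin6_addr, buf, len));
    default:
        return (NULL);
    }
}
```

A caller would typically invoke this for each of the entries counted by a
positive return value from sctp_getpaddrs(), using a buffer of at least
INET6_ADDRSTRLEN bytes, and then release the array with sctp_freepaddrs().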
8.4 sctp_freepaddrs()
sctp_freepaddrs() frees all resources allocated by
sctp_getpaddrs(). Its syntax is,
void sctp_freepaddrs(struct sockaddr_storage *addrs);
addrs is the array of peer addresses returned by sctp_getpaddrs().
8.5 sctp_opt_info()
getsockopt() is read-only, so a new interface is required when
information must be passed both in to and out of the SCTP stack. The
syntax for sctp_opt_info() is,
int sctp_opt_info(int sd, sctp_assoc_t id, int opt, void *arg);
For UDP-style sockets, id specifies the association to query. For
TCP-style sockets, id is ignored.
opt specifies which SCTP option to get or set. It can be one of the
following:
SCTP_SET_PRIMARY_ADDRS
SCTP_SET_PEER_PRIMARY_ADDRS
SCTP_SET_PEER_ADDR_PARAMS
SCTP_GET_PEER_ADDR_PARAMS
SCTP_GET_PEER_ADDR_INFO
arg is an option-specific structure buffer provided by the caller.
See the subsections of 8.5 for more information on these options and
their option-specific structures.
sctp_opt_info() returns 0 on success, or on failure returns -1 and
sets errno to the appropriate error code.
8.5.1 Peer Address Parameters
Applications can enable or disable heartbeats for any peer address
of an association, modify an address's heartbeat interval, force a
heartbeat to be sent immediately, and adjust the address's maximum
number of retransmissions sent before an address is considered
unreachable. An application may also set what it deems as the
primary address as well as communicate to the remote peer what
address the local application would like the remote peer to use
as its primary address (when sending to the local endpoint).
The following structure is used to access and modify an address's
parameters:
struct sctp_paddrparams {
    struct sockaddr_storage spp_address;
    uint32_t spp_hbinterval;
    uint16_t spp_pathmaxrxt;
    sctp_assoc_t spp_assoc_id;
};
spp_address - This specifies which address is of interest.
spp_hbinterval - This contains the value of the heartbeat interval,
in milliseconds. A value of 0, when modifying the
parameter, specifies that the heartbeat on this
address should be disabled. A value of UINT32_MAX
(4294967295), when modifying the parameter,
specifies that a heartbeat should be sent
immediately to the peer address, and the current
interval should remain unchanged.
spp_pathmaxrxt - This contains the maximum number of
retransmissions before this address shall be
considered unreachable.
spp_assoc_id - (UDP-style socket) This is filled in by the application,
and identifies the association for this query.
To modify these parameters, the application should call
sctp_opt_info() with the SCTP_SET_PEER_ADDR_PARAMS option. To get
these parameters, the application should use
SCTP_GET_PEER_ADDR_PARAMS.
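The two special values of spp_hbinterval are easy to confuse with
ordinary intervals. The sketch below classifies a value according to the
rules given for modifying the parameter. The enum hb_action and the
function classify_hbinterval() are names local to this sketch; this
document defines only the numeric values.

```c
#include <stdint.h>

/* What a given spp_hbinterval value requests when modifying parameters. */
enum hb_action {
    HB_DISABLE,      /* 0: disable the heartbeat on this address */
    HB_SEND_NOW,     /* UINT32_MAX: send one now, interval unchanged */
    HB_SET_INTERVAL  /* anything else: new interval in milliseconds */
};

static enum hb_action
classify_hbinterval(uint32_t spp_hbinterval)
{
    if (spp_hbinterval == 0)
        return (HB_DISABLE);
    if (spp_hbinterval == UINT32_MAX)
        return (HB_SEND_NOW);
    return (HB_SET_INTERVAL);
}
```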
8.5.2 Peer Address Information
Applications can retrieve information about a specific peer address
of an association, including its reachability state, congestion
window, and retransmission timer values. This information is
read-only. The following structure is used to access this
information:
struct sctp_paddrinfo {
    struct sockaddr_storage spinfo_address;
    int32_t spinfo_state;
    uint32_t spinfo_cwnd;
    uint32_t spinfo_srtt;
    uint32_t spinfo_rto;
    sctp_assoc_t spinfo_assoc_id;
};
spinfo_address - This is filled in by the application, and contains
the peer address of interest.
On return from sctp_opt_info():
spinfo_state - This contains the peer address's state (either
SCTP_ACTIVE or SCTP_INACTIVE).
spinfo_cwnd - This contains the peer address's current congestion
window.
spinfo_srtt - This contains the peer address's current smoothed
round-trip time calculation in milliseconds.
spinfo_rto - This contains the peer address's current
retransmission timeout value in milliseconds.
spinfo_assoc_id - (UDP-style socket) This is filled in by the application,
and identifies the association for this query.
To retrieve this information, use sctp_opt_info() with the
SCTP_GET_PEER_ADDR_INFO option.
9. Security Considerations
Many TCP and UDP implementations reserve port numbers below 1024 for
privileged users. If the target platform supports privileged users,
the SCTP implementation SHOULD restrict the ability to call bind()
or sctp_bindx() on these port numbers to privileged users.
Similarly, unprivileged users should not be able to set protocol
parameters which could result in the congestion control algorithm
being more aggressive than permitted on the public Internet. These
parameters are:
struct sctp_rtoinfo
If an unprivileged user inherits a datagram model socket with open
associations on a privileged port, it MAY be permitted to accept new
associations, but it SHOULD NOT be permitted to open new
associations. This could be relevant for the r* family of
protocols.
10. Authors' Addresses
Randall R. Stewart Tel: +1-815-477-2127
Cisco Systems, Inc. EMail: rrs@cisco.com
Crystal Lake, IL 60012
USA
Qiaobing Xie Tel: +1-847-632-3028
Motorola, Inc. EMail: qxie1@email.mot.com
1501 W. Shure Drive, Room 2309
Arlington Heights, IL 60004
USA
La Monte H.P. Yarroll NIC Handle: LY
Motorola, Inc. EMail: piggy@acm.org
1501 W. Shure Drive, IL27-2315
Arlington Heights, IL 60004
USA
Jonathan Wood
Sun Microsystems, Inc. Email: jonathan.wood@sun.com
901 San Antonio Road
Palo Alto, CA 94303
USA
Kacheong Poon
Sun Microsystems, Inc. Email: kacheong.poon@sun.com
901 San Antonio Road
Palo Alto, CA 94303
USA
Ken Fujita Tel: +81-471-82-1131
NEC Corporation Email: fken@cd.jp.nec.com
1131, Hinode, Abiko
Chiba, 270-1198
Japan
11. References
[RFC1644] Braden, R., "T/TCP -- TCP Extensions for Transactions
Functional Specification," RFC 1644, July 1994.
[RFC2026] Bradner, S., "The Internet Standards Process -- Revision 3",
RFC 2026, October 1996.
[RFC2292] W.R. Stevens, M. Thomas, "Advanced Sockets API for IPv6",
RFC 2292, February 1998.
[RFC2553] R. Gilligan, S. Thomson, J. Bound, W. Stevens. "Basic Socket
Interface Extensions for IPv6," RFC 2553, March 1999.
[SCTP] R.R. Stewart, Q. Xie, K. Morneault, C. Sharp, H.J. Schwarzbauer,
T. Taylor, I. Rytina, M. Kalla, L. Zhang, and, V. Paxson,
"Stream Control Transmission Protocol," RFC2960, October 2000.
[STEVENS] W.R. Stevens, M. Thomas, E. Nordmark, "Advanced Sockets API for
IPv6," <draft-ietf-ipngwg-rfc2292bis-01.txt>, December 1999
(Work in progress)
Appendix A: TCP-style Code Example
The following code is a simple implementation of an echo server over
SCTP. The example shows how to use some features of TCP-style IPv4
SCTP sockets, including:
o Opening, binding, and listening for new associations on a socket;
o Enabling ancillary data
o Enabling notifications
o Using ancillary data with sendmsg() and recvmsg()
o Using MSG_EOR to determine if an entire message has been read
o Handling notifications
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
/*
 * SCTP definitions. The header name is implementation-specific;
 * <netinet/sctp.h> is an assumption made for this example.
 */
#include <netinet/sctp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Initial receive buffer size; the value is an arbitrary choice. */
#define BUFLEN 100

/* Print a one-line summary of the notification held in buf. */
static void
handle_event(void *buf)
{
    struct sctp_assoc_change *sac;
    struct sctp_send_failed *ssf;
    struct sctp_paddr_change *spc;
    struct sctp_remote_error *sre;
    union sctp_notification *snp;
    char addrbuf[INET6_ADDRSTRLEN];
    const char *ap;
    struct sockaddr_in *sin;
    struct sockaddr_in6 *sin6;

    snp = buf;

    switch (snp->sn_type) {
    case SCTP_ASSOC_CHANGE:
        sac = &snp->sn_assoc_change;
        printf("^^^ assoc_change: state=%hu, error=%hu, instr=%hu "
            "outstr=%hu\n", sac->sac_state, sac->sac_error,
            sac->sac_inbound_streams, sac->sac_outbound_streams);
        break;
    case SCTP_SEND_FAILED:
        ssf = &snp->sn_send_failed;
        printf("^^^ sendfailed: len=%hu err=%d\n", ssf->ssf_length,
            ssf->ssf_error);
        break;
    case SCTP_PEER_ADDR_CHANGE:
        spc = &snp->sn_intf_change;
        if (spc->spc_addr.ss_family == AF_INET) {
            sin = (struct sockaddr_in *)&spc->spc_addr;
            ap = inet_ntop(AF_INET, &sin->sin_addr, addrbuf,
                INET6_ADDRSTRLEN);
        } else {
            sin6 = (struct sockaddr_in6 *)&spc->spc_addr;
            ap = inet_ntop(AF_INET6, &sin6->sin6_addr, addrbuf,
                INET6_ADDRSTRLEN);
        }
        /* inet_ntop() returns NULL on failure; avoid printing it */
        printf("^^^ intf_change: %s state=%d, error=%d\n",
            ap != NULL ? ap : "?", spc->spc_state, spc->spc_error);
        break;
    case SCTP_REMOTE_ERROR:
        sre = &snp->sn_remote_error;
        printf("^^^ remote_error: err=%hu len=%hu\n",
            ntohs(sre->sre_error), ntohs(sre->sre_len));
        break;
    case SCTP_SHUTDOWN_EVENT:
        printf("^^^ shutdown event\n");
        break;
    default:
        printf("unknown type: %hu\n", snp->sn_type);
        break;
    }
}
/*
 * Read one complete message (up to MSG_EOR) into buf, doubling the
 * buffer whenever it fills. Returns the possibly reallocated buffer
 * with the message length in *nrp, or NULL on EOF or error with the
 * recvmsg() result in *nrp.
 */
static void *
sctp_recvmsg(int fd, struct msghdr *msg, void *buf, size_t *buflen,
    ssize_t *nrp, size_t cmsglen)
{
    ssize_t n;
    ssize_t nr = 0;
    struct iovec iov[1];

    *nrp = 0;
    iov->iov_base = buf;
    iov->iov_len = *buflen;
    msg->msg_iov = iov;
    msg->msg_iovlen = 1;

    for (;;) {
        msg->msg_flags = MSG_XPG4_2;
        msg->msg_controllen = cmsglen;

        /* Keep the running total separate from the call result */
        n = recvmsg(fd, msg, 0);
        if (n <= 0) {
            /* EOF or error */
            *nrp = n;
            return (NULL);
        }
        nr += n;

        if ((msg->msg_flags & MSG_EOR) != 0) {
            *nrp = nr;
            return (buf);
        }

        /* Grow the buffer once it is full */
        if (*buflen == (size_t)nr) {
            buf = realloc(buf, *buflen * 2);
            if (buf == NULL) {
                fprintf(stderr, "out of memory\n");
                exit(1);
            }
            *buflen *= 2;
        }

        /* Set the next read offset and the space left after it */
        iov->iov_base = (char *)buf + nr;
        iov->iov_len = *buflen - nr;
    }
}
/* Echo every message received on fd back to its sender. */
static void
echo(int fd, int socketModeUDP)
{
    ssize_t nr;
    struct sctp_sndrcvinfo *sri;
    struct msghdr msg[1];
    struct cmsghdr *cmsg;
    char cbuf[sizeof (*cmsg) + sizeof (*sri)];
    char *buf;
    size_t buflen;
    struct iovec iov[1];
    size_t cmsglen = sizeof (*cmsg) + sizeof (*sri);

    /* Allocate the initial data buffer */
    buflen = BUFLEN;
    if (!(buf = malloc(BUFLEN))) {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }

    /* Set up the msghdr structure for receiving */
    memset(msg, 0, sizeof (*msg));
    msg->msg_control = cbuf;
    msg->msg_controllen = cmsglen;
    msg->msg_flags = 0;
    cmsg = (struct cmsghdr *)cbuf;
    sri = (struct sctp_sndrcvinfo *)(cmsg + 1);

    /* Wait for something to echo */
    while ((buf = sctp_recvmsg(fd, msg, buf, &buflen, &nr,
        cmsglen)) != NULL) {

        /* Intercept notifications here */
        if (msg->msg_flags & MSG_NOTIFICATION) {
            handle_event(buf);
            continue;
        }

        iov->iov_base = buf;
        iov->iov_len = nr;
        msg->msg_iov = iov;
        msg->msg_iovlen = 1;

        printf("got %zd bytes on stream %hu:\n", nr,
            sri->sinfo_stream);
        /* Copy the received data to standard output */
        write(STDOUT_FILENO, buf, nr);

        /* Echo it back */
        msg->msg_flags = MSG_XPG4_2;
        if (sendmsg(fd, msg, 0) < 0) {
            perror("sendmsg");
            exit(1);
        }
    }

    if (nr < 0) {
        perror("recvmsg");
    }
    if (socketModeUDP == 0)
        close(fd);
}
int
main(void)
{
    int lfd, cfd;
    int onoff = 1;
    struct sockaddr_in sin[1];

    if ((lfd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP)) == -1) {
        perror("socket");
        exit(1);
    }

    memset(sin, 0, sizeof (*sin));
    sin->sin_family = AF_INET;
    sin->sin_port = htons(7);
    sin->sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(lfd, (struct sockaddr *)sin, sizeof (*sin)) == -1) {
        perror("bind");
        exit(1);
    }

    if (listen(lfd, 1) == -1) {
        perror("listen");
        exit(1);
    }

    /* Wait for new associations */
    for (;;) {
        if ((cfd = accept(lfd, NULL, 0)) == -1) {
            perror("accept");
            exit(1);
        }

        /* Enable ancillary data */
        if (setsockopt(cfd, IPPROTO_SCTP, SCTP_RECVDATAIOEVNT,
            &onoff, sizeof (onoff)) < 0) {
            perror("setsockopt SCTP_RECVDATAIOEVNT");
            exit(1);
        }

        /* Enable notifications */
        if (setsockopt(cfd, IPPROTO_SCTP, SCTP_RECVASSOCEVNT,
            &onoff, sizeof (onoff)) < 0) {
            perror("setsockopt SCTP_RECVASSOCEVNT");
            exit(1);
        }
        if (setsockopt(cfd, IPPROTO_SCTP, SCTP_RECVSENDFAILEVNT,
            &onoff, sizeof (onoff)) < 0) {
            perror("setsockopt SCTP_RECVSENDFAILEVNT");
            exit(1);
        }
        if (setsockopt(cfd, IPPROTO_SCTP, SCTP_RECVPADDREVNT,
            &onoff, sizeof (onoff)) < 0) {
            perror("setsockopt SCTP_RECVPADDREVNT");
            exit(1);
        }
        if (setsockopt(cfd, IPPROTO_SCTP, SCTP_RECVPEERERR,
            &onoff, sizeof (onoff)) < 0) {
            perror("setsockopt SCTP_RECVPEERERR");
            exit(1);
        }
        if (setsockopt(cfd, IPPROTO_SCTP, SCTP_RECVSHUTDOWNEVNT,
            &onoff, sizeof (onoff)) < 0) {
            perror("setsockopt SCTP_RECVSHUTDOWNEVNT");
            exit(1);
        }

        /* Echo back any and all data */
        echo(cfd, 0);
    }
}
Appendix B: UDP-style Code Example
The following code is a simple implementation of an echo server over
SCTP. The example shows how to use some features of UDP-style IPv4
SCTP sockets, including:
o Opening and binding of a socket;
o Enabling ancillary data
o Enabling notifications
o Using ancillary data with sendmsg() and recvmsg()
o Using MSG_EOR to determine if an entire message has been read
o Handling notifications
Note most functions defined in Appendix A are reused in
this example.
int
main(void)
{
    int fd;
    int onoff = 1;
    int idleTime = 2;
    struct sockaddr_in sin[1];

    if ((fd = socket(AF_INET, SOCK_SEQPACKET, IPPROTO_SCTP)) == -1) {
        perror("socket");
        exit(1);
    }

    memset(sin, 0, sizeof (*sin));
    sin->sin_family = AF_INET;
    sin->sin_port = htons(7);
    sin->sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(fd, (struct sockaddr *)sin, sizeof (*sin)) == -1) {
        perror("bind");
        exit(1);
    }

    /* Enable ancillary data: we would like all io events */
    if (setsockopt(fd, IPPROTO_SCTP, SCTP_RECVDATAIOEVNT,
        &onoff, sizeof (onoff)) < 0) {
        perror("setsockopt SCTP_RECVDATAIOEVNT");
        exit(1);
    }

    /* Enable notifications */

    /* This will get us new associations as well */
    if (setsockopt(fd, IPPROTO_SCTP, SCTP_RECVASSOCEVNT,
        &onoff, sizeof (onoff)) < 0) {
        perror("setsockopt SCTP_RECVASSOCEVNT");
        exit(1);
    }

    /* If a send fails we want to know it */
    if (setsockopt(fd, IPPROTO_SCTP, SCTP_RECVSENDFAILEVNT,
        &onoff, sizeof (onoff)) < 0) {
        perror("setsockopt SCTP_RECVSENDFAILEVNT");
        exit(1);
    }

    /*
     * If a network address change or event transpires
     * we wish to know it
     */
    if (setsockopt(fd, IPPROTO_SCTP, SCTP_RECVPADDREVNT,
        &onoff, sizeof (onoff)) < 0) {
        perror("setsockopt SCTP_RECVPADDREVNT");
        exit(1);
    }

    /* We would like all error TLVs from the peer */
    if (setsockopt(fd, IPPROTO_SCTP, SCTP_RECVPEERERR,
        &onoff, sizeof (onoff)) < 0) {
        perror("setsockopt SCTP_RECVPEERERR");
        exit(1);
    }

    /* And of course we would like to know about shutdowns */
    if (setsockopt(fd, IPPROTO_SCTP, SCTP_RECVSHUTDOWNEVNT,
        &onoff, sizeof (onoff)) < 0) {
        perror("setsockopt SCTP_RECVSHUTDOWNEVNT");
        exit(1);
    }

    /*
     * Set associations to auto-close after 2 seconds of
     * inactivity
     */
    if (setsockopt(fd, IPPROTO_SCTP, SCTP_AUTOCLOSE,
        &idleTime, sizeof (idleTime)) < 0) {
        perror("setsockopt SCTP_AUTOCLOSE");
        exit(1);
    }

    /* Wait for new associations */
    while (1) {
        /* Echo back any and all data */
        echo(fd, 1);
    }
}