INTERNET-DRAFT                                        W. Richard Stevens
Expires: April 19, 1997                            Matt Thomas (Digital)
                                                        October 19, 1996


                     Advanced Sockets API for IPv6
                  <draft-stevens-advanced-api-00.txt>



Abstract

    Specifications are in progress for changes to the sockets API to
    support IP version 6 [1].  These changes are for TCP and UDP-based
    applications and will support most end-user applications in use
    today: Telnet and FTP clients and servers, HTTP clients and servers,
    and the like.

    But another class of applications exists that will also be run under
    IPv6.  We call these "advanced" applications and today this includes
    programs such as Ping, Traceroute, routing daemons, multicast
    routing daemons, router discovery daemons, and the like.  The API
    feature typically used by these programs that makes them "advanced"
    is a raw socket to access ICMPv4, IGMPv4, or IPv4, along with some
    knowledge of the packet header formats used by these protocols.  To
    provide portability for applications that use raw sockets under
    IPv6, some standardization is needed for the advanced API features.

    There are other features of IPv6 that some applications will need to
    access: interface identification (specifying the outgoing interface
    and determining the incoming interface) and IPv6 extension headers
    that are not addressed in [1]: hop-by-hop options, destination
    options, and the routing header (source routing).

Status of this Memo

    This document is an Internet Draft.  Internet Drafts are working
    documents of the Internet Engineering Task Force (IETF), its Areas,
    and its Working Groups.  Note that other groups may also distribute
    working documents as Internet Drafts.

    Internet Drafts are draft documents valid for a maximum of six
    months.  Internet Drafts may be updated, replaced, or obsoleted by
    other documents at any time.  It is not appropriate to use Internet
    Drafts as reference material or to cite them other than as a
    "working draft" or "work in progress".

    To learn the current status of any Internet-Draft, please check the



Stevens & Thomas                                                [Page 1]


INTERNET-DRAFT        Advanced Sockets API for IPv6         October 1996


    "1id-abstracts.txt" listing contained in the internet-drafts Shadow
    Directories on: ftp.is.co.za (Africa), nic.nordu.net (Europe),
    ds.internic.net (US East Coast), ftp.isi.edu (US West Coast), and
    munnari.oz.au (Pacific Rim).

Table of Contents

     1.  Introduction ....................................................  3
     2.  Common Structures and Definitions ...............................  4
        2.1.  The ip6hdr Structure .......................................  4
             2.1.1.  IPv6 Next Header Values .............................  5
        2.2.  The icmp6hdr Structure .....................................  5
             2.2.1.  ICMPv6 Type and Code Values .........................  6
             2.2.2.  ICMPv6 Neighbor Discovery Type and Code Values ......  7
        2.3.  Address Testing Macros .....................................  9
     3.  IPv6 Raw Sockets ................................................  9
        3.1.  Checksums .................................................. 10
        3.2.  ICMPv6 Type Filtering ...................................... 10
     4.  Ancillary Data .................................................. 12
        4.1.  The msghdr Structure ....................................... 13
        4.2.  The cmsghdr Structure ...................................... 14
             4.2.1.  CMSG_FIRSTHDR ....................................... 15
             4.2.2.  CMSG_NXTHDR ......................................... 15
             4.2.3.  CMSG_DATA ........................................... 16
             4.2.4.  CMSG_SPACE .......................................... 16
             4.2.5.  CMSG_LENGTH ......................................... 16
        4.3.  Summary of Options Described Using Ancillary Data .......... 16
        4.4.  TCP Access to Ancillary Data ............................... 17
     5.  Interface Identification ........................................ 18
        5.1.  Obtaining the Interface Index .............................. 19
        5.2.  The ifreq Structure ........................................ 19
        5.3.  Returning Received Interface and Destination IPv6 Address .. 20
        5.4.  Specifying Outgoing Interface and Source IPv6 Address ...... 21
             5.4.1.  Additional Errors with sendmsg() .................... 21
     6.  Hop-By-Hop Options .............................................. 22
        6.1.  Receiving Hop-by-Hop Options ............................... 22
        6.2.  Sending Hop-by-Hop Options ................................. 22
     7.  Destination Options ............................................. 23
        7.1.  Receiving Destination Options .............................. 23
        7.2.  Sending Destination Options ................................ 23
     8.  Source Route Option ............................................. 24
        8.1.  inet6_srcrt_space .......................................... 25
        8.2.  inet6_srcrt_init ........................................... 25
        8.3.  inet6_srcrt_add ............................................ 25
        8.4.  inet6_srcrt_reverse ........................................ 26
        8.5.  inet6_srcrt_segments ....................................... 26
        8.6.  inet6_srcrt_getaddr ........................................ 26
        8.7.  inet6_srcrt_getflags ....................................... 27
     9.  Ordering of Ancillary Data and IPv6 Extension Headers ........... 27
    10.  Additional Items ................................................ 28
        10.1.  Path MTU Discovery and UDP ................................ 29
        10.2.  Neighbor Reachability and UDP ............................. 29
        10.3.  Reading the Routing Table ................................. 29
        10.4.  Obtaining Interface and Address Information ............... 29
    11.  References ...................................................... 30
    12.  Acknowledgments ................................................. 30
    13.  Authors' Addresses .............................................. 31


1.  Introduction

    Specifications are in progress for changes to the sockets API to
    support IP version 6 [2].  These changes are for TCP and UDP-based
    applications.  The current document defines some of the "advanced"
    features of the sockets API that are required for applications to
    take advantage of additional features of IPv6.

    Today, the portability of applications using IPv4 raw sockets is
    quite high, but this is mainly because most IPv4 implementations
    started from a common base (the Berkeley source code) or at least
    started with the Berkeley headers.  This allows programs such as
    Ping and Traceroute, for example, to compile with minimal effort on
    many hosts that support the sockets API.  With IPv6, however, there
    is no common source code base that implementors are starting from,
    and the possibility for divergence at this level between different
    implementations is high.  To avoid a complete lack of portability
    amongst applications that use raw IPv6 sockets, some standardization
    is necessary.

    There are also features from the basic IPv6 specification that are
    not addressed in [2]: sending and receiving hop-by-hop options,
    destination options, and routing headers, specifying the outgoing
    interface, and being told of the receiving interface.

    This document can be divided into the following main sections.

    1.  Definitions of the basic constants and structures required for
        applications to use raw IPv6 sockets.  This includes structure
        definitions for the IPv6 and ICMPv6 headers and all associated
        constants (e.g., values for the next header field).

    2.  Some basic semantic definitions for IPv6 raw sockets.  For exam-
        ple, a raw ICMPv4 socket requires the application to calculate
        and store the ICMPv4 header checksum.  But with IPv6 this would
        require the application to choose the source IPv6 address
        because the source address is part of the pseudo header that
        ICMPv6 now uses for its checksum computation.  It should be
        defined that with a raw ICMPv6 socket the kernel always calcu-
        lates and stores the ICMPv6 header checksum.

    3.  Interface identification: how applications specify the outgoing
        interface and are told of the incoming interface.  There is a
        class of applications that need this capability and the tech-
        nique should be portable.

    4.  Access to the optional hop-by-hop, destination, and routing
        headers.

    The final two items (interface identification and access to the IPv6
    extension headers) are specified using the "ancillary data" fields
    that were added to the 4.3BSD Reno sockets API in 1990.  The reason
    is that these ancillary data fields are part of the Posix.1g stan-
    dard (which should be approved in 1997) and should therefore be
    adopted by most vendors.

    This document does not address application access to either the
    authentication header or the encapsulating security payload header.

    All examples in this document omit error checking in favor of
    brevity and clarity.

    Datatypes in this document follow the Posix.1g format: u_intN_t
    means an unsigned integer of exactly N bits (e.g., u_int16_t) and
    u_intNm_t means an unsigned integer of at least N bits (e.g.,
    u_int32m_t).


2.  Common Structures and Definitions

    Many advanced applications examine fields in the IPv6 header and set
    and examine fields in the various ICMPv6 headers.  Common structure
    definitions for these headers are required, along with common con-
    stant definitions for the structure members.

    When an include file is specified, that include file is allowed to
    include other files that do the actual declaration or definition.


2.1.  The ip6hdr Structure

    The following structure is defined as a result of including
    <netinet/ip6.h>.  Note that this is a new header.

        struct ip6hdr {
          union {
            struct ip6hdrctl {
              u_int32_t ctl6_flow;   /* 24 bits of flow-ID */
              u_int16_t ctl6_plen;   /* payload length */
              u_int8_t  ctl6_nxt;    /* next header */
              u_int8_t  ctl6_hlim;   /* hop limit */
            } un_ctl6;
            u_int8_t un_vfc;         /* 4 bits version, 4 bits priority */
          } ip6_ctlun;
          struct in6_addr ip6_src;   /* source address */
          struct in6_addr ip6_dst;   /* destination address */
        };

        #define ip6_vfc   ip6_ctlun.un_vfc
        #define ip6_flow  ip6_ctlun.un_ctl6.ctl6_flow
        #define ip6_plen  ip6_ctlun.un_ctl6.ctl6_plen
        #define ip6_nxt   ip6_ctlun.un_ctl6.ctl6_nxt
        #define ip6_hlim  ip6_ctlun.un_ctl6.ctl6_hlim
        #define ip6_hops  ip6_ctlun.un_ctl6.ctl6_hlim



2.1.1.  IPv6 Next Header Values

    IPv6 defines many new values for the next header field.  The follow-
    ing constants are defined as a result of including <netinet/in.h>.

        #define IPPROTO_HOPOPTS        0 /* IPv6 hop-by-hop options */
        #define IPPROTO_IPV6          41 /* IPv6 header */
        #define IPPROTO_ROUTING       43 /* IPv6 routing header */
        #define IPPROTO_FRAGMENT      44 /* IPv6 fragmentation header */
        #define IPPROTO_ESP           50 /* encapsulating security payload */
        #define IPPROTO_AH            51 /* authentication header */
        #define IPPROTO_ICMPV6        58 /* ICMPv6 */
        #define IPPROTO_NONE          59 /* IPv6 no next header */
        #define IPPROTO_DSTOPTS       60 /* IPv6 destination options */

    Berkeley-derived IPv4 implementations also define IPPROTO_IP to be
    0.  This should not be a problem since IPPROTO_IP is used only with
    IPv4 sockets and IPPROTO_HOPOPTS only with IPv6 sockets.


2.2.  The icmp6hdr Structure

    The ICMPv6 header is needed by numerous IPv6 applications including
    Ping, Traceroute, router discovery daemons, and neighbor discovery
    daemons.  The following structure is defined as a result of includ-
    ing <netinet/ip6_icmp.h>.  Note that this is a new header.

        struct icmp6hdr {
          u_int8_t     icmp6_type;   /* type field */
          u_int8_t     icmp6_code;   /* code field */
          u_int16_t    icmp6_cksum;  /* checksum field */
          union {
            u_int32_t  un_data32[1]; /* type-specific field */
            u_int16_t  un_data16[2]; /* type-specific field */
            u_int8_t   un_data8[4];  /* type-specific field */
          } icmp6_dataun;
        };

        #define icmp6_data32    icmp6_dataun.un_data32
        #define icmp6_data16    icmp6_dataun.un_data16
        #define icmp6_data8     icmp6_dataun.un_data8
        #define icmp6_pptr      icmp6_data32[0]  /* parameter prob */
        #define icmp6_mtu       icmp6_data32[0]  /* packet too big */
        #define icmp6_id        icmp6_data16[0]  /* echo request/reply */
        #define icmp6_seq       icmp6_data16[1]  /* echo request/reply */
        #define icmp6_maxdelay  icmp6_data16[0]  /* mcast group membership */
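
    A minimal sketch of how a Ping-style program might fill in this
    header for an echo request follows.  The function name
    make_echo_request() is hypothetical, and the structure is
    redeclared locally with <stdint.h> types so that the fragment
    stands alone.  The type value 128 is ICMPV6_ECHORQST from Section
    2.2.1.  The identifier and sequence number are stored in network
    byte order, and the checksum is left as zero on the assumption
    (Section 3.1) that the kernel fills it in for ICMPv6 raw sockets.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the structure and macros from this section. */
struct icmp6hdr {
  uint8_t     icmp6_type;   /* type field */
  uint8_t     icmp6_code;   /* code field */
  uint16_t    icmp6_cksum;  /* checksum field */
  union {
    uint32_t  un_data32[1]; /* type-specific field */
    uint16_t  un_data16[2]; /* type-specific field */
    uint8_t   un_data8[4];  /* type-specific field */
  } icmp6_dataun;
};

#define icmp6_data16    icmp6_dataun.un_data16
#define icmp6_id        icmp6_data16[0]  /* echo request/reply */
#define icmp6_seq       icmp6_data16[1]  /* echo request/reply */

#define ICMPV6_ECHORQST 128              /* from Section 2.2.1 */

/* Hypothetical helper: initialize an echo request header. */
void make_echo_request(struct icmp6hdr *h, uint16_t id, uint16_t seq)
{
    assert(h != NULL);

    memset(h, 0, sizeof(*h));
    h->icmp6_type  = ICMPV6_ECHORQST;
    h->icmp6_code  = 0;
    h->icmp6_cksum = 0;           /* left for the kernel to compute */
    h->icmp6_id    = htons(id);   /* network byte order on the wire */
    h->icmp6_seq   = htons(seq);
}
```

    A caller would typically follow the header with optional echo data
    and pass the whole buffer to sendto() or sendmsg().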



2.2.1.  ICMPv6 Type and Code Values

    In addition to a common structure for the ICMPv6 header, common def-
    initions are required for the ICMPv6 type and code fields.  The fol-
    lowing constants are also defined as a result of including
    <netinet/ip6_icmp.h>.

        #define ICMPV6_DEST_UNREACH     1
        #define ICMPV6_PKT_TOOBIG       2
        #define ICMPV6_TIME_EXCEED      3
        #define ICMPV6_PARAMPROB        4

        #define ICMPV6_INFOMSG_MASK  0x80  /* all informational messages */

        #define ICMPV6_ECHORQST       128
        #define ICMPV6_ECHORPLY       129
        #define ICMPV6_MGM_QUERY      130
        #define ICMPV6_MGM_REPORT     131
        #define ICMPV6_MGM_REDUCTION  132

        #define ICMPV6_DEST_UNREACH_NOROUTE   0 /* no route to destination */
        #define ICMPV6_DEST_UNREACH_ADMIN     1 /* communication with destination */
                                                /*  administratively prohibited */
        #define ICMPV6_DEST_UNREACH_NOTNEIGHBOR 2 /* not a neighbor */
        #define ICMPV6_DEST_UNREACH_ADDR      3 /* address unreachable */
        #define ICMPV6_DEST_UNREACH_NOPORT    4 /* bad port */

        #define ICMPV6_TIME_EXCEED_HOPS       0 /* Hop Limit == 0 in transit */
        #define ICMPV6_TIME_EXCEED_REASSEMBLY 1 /* Reassembly time out */

        #define ICMPV6_PARAMPROB_HDR          0 /* erroneous header field */
        #define ICMPV6_PARAMPROB_NXT_HDR      1 /* unrecognized Next Header */
        #define ICMPV6_PARAMPROB_OPTS         2 /* unrecognized IPv6 option */

    The five ICMP message types defined by IPv6 neighbor discovery
    (133-137) are defined in the next section.
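
    The role of ICMPV6_INFOMSG_MASK can be sketched as follows.  The
    two function names are hypothetical and the constants are copied
    from the list above; the fragment only illustrates that a type
    with the high-order bit set is an informational message and that
    any other type is an error message.

```c
#include <assert.h>
#include <stddef.h>

/* Constants copied from this section. */
#define ICMPV6_DEST_UNREACH     1
#define ICMPV6_PKT_TOOBIG       2
#define ICMPV6_TIME_EXCEED      3
#define ICMPV6_PARAMPROB        4
#define ICMPV6_INFOMSG_MASK  0x80
#define ICMPV6_ECHORQST       128
#define ICMPV6_ECHORPLY       129

/* Hypothetical helper: nonzero if the type is informational. */
int icmpv6_is_informational(unsigned int type)
{
    assert(type <= 255);
    return (type & ICMPV6_INFOMSG_MASK) != 0;
}

/* Hypothetical helper: printable name for the four error types,
 * or NULL for anything else. */
const char *icmpv6_error_name(unsigned int type)
{
    switch (type) {
    case ICMPV6_DEST_UNREACH: return "destination unreachable";
    case ICMPV6_PKT_TOOBIG:   return "packet too big";
    case ICMPV6_TIME_EXCEED:  return "time exceeded";
    case ICMPV6_PARAMPROB:    return "parameter problem";
    default:                  return NULL;
    }
}
```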


2.2.2.  ICMPv6 Neighbor Discovery Type and Code Values

    The following constants are defined as a result of including
    <netinet/nd6_protocol.h>.  Note that this is a new header.

        #define ND6_ROUTER_SOLICITATION         133
        #define ND6_ROUTER_ADVERTISEMENT        134
        #define ND6_NEIGHBOR_SOLICITATION       135
        #define ND6_NEIGHBOR_ADVERTISEMENT      136
        #define ND6_REDIRECT                    137

        enum nd6_option {
            ND6_OPT_SOURCE_LINKADDR=1,
            ND6_OPT_TARGET_LINKADDR=2,
            ND6_OPT_PREFIX_INFORMATION=3,
            ND6_OPT_REDIRECTED_HEADER=4,
            ND6_OPT_MTU=5,
            ND6_OPT_ENDOFLIST=256
        };

        struct nd_router_solicit {     /* router solicitation */
          struct icmp6hdr rsol_hdr;
        };

        #define rsol_type               rsol_hdr.icmp6_type
        #define rsol_code               rsol_hdr.icmp6_code
        #define rsol_cksum              rsol_hdr.icmp6_cksum
        #define rsol_reserved           rsol_hdr.icmp6_data32[0]

        struct nd_router_advert {       /* router advertisement */
          struct icmp6hdr radv_hdr;
          u_int32_t   radv_reachable;   /* reachable time */
          u_int32_t   radv_retransmit;  /* retransmit timer */
        };

        #define radv_type               radv_hdr.icmp6_type
        #define radv_code               radv_hdr.icmp6_code
        #define radv_cksum              radv_hdr.icmp6_cksum
        #define radv_maxhoplimit        radv_hdr.icmp6_data8[0]
        #define radv_m_o_res            radv_hdr.icmp6_data8[1]
        #define ND6_RADV_M_BIT          0x80
        #define ND6_RADV_O_BIT          0x40
        #define radv_router_lifetime    radv_hdr.icmp6_data16[1]

        struct nd6_nsolicitation {      /* neighbor solicitation */
          struct icmp6hdr  nsol6_hdr;
          struct in6_addr   nsol6_target;
        };

        struct nd6_nadvertisement {     /* neighbor advertisement */
            struct icmp6hdr  nadv6_hdr;
            struct in6_addr   nadv6_target;
        };

        #define nadv6_flags nadv6_hdr.icmp6_data8[0]
        #define ND6_NADVERFLAG_ISROUTER      0x80
        #define ND6_NADVERFLAG_SOLICITED     0x40
        #define ND6_NADVERFLAG_OVERRIDE      0x20

        struct nd6_redirect {           /* redirect */
          struct icmp6hdr  redirect_hdr;
          struct in6_addr   redirect_target;
          struct in6_addr   redirect_destination;
        };

        struct nd6_opt_prefix_info {    /* prefix information */
          u_int8_t    opt_type;
          u_int8_t    opt_length;
          u_int8_t    opt_prefix_length;
          u_int8_t    opt_l_a_res;
          u_int32_t   opt_valid_life;
          u_int32_t   opt_preferred_life;
          u_int32_t   opt_reserved2;
          struct in6_addr  opt_prefix;
        };

        #define ND6_OPT_PI_L_BIT        0x80
        #define ND6_OPT_PI_A_BIT        0x40

        struct nd6_opt_mtu {            /* MTU option */
          u_int8_t   opt_type;
          u_int8_t   opt_length;
          u_int16_t  opt_reserved;
          u_int32_t  opt_mtu;
        };


2.3.  Address Testing Macros

    Some basic macros are needed for testing IPv6 addresses for certain
    properties.  Many applications, both elementary and advanced, can
    benefit from these macros.

        int  IN6_IS_ADDR_UNSPECIFIED(const struct in6_addr *);
        int  IN6_IS_ADDR_LOOPBACK(const struct in6_addr *);
        int  IN6_IS_ADDR_MULTICAST(const struct in6_addr *);
        int  IN6_IS_ADDR_LINKLOCAL(const struct in6_addr *);
        int  IN6_IS_ADDR_SITELOCAL(const struct in6_addr *);
        int  IN6_IS_ADDR_V4MAPPED(const struct in6_addr *);
        int  IN6_IS_ADDR_V4COMPAT(const struct in6_addr *);

        int  IN6_IS_ADDR_MC_NODELOCAL(const struct in6_addr *);
        int  IN6_IS_ADDR_MC_LINKLOCAL(const struct in6_addr *);
        int  IN6_IS_ADDR_MC_SITELOCAL(const struct in6_addr *);
        int  IN6_IS_ADDR_MC_ORGLOCAL(const struct in6_addr *);
        int  IN6_IS_ADDR_MC_GLOBAL(const struct in6_addr *);
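
    The following sketch shows how several of these macros might be
    combined.  The enumeration and the function name classify_addr6()
    are hypothetical, and we assume a system whose <netinet/in.h>
    already provides the macros and whose <arpa/inet.h> provides
    inet_pton() for converting the presentation form.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical result codes for the classifier below. */
enum addr6_class {
    ADDR6_INVALID,
    ADDR6_UNSPECIFIED,
    ADDR6_LOOPBACK,
    ADDR6_MULTICAST,
    ADDR6_LINKLOCAL,
    ADDR6_SITELOCAL,
    ADDR6_V4MAPPED,
    ADDR6_OTHER
};

/* Hypothetical helper: parse a textual IPv6 address and report the
 * first property that matches, testing the most specific first. */
enum addr6_class classify_addr6(const char *text)
{
    struct in6_addr a;

    assert(text != NULL);
    if (inet_pton(AF_INET6, text, &a) != 1)
        return ADDR6_INVALID;

    if (IN6_IS_ADDR_UNSPECIFIED(&a)) return ADDR6_UNSPECIFIED;
    if (IN6_IS_ADDR_LOOPBACK(&a))    return ADDR6_LOOPBACK;
    if (IN6_IS_ADDR_MULTICAST(&a))   return ADDR6_MULTICAST;
    if (IN6_IS_ADDR_LINKLOCAL(&a))   return ADDR6_LINKLOCAL;
    if (IN6_IS_ADDR_SITELOCAL(&a))   return ADDR6_SITELOCAL;
    if (IN6_IS_ADDR_V4MAPPED(&a))    return ADDR6_V4MAPPED;
    return ADDR6_OTHER;
}
```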



3.  IPv6 Raw Sockets

    Raw sockets are used to bypass the transport layer (TCP or UDP).
    With IPv4, raw sockets are used to access ICMPv4, IGMPv4, and to
    read and write IPv4 datagrams containing a protocol field that the
    kernel does not process.  An example of the latter is a routing
    daemon for OSPF, since it uses IPv4 protocol field 89.  With IPv6,
    raw sockets will be used for ICMPv6 and to read and write IPv6
    datagrams containing a next header field that the kernel does not
    process.  An example of the latter is a routing daemon for IDRP.

    All data sent via raw sockets MUST be in network byte order and all
    data received via raw sockets will be in network byte
    order.  This differs from the IPv4 raw sockets, which did not spec-
    ify a byte ordering and typically used the host's byte order.

    Another difference from IPv4 raw sockets is that complete packets
    (that is, IPv6 packets with extension headers) cannot be transferred
    via the IPv6 raw sockets API.  Instead, ancillary data objects are
    used to transfer the extension headers, as described later in this
    document.

    All fields in the IPv6 header that an application might want to
    change (i.e., everything other than the version number) can be modi-
    fied by the application.  Hence there is probably no need for a
    socket option similar to the IPv4 IP_HDRINCL socket option.

    When we say "an ICMPv6 raw socket" we mean a socket created by call-
    ing the socket function with the three arguments PF_INET6, SOCK_RAW,
    and IPPROTO_ICMPV6.


3.1.  Checksums

    The kernel will calculate and insert the ICMPv6 checksum for ICMPv6
    raw sockets.

    For other raw IPv6 sockets (that is, for raw IPv6 sockets created
    with a third argument other than IPPROTO_ICMPV6), the application
    must set the new IPV6_CHECKSUM socket option to have the kernel com-
    pute and store a checksum.  This option prevents applications from
    having to perform source address selection on the packets they send.
    The checksum will incorporate the IPv6 pseudo-header.  This new
    socket option also specifies an integer offset into the user data of
    where the checksum is to be placed.

        int  offset = 2;
        setsockopt(fd, IPPROTO_IPV6, IPV6_CHECKSUM, &offset, sizeof(offset));

    By default, this socket option is disabled, which means the kernel
    will not calculate and store a checksum.
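
    To make concrete what the kernel computes, the following sketch
    calculates the same 16-bit one's complement checksum in user
    space.  It is illustrative only: the function names are
    hypothetical, the pseudo-header layout (source address,
    destination address, 32-bit upper-layer length, three zero bytes,
    next header) is the one we assume from the IPv6 specification, and
    payloads are limited to 65535 bytes to keep the arithmetic simple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add len bytes at p to a running sum of big-endian 16-bit words.
 * An odd trailing byte is padded with a zero byte. */
static uint32_t sum16(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)(p[0] << 8);
    return sum;
}

/* Hypothetical helper: checksum over the IPv6 pseudo-header and the
 * upper-layer data.  The checksum field inside data must be zero when
 * computing, and the result is stored there high-order byte first. */
uint16_t ipv6_upper_cksum(const uint8_t src[16], const uint8_t dst[16],
                          uint8_t next_header,
                          const uint8_t *data, size_t len)
{
    uint8_t tail[8] = {0};
    uint32_t sum = 0;

    assert(src != NULL && dst != NULL && data != NULL);
    assert(len <= 65535);          /* simplification, see lead-in */

    tail[2] = (uint8_t)(len >> 8); /* 32-bit upper-layer length */
    tail[3] = (uint8_t)(len & 0xFF);
    tail[7] = next_header;         /* preceded by three zero bytes */

    sum = sum16(sum, src, 16);
    sum = sum16(sum, dst, 16);
    sum = sum16(sum, tail, sizeof(tail));
    sum = sum16(sum, data, len);

    while (sum >> 16)              /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

    A receiver can verify a packet by running the same function over
    the data with the checksum field left in place; a result of zero
    indicates that the checksum is consistent.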


3.2.  ICMPv6 Type Filtering

    ICMPv4 raw sockets receive most ICMPv4 messages received by the ker-
    nel.  (We say "most" and not "all" because Berkeley-derived kernels
    never pass echo requests, timestamp requests, or address mask
    requests to a raw socket.  Instead these three messages are pro-
    cessed entirely by the kernel.)  But ICMPv6 is a superset of ICMPv4,
    also including the functionality of IGMPv4 and ARPv4.  This means
    that an ICMPv6 raw socket can potentially receive many more messages
    than would be received with an ICMPv4 raw socket: ICMP messages sim-
    ilar to ICMPv4, along with neighbor solicitations, neighbor adver-
    tisements, and the three group membership messages.

    Most applications using an ICMPv6 raw socket care about only a small
    subset of the ICMPv6 message types.  Transferring extraneous ICMPv6
    messages from the kernel to the user process can incur significant overhead.
    Therefore this API includes a method of filtering ICMPv6 messages by
    the ICMPv6 type field.

    Each ICMPv6 raw socket has an associated filter whose datatype is
    defined as

        struct icmp6_filter;

    The current filter is fetched and stored using getsockopt() and
    setsockopt() with a level of IPPROTO_ICMPV6 and an option name of
    ICMPV6_FILTER.

    Six macros operate on an icmp6_filter structure:

        void ICMPV6_FILTER_SETPASSALL (struct icmp6_filter *);
        void ICMPV6_FILTER_SETBLOCKALL(struct icmp6_filter *);

        void ICMPV6_FILTER_SETPASS ( int, struct icmp6_filter *);
        void ICMPV6_FILTER_SETBLOCK( int, struct icmp6_filter *);

        int  ICMPV6_FILTER_WILLPASS (int, const struct icmp6_filter *);
        int  ICMPV6_FILTER_WILLBLOCK(int, const struct icmp6_filter *);

    The first argument to the last four macros (an integer) is an ICMPv6
    message type, between 0 and 255.  The pointer argument to all six
    macros is a pointer to a filter that is modified by the first four
    macros and examined by the last two macros.

    The first two macros, SETPASSALL and SETBLOCKALL, let us specify
    that all ICMPv6 messages are passed to the application or that all
    ICMPv6 messages are blocked from being passed to the application.

    The next two macros, SETPASS and SETBLOCK, let us specify that mes-
    sages of a given ICMPv6 type should be passed to the application or
    not passed to the application (blocked).

    The final two macros, WILLPASS and WILLBLOCK, return true or false
    depending on whether the specified message type is passed to the
    application or blocked from being passed to the application by the
    filter pointed to by the second argument.

    When an ICMPv6 raw socket is created, it will by default pass all
    ICMPv6 message types to the application.

    As an example, a Ping program could execute the following:

        struct icmp6_filter  myfilt;

        fd = socket(PF_INET6, SOCK_RAW, IPPROTO_ICMPV6);

        ICMPV6_FILTER_SETBLOCKALL(&myfilt);
        ICMPV6_FILTER_SETPASS(ICMPV6_ECHORPLY, &myfilt);
        setsockopt(fd, IPPROTO_ICMPV6, ICMPV6_FILTER, &myfilt, sizeof(myfilt));

    The filter structure is declared and then initialized to block all
    message types.  The filter structure is then changed to allow
    ICMPv6 echo reply messages to be passed to the application and the
    filter is installed using setsockopt().

    The icmp6_filter structure is similar to the fd_set datatype used
    with the select() function in the sockets API.  The icmp6_filter
    structure is an opaque datatype and the application should not care
    how it is implemented.  All the application does with this datatype
    is allocate a variable of this type, pass a pointer to a variable of
    this type to getsockopt() and setsockopt(), and operate on a vari-
    able of this type using the six macros that we just defined.

    Nevertheless, it is worth showing a simple implementation of this
    datatype and the six macros.

        struct icmp6_filter {
          u_int32m_t  data[8];  /* 8*32 = 256 bits */
        };

        #define ICMPV6_FILTER_WILLPASS(type, filterp) \
            ((((filterp)->data[(type) >> 5]) & (1U << ((type) & 31))) != 0)
        #define ICMPV6_FILTER_WILLBLOCK(type, filterp) \
            ((((filterp)->data[(type) >> 5]) & (1U << ((type) & 31))) == 0)
        #define ICMPV6_FILTER_SETPASS(type, filterp) \
            ((((filterp)->data[(type) >> 5]) |=  (1U << ((type) & 31))))
        #define ICMPV6_FILTER_SETBLOCK(type, filterp) \
            ((((filterp)->data[(type) >> 5]) &= ~(1U << ((type) & 31))))
        #define ICMPV6_FILTER_SETPASSALL(filterp) \
            memset((filterp), 0xFF, sizeof(struct icmp6_filter))
        #define ICMPV6_FILTER_SETBLOCKALL(filterp) \
            memset((filterp), 0, sizeof(struct icmp6_filter))
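
    The behavior of such a bitmap can be checked without a socket.  The
    sketch below repeats a subset of the sample implementation (using
    the <stdint.h> type uint32_t in place of u_int32m_t, and an
    unsigned constant in the shifts) and adds two hypothetical helpers,
    ping_filter_init() and filter_count_passed(), to show that the
    Ping filter from the earlier example passes exactly one of the 256
    possible types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ICMPV6_ECHORQST 128
#define ICMPV6_ECHORPLY 129

/* Sample layout from this section: one bit per ICMPv6 type. */
struct icmp6_filter {
  uint32_t  data[8];  /* 8*32 = 256 bits */
};

#define ICMPV6_FILTER_WILLPASS(type, filterp) \
    ((((filterp)->data[(type) >> 5]) & (1U << ((type) & 31))) != 0)
#define ICMPV6_FILTER_SETPASS(type, filterp) \
    ((((filterp)->data[(type) >> 5]) |=  (1U << ((type) & 31))))
#define ICMPV6_FILTER_SETBLOCKALL(filterp) \
    memset((filterp), 0, sizeof(struct icmp6_filter))

/* Hypothetical helper: block everything except echo replies. */
void ping_filter_init(struct icmp6_filter *f)
{
    assert(f != NULL);
    ICMPV6_FILTER_SETBLOCKALL(f);
    ICMPV6_FILTER_SETPASS(ICMPV6_ECHORPLY, f);
}

/* Hypothetical helper: number of message types the filter passes. */
int filter_count_passed(const struct icmp6_filter *f)
{
    int type, n = 0;

    assert(f != NULL);
    for (type = 0; type < 256; type++)
        if (ICMPV6_FILTER_WILLPASS(type, f))
            n++;
    return n;
}
```

    An application should still treat the structure as opaque and rely
    only on the macros; the bitmap is one possible layout, not a
    required one.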



4.  Ancillary Data

    4.2BSD allowed file descriptors to be transferred between separate
    processes across a UNIX domain socket using the sendmsg() and
    recvmsg() functions.  Two members of the msghdr structure,
    msg_accrights and msg_accrightslen, were used to send and receive
    the descriptors.  When the OSI protocols were added to 4.3BSD Reno
    in 1990 the names of these two fields in the msghdr structure were
    changed to msg_control and msg_controllen, because they were used by
    the OSI protocols for "control information", although the comments
    in the source code call this "ancillary data".

    Other than the OSI protocols, the use of ancillary data has been
    rare.  In 4.4BSD, for example, the only use of ancillary data with
    IPv4 is to return the destination address of a received UDP datagram
    if the IP_RECVDSTADDR socket option is set.  With Unix domain sock-
    ets ancillary data is still used to send and receive descriptors.

    Nevertheless the ancillary data fields of the msghdr structure
    provide a clean way to pass information in addition to the data that
    is being read or written.  The inclusion of the msg_control and
    msg_controllen members of the msghdr structure along with the
    cmsghdr structure that is pointed to by the msg_control member is
    required by the Posix.1g sockets API standard (which should be
    completed during 1997).

    Ancillary data is used to exchange the following optional informa-
    tion between the application and the kernel:

        1.  specify the outgoing interface and/or source address,
        2.  receive the incoming interface and destination address,
        3.  send and receive hop-by-hop options,
        4.  send and receive destination options, and
        5.  send and receive routing headers.

    Before describing these uses in detail, we review the definition of
    the msghdr structure itself, the cmsghdr structure that defines an
    ancillary data object, and some macros that operate on the ancillary
    data objects.


4.1.  The msghdr Structure

    The msghdr structure is used by the recvmsg() and sendmsg() func-
    tions.  Its Posix.1g definition is:

        struct msghdr {
          void   *msg_name;        /* ptr to socket address structure */
          size_t  msg_namelen;     /* size of socket address structure */
          struct iovec *msg_iov;   /* scatter/gather array */
          size_t  msg_iovlen;      /* # elements in msg_iov */
          void   *msg_control;     /* ancillary data */
          size_t  msg_controllen;  /* ancillary data buffer length */
          int     msg_flags;       /* flags on received message */
        };

    The structure is declared as a result of including <sys/socket.h>.
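
    As a brief illustration of the non-ancillary members, the sketch
    below sends two separate buffers as a single datagram and reads
    the result back into one buffer.  The function name
    gather_roundtrip() is hypothetical, and a Unix domain datagram
    socket pair is used only so that the fragment needs no network
    configuration or privileges.  The msg_control and msg_controllen
    members are left zeroed because no ancillary data is exchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of bytes received into
 * out, or -1 on error. */
ssize_t gather_roundtrip(char *out, size_t outlen)
{
    int sv[2];
    struct msghdr msg;
    struct iovec iov[2];
    char part1[] = "hello, ";
    char part2[] = "world";
    ssize_t n;

    assert(out != NULL && outlen > 0);
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    /* Gather: two buffers are written as one datagram. */
    iov[0].iov_base = part1;
    iov[0].iov_len  = strlen(part1);
    iov[1].iov_base = part2;
    iov[1].iov_len  = strlen(part2);
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov    = iov;
    msg.msg_iovlen = 2;
    if (sendmsg(sv[0], &msg, 0) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Receive the whole datagram into a single buffer. */
    iov[0].iov_base = out;
    iov[0].iov_len  = outlen;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov    = iov;
    msg.msg_iovlen = 1;
    n = recvmsg(sv[1], &msg, 0);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```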

    Most Berkeley-derived implementations limit the amount of ancillary
    data in a call to sendmsg() to no more than 108 bytes (an mbuf).
    This API requires a minimum of 10240 bytes of ancillary data, but it
    is recommended that the amount be limited only by the buffer space
    reserved by the socket (which can be modified by the SO_SNDBUF
    socket option).


4.2.  The cmsghdr Structure

    The cmsghdr structure describes ancillary data objects transferred
    by recvmsg() and sendmsg():

        struct cmsghdr {
          size_t  cmsg_len;   /* #bytes, including this header */
          int     cmsg_level; /* originating protocol */
          int     cmsg_type;  /* protocol-specific type */
                  /* followed by unsigned char cmsg_data[]; */
        };

    This structure is declared as a result of including <sys/socket.h>.

    When ancillary data is sent or received, any number of ancillary
    data objects can be specified by the msg_control and msg_controllen
    members of the msghdr structure, because each object is preceded by
    its length (the cmsg_len member).  Historically Berkeley-derived
    implementations have passed only one object at a time, but this API
    allows multiple objects to be passed in a single call to sendmsg()
    or recvmsg().  The following example shows two ancillary data
    objects in a control buffer.

    |<---------------------------- msg_controllen -------------------------->|
    |                                                                        |
    |<------ ancillary data object ------>|<----- ancillary data object ---->|
    |                                     |                                  |
    |<------------- cmsg_len ------------>|<---------- cmsg_len ------------>|
    |                                     |                                  |
    +------------------------------------------------------------------------+
    |cmsg_ |cmsg_ |cmsg_ |                |cmsg_ |cmsg_ |cmsg_ |             |
    |len   |level |type  |   cmsg_data[]  |len   |level |type  | cmsg_data[] |
    +------------------------------------------------------------------------+
     ^
     |
    msg_control
    points here


    To aid in the manipulation of ancillary data objects, three macros
    from 4.4BSD are defined by Posix.1g: CMSG_DATA(), CMSG_NXTHDR(), and
    CMSG_FIRSTHDR().  Before describing these macros, we show the
    following example of how they might be used with a call to
    recvmsg().

        struct msghdr   msg;
        struct cmsghdr  *cmsgptr;

        /* fill in msg */

        /* call recvmsg() */

        if (msg.msg_controllen > 0) {
            for (cmsgptr = CMSG_FIRSTHDR(&msg); cmsgptr != NULL;
                 cmsgptr = CMSG_NXTHDR(&msg, cmsgptr)) {
                if (cmsgptr->cmsg_level == ... && cmsgptr->cmsg_type == ... ) {
                    u_char  *ptr;

                    ptr = CMSG_DATA(cmsgptr);
                    /* process data pointed to by ptr */
                }
            }
        }
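
    The loop above can be exercised without a socket.  The following
    sketch, which is not part of this API, stores two ancillary data
    objects in a buffer laid out as in the earlier figure and then
    counts them with the same loop.  It assumes a system that provides
    CMSG_SPACE() and CMSG_LEN(); CMSG_LEN() is the name under which
    current systems provide what this document calls CMSG_LENGTH.  The
    helper names and the level and type values are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>

/* Write one object carrying a single int at c. */
void put_int_object(struct cmsghdr *c, int level, int type, int value)
{
    c->cmsg_len = CMSG_LEN(sizeof(int));   /* header plus data */
    c->cmsg_level = level;
    c->cmsg_type = type;
    memcpy(CMSG_DATA(c), &value, sizeof(int));
}

/* Describe two int-carrying objects in msg, stored in buf.  buf must
 * be aligned for a struct cmsghdr.  Returns 0, or -1 if buf is too
 * small. */
int fill_two_objects(struct msghdr *msg, void *buf, size_t buflen,
                     int level, int type1, int type2)
{
    size_t each = CMSG_SPACE(sizeof(int)); /* object plus padding */

    if (buflen < 2 * each)
        return -1;
    memset(buf, 0, 2 * each);
    memset(msg, 0, sizeof(*msg));
    msg->msg_control = buf;
    msg->msg_controllen = 2 * each;
    put_int_object((struct cmsghdr *)buf, level, type1, 1);
    put_int_object((struct cmsghdr *)((unsigned char *)buf + each),
                   level, type2, 2);
    return 0;
}

/* Count the objects described by msg. */
int count_objects(struct msghdr *msg)
{
    struct cmsghdr *cmsgptr;
    int n = 0;

    if (msg->msg_controllen == 0)
        return 0;
    for (cmsgptr = CMSG_FIRSTHDR(msg); cmsgptr != NULL;
         cmsgptr = CMSG_NXTHDR(msg, cmsgptr))
        n++;
    return n;
}
```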

    We now describe the three Posix.1g macros, followed by two more that
    are new with this API: CMSG_SPACE and CMSG_LENGTH.


4.2.1.  CMSG_FIRSTHDR


        struct cmsghdr *CMSG_FIRSTHDR(struct msghdr *mhdr);

    CMSG_FIRSTHDR returns a pointer to the first cmsghdr structure in
    the msghdr structure pointed to by mhdr.  The macro returns NULL if
    there is no ancillary data pointed to by the msghdr structure.

    The application must check that msg_controllen is greater than 0
    before calling CMSG_FIRSTHDR, because if the application asks for
    control information (by setting msg_control nonnull and
    msg_controllen greater than 0 when calling recvmsg()), but there is
    none to pass back, the kernel just sets msg_controllen to 0 upon
    return.


4.2.2.  CMSG_NXTHDR


        struct cmsghdr *CMSG_NXTHDR(struct msghdr *mhdr,
                                    struct cmsghdr *cmsg);

    CMSG_NXTHDR returns a pointer to the cmsghdr structure describing
    the next ancillary data object.  mhdr is a pointer to a msghdr
    structure and cmsg is a pointer to a cmsghdr structure.  If there is
    not another ancillary data object, the return value is NULL.

    The following behavior of this macro is new to this API specifica-
    tion.  If the value of the cmsg pointer is NULL, a pointer to the
    cmsghdr structure describing the first ancillary data object is
    returned.  If there are no ancillary data objects, the return value
    is NULL.


4.2.3.  CMSG_DATA


        unsigned char *CMSG_DATA(struct cmsghdr *cmsg);

    CMSG_DATA returns a pointer to the data (what is called the
    cmsg_data[] member, even though such a member is not defined in the
    structure) following a cmsghdr structure.


4.2.4.  CMSG_SPACE


        unsigned int CMSG_SPACE(unsigned int length);

    This macro is new with this API.  Given the length of an ancillary
    data object, CMSG_SPACE returns the space required by the object
    and its cmsghdr structure, including any padding needed to satisfy
    alignment requirements.  This macro should not be used to
    initialize the cmsg_len member of a cmsghdr structure; instead use
    the CMSG_LENGTH macro.


4.2.5.  CMSG_LENGTH


        unsigned int CMSG_LENGTH(unsigned int length);

    This macro is new with this API.  Given the length of an ancillary
    data object, CMSG_LENGTH returns the value to store in the cmsg_len
    member of the cmsghdr structure, taking into account any padding
    needed to satisfy alignment requirements.
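
    The distinction between the two macros matters when sizing a
    control buffer: the buffer is sized with CMSG_SPACE, while each
    cmsg_len is set with CMSG_LENGTH.  The sketch below assumes a
    system that provides CMSG_SPACE() and CMSG_LEN(), the latter being
    the name under which current systems provide what this document
    calls CMSG_LENGTH.  The helper names are hypothetical.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Bytes of control buffer needed to hold n objects whose data
 * lengths are given in lens[]. */
size_t control_space(const size_t *lens, int n)
{
    size_t total = 0;

    for (int i = 0; i < n; i++)
        total += CMSG_SPACE(lens[i]);
    return total;
}

/* Pad bytes that follow an object carrying len bytes of data; these
 * count toward the buffer size but not toward cmsg_len. */
size_t trailing_pad(size_t len)
{
    return CMSG_SPACE(len) - CMSG_LEN(len);
}
```

    The exact values depend on the size and alignment of the cmsghdr
    structure on a given system, which is why applications should use
    the macros instead of computing these lengths by hand.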


4.3.  Summary of Options Described Using Ancillary Data

    We mentioned that five pieces of optional information are passed
    between the application and the kernel using ancillary data:

        1.  specify the outgoing interface and/or source address,
        2.  receive the incoming interface and destination address,
        3.  send and receive hop-by-hop options,
        4.  send and receive destination options, and
        5.  send and receive routing headers.

    First, to receive the optional information (items 2-5) the applica-
    tion must call setsockopt() to turn on a flag:

        int  on = 1;

        setsockopt(fd, IPPROTO_IPV6,    IPV6_RXINFO,    &on, sizeof(on));
        setsockopt(fd, IPPROTO_IPV6,    IPV6_RXHOPOPTS, &on, sizeof(on));
        setsockopt(fd, IPPROTO_IPV6,    IPV6_RXDSTOPTS, &on, sizeof(on));
        setsockopt(fd, IPPROTO_IPV6,    IPV6_RXSRCRT,   &on, sizeof(on));

    When any of these options are enabled, the corresponding data is
    returned as control information by recvmsg(), as one or more ancil-
    lary data objects.

    Nothing special need be done to send any of this optional informa-
    tion (items 1 and 3-5 in the list above); the application just calls
    sendmsg() and specifies one or more ancillary data objects as con-
    trol information.

    We also summarize the three cmsghdr fields that describe each of the
    five ancillary data objects:

        cmsg_level        cmsg_type      cmsg_data[]
        ---------------   ------------   ------------------------
        IPPROTO_IPV6      IPV6_RXINFO    in6_pktinfo structure
        IPPROTO_IPV6      IPV6_TXINFO    in6_pktinfo structure
        IPPROTO_HOPOPTS   option_type    actual option
        IPPROTO_DSTOPTS   option_type    actual option
        IPPROTO_ROUTING   routing_type   implementation dependent

    These are described in detail in following sections.


4.4.  TCP Access to Ancillary Data

    The summary in the previous section assumes a UDP socket.  Sending
    and receiving ancillary data is easy with UDP: the application
    calls sendmsg() and recvmsg() instead of sendto() and recvfrom().

    But there might be cases where a TCP application wants to send or
    receive this optional information.  For example, a TCP client might
    want to specify a source route and this needs to be done before
    calling connect().  Similarly a TCP server might want to know the
    received interface after accept() returns along with any destination
    options.

    One new socket option is defined to allow easy TCP access to these
    optional fields.  Setting the socket option specifies any of the
    optional output fields:

        setsockopt(fd, IPPROTO_IPV6, IPV6_PKTOPTIONS, &buf, len);

    The fourth argument points to a buffer containing one or more ancil-
    lary data objects, and the fifth argument is the total length of all
    these objects.  The application fills in this buffer exactly as if
    the buffer were being passed to sendmsg() as control information.

    The corresponding receive option

        getsockopt(fd, IPPROTO_IPV6, IPV6_PKTOPTIONS, &buf, &len);

    returns a buffer with one or more ancillary data objects for all the
    optional receive information that the application has previously
    specified that it wants to receive.  The fourth argument points to
    the buffer that is filled in by the call.  The fifth argument is a
    pointer to a value-result integer: when the function is called the
    integer specifies the size of the buffer pointed to by the fourth
    argument, and upon return this integer contains the actual number of
    bytes that were returned.  The application processes this buffer
    exactly as if the buffer were returned by recvmsg() as control
    information.

    The options set by calling setsockopt() for IPV6_PKTOPTIONS are
    called "sticky" options because once set they apply to all packets
    sent on that socket.  They may, however, be overridden with ancil-
    lary data specified in a call to sendmsg().

    But the following three options are considered a set: hop-by-hop,
    destination, and routing header options.  If any of these three
    options are specified in a call to sendmsg(), then none of these
    three from the socket's sticky options are sent for this packet.
    For example, if the application calls setsockopt() for
    IPV6_PKTOPTIONS and sets sticky values for the hop-by-hop and desti-
    nation options, but then calls sendmsg() specifying just a routing
    header as an ancillary data object, then only the routing header is
    sent with this packet.  The two sticky options, hop-by-hop and des-
    tination, are not sent for this packet.
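
    The override rule can be modeled as a small function over bit
    flags.  This is a sketch of the rule as described above, not
    kernel code; the flag names are hypothetical, and the treatment of
    the outgoing interface and source address as an item that is
    overridden on its own is our assumption.

```c
/* Hypothetical flags naming the optional items that can be sent. */
enum {
    OPT_TXINFO  = 1 << 0,   /* outgoing interface / source address */
    OPT_HOPOPTS = 1 << 1,   /* hop-by-hop options */
    OPT_DSTOPTS = 1 << 2,   /* destination options */
    OPT_SRCRT   = 1 << 3    /* routing header */
};

/* The three items that are treated as a set. */
#define OPT_HEADER_SET (OPT_HOPOPTS | OPT_DSTOPTS | OPT_SRCRT)

/* Items that accompany one packet, given the socket's sticky items
 * and the items supplied as ancillary data to this sendmsg(). */
unsigned effective_options(unsigned sticky, unsigned percall)
{
    unsigned result = percall;

    /* The sticky set is used only if the call names none of it. */
    if ((percall & OPT_HEADER_SET) == 0)
        result |= sticky & OPT_HEADER_SET;

    /* Assumption: packet information is overridden individually. */
    if ((percall & OPT_TXINFO) == 0)
        result |= sticky & OPT_TXINFO;

    return result;
}
```

    With sticky hop-by-hop and destination options and a call that
    supplies only a routing header, the function yields just the
    routing header, matching the example in the text.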


5.  Interface Identification

    Some applications need to know the interface on which a packet was
    received and some applications need to specify the interface on
    which a packet is to be transmitted.  Thus a technique is required
    to identify the interfaces on a system.

    On Berkeley-derived implementations, when an interface is made known
    to the system, the kernel assigns a unique positive integer value
    (called the interface index) to that interface.  These are small
    positive integers that start at 1.  There may be gaps so that there
    is no current interface for a particular interface index.


5.1.  Obtaining the Interface Index

    Currently, there is no simple way to get the index of the interface.
    (4.4BSD returns the index as part of the datalink socket address
    structures returned by the ioctl() of SIOCGIFCONF, but not all
    systems support the AF_LINK socket address structure.)  Since the
    interface index is widely used throughout this API, a new ioctl()
    command is defined to retrieve it: SIOCGIFINDEX.  This command uses
    the standard ifreq structure (shown below) and, when supplied with
    an interface name, returns the interface index in the ifr_ifindex
    member of the ifreq structure.  Note that ifr_ifindex is a new
    addition to the ifreq structure and should have a type of "int".


5.2.  The ifreq Structure

    The ifreq structure is used by many of the existing interface ioctls
    to specify or obtain information or attributes of an interface.  For
    example, given the name of an interface (e.g., "de0" or "le0") the
    SIOCGIFADDR command returns the primary IPv4 address of the inter-
    face.  The ifreq structure is declared as a result of including the
    <net/if.h> header, and on many implementations looks like the fol-
    lowing:

        struct ifreq {
        #define IFNAMSIZ        16
          char  ifr_name[IFNAMSIZ];    /* if name, e.g., "en0" */
          union {
                struct  sockaddr ifru_addr;
                struct  sockaddr ifru_dstaddr;
                struct  sockaddr ifru_broadaddr;
                short   ifru_svalue;
                int     ifru_ivalue;
                caddr_t ifru_data;
          } ifr_ifru;
        };

    To fetch some property of the interface the application stores the
    interface name into the ifr_name[] array and calls ioctl() with a
    command such as SIOCGIFADDR.  The returned value is in one member of
    the union, the exact member depending on the specific command.
    Numerous names are defined to access the members of the union:

        #define ifr_addr      ifr_ifru.ifru_addr      /* address */
        #define ifr_dstaddr   ifr_ifru.ifru_dstaddr   /* other end of p-to-p link */
        #define ifr_broadaddr ifr_ifru.ifru_broadaddr /* broadcast address */
        #define ifr_flags     ifr_ifru.ifru_svalue    /* flags */
        #define ifr_metric    ifr_ifru.ifru_ivalue    /* metric */
        #define ifr_mtu       ifr_ifru.ifru_ivalue    /* mtu */
        #define ifr_ifindex   ifr_ifru.ifru_ivalue    /* interface index */
        #define ifr_data      ifr_ifru.ifru_data      /* for use by interface */

    What this API specifies is the new command of SIOCGIFINDEX and a
    definition of the name ifr_ifindex.  Issuing this command for an
    interface returns the index of the interface.  For example,

        struct ifreq  ifr;

        strcpy(ifr.ifr_name, "de0");
        ioctl(fd, SIOCGIFINDEX, &ifr);



5.3.  Returning Received Interface and Destination IPv6 Address

    An application may need to know the destination IPv6 address and the
    received interface.  This information is returned in an in6_pktinfo
    structure as ancillary data if the IPV6_RXINFO socket option is
    enabled.  This structure is defined as a result of including the
    <netinet/in.h> header.

        struct in6_pktinfo {
          int             ipi6_ifindex; /* interface index */
          struct in6_addr ipi6_addr;    /* IPv6 address */
        };

    In the cmsghdr structure containing this ancillary data, the
    cmsg_level member will be IPPROTO_IPV6, the cmsg_type member will be
    IPV6_RXINFO, and the first byte of cmsg_data[] will be the first
    byte of the in6_pktinfo structure.

    Note that this structure is defined only with IPv6 address formats.
    Use of this option with the IPv4 API is beyond the scope of this
    document.  (Note that most 4.4BSD-based implementations support the
    IP_RECVDSTADDR socket option, which returns the destination IPv4
    address as ancillary data.)

    The application must set the IPV6_RXINFO socket option for this
    information to be returned:

        int  on = 1;
        setsockopt(fd, IPPROTO_IPV6, IPV6_RXINFO, &on, sizeof(on));
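
    Once the option is enabled, the application locates the returned
    structure by scanning the ancillary data.  The following is a
    sketch only.  Because IPV6_RXINFO and in6_pktinfo as specified
    here may not be present in a given system's headers, the sketch
    supplies a placeholder option value and a local copy of the
    structure under a different name; both are assumptions made so
    that the example is self-contained, as is the function name.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef IPV6_RXINFO
#define IPV6_RXINFO 1001   /* placeholder value, assumed for sketch */
#endif

/* Local copy of the structure in this section, renamed to avoid a
 * clash with any definition the system headers may provide. */
struct draft_in6_pktinfo {
    int             ipi6_ifindex;  /* interface index */
    struct in6_addr ipi6_addr;     /* IPv6 address */
};

/* Copy the received packet information out of msg.  Returns 1 if an
 * IPV6_RXINFO object was found, 0 otherwise. */
int find_rxinfo(struct msghdr *msg, struct draft_in6_pktinfo *out)
{
    struct cmsghdr *c;

    if (msg->msg_controllen == 0)
        return 0;
    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IPV6 &&
            c->cmsg_type == IPV6_RXINFO &&
            c->cmsg_len >= CMSG_LEN(sizeof(*out))) {
            /* copy instead of casting: avoids alignment concerns */
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return 1;
        }
    }
    return 0;
}
```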



5.4.  Specifying Outgoing Interface and Source IPv6 Address

    An application may need to specify the outgoing interface, the
    source address, or both.  Note that the source address can also be
    specified by calling bind() before each output operation, but sup-
    plying the source address together with the data requires less over-
    head (i.e., system calls) and requires less state to be stored and
    protected in a multithreaded application.

    The in6_pktinfo structure defined in the previous section is also
    used to specify the outgoing interface and the source address.  The
    structure is passed as ancillary data to sendmsg() with a cmsg_level
    of IPPROTO_IPV6, a cmsg_type of IPV6_TXINFO, and the first byte of
    cmsg_data[] being the first byte of the in6_pktinfo structure.

    No socket option need be set to use this feature.

    If the ipi6_ifindex is 0, the kernel will choose the outgoing inter-
    face.  If ipi6_addr is the unspecified address (IN6ADDR_ANY_INIT),
    then (a) if an address is currently bound to the socket, it is used
    as the source address, or (b) if no address is currently bound to
    the socket, the kernel will choose the source address.
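
    The following sketch shows how an application might build the
    ancillary data object for sendmsg().  As with receiving, the
    option value and the locally named copy of the structure are
    placeholders assumed for illustration, since a given system's
    headers may not define IPV6_TXINFO or this form of in6_pktinfo.
    The function name build_txinfo() is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef IPV6_TXINFO
#define IPV6_TXINFO 1002   /* placeholder value, assumed for sketch */
#endif

/* Local copy of the structure from the previous section, renamed to
 * avoid a clash with any system definition. */
struct draft_in6_pktinfo {
    int             ipi6_ifindex;  /* interface index */
    struct in6_addr ipi6_addr;     /* IPv6 address */
};

/* Store one IPV6_TXINFO object in buf, which must be aligned for a
 * struct cmsghdr.  An ifindex of 0 or a NULL src leaves that choice
 * to the kernel.  Returns the value for msg_controllen, or 0 if buf
 * is too small. */
size_t build_txinfo(void *buf, size_t buflen, int ifindex,
                    const struct in6_addr *src)
{
    struct draft_in6_pktinfo info;
    struct cmsghdr *c = buf;
    size_t space = CMSG_SPACE(sizeof(info));

    if (buflen < space)
        return 0;
    memset(buf, 0, space);
    memset(&info, 0, sizeof(info));  /* all zero: unspecified address */
    info.ipi6_ifindex = ifindex;
    if (src != NULL)
        info.ipi6_addr = *src;

    c->cmsg_len = CMSG_LEN(sizeof(info));
    c->cmsg_level = IPPROTO_IPV6;
    c->cmsg_type = IPV6_TXINFO;
    memcpy(CMSG_DATA(c), &info, sizeof(info));
    return space;
}
```

    The application then stores the buffer address in msg_control and
    the returned length in msg_controllen before calling sendmsg().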


5.4.1.  Additional Errors with sendmsg()

    With the IPV6_RXINFO socket option there are no additional errors
    possible with the call to recvmsg().  But when specifying the outgo-
    ing interface or the source address, additional errors are possible
    from sendmsg():

    ENXIO         The interface specified by ipi6_ifindex does not
                  exist.

    ENETDOWN      The interface specified by ipi6_ifindex is not enabled
                  for IPv6 use.

    EADDRNOTAVAIL ipi6_ifindex specifies an interface but the address
                  ipi6_addr is not available for use on that interface.

    EHOSTUNREACH  No route to the destination exists over the interface
                  specified by ipi6_ifindex.


6.  Hop-By-Hop Options

    A variable number of hop-by-hop options can appear in a single hop-
    by-hop options header.  Each option in the header is TLV-encoded
    with a type, length, and value.

    Today only three hop-by-hop options are defined for IPv6 [1]: jumbo
    payload, pad1, and padN.  None of these three should be passed back
    to an application and an application should receive an error if it
    attempts to set any of these three options.  The jumbo payload
    option is processed entirely by the kernel.  It is indirectly speci-
    fied by datagram-based applications as the size of the datagram to
    send and indirectly passed back to these applications as the length
    of the received datagram.  The two pad options are for alignment
    purposes and are automatically inserted by a sending kernel when
    needed and ignored by the receiving kernel.  This section of the API
    is therefore defined for future hop-by-hop options that an applica-
    tion may need to specify and receive.


6.1.  Receiving Hop-by-Hop Options

    To receive hop-by-hop options the application must enable the
    IPV6_RXHOPOPTS socket option:

        int  on = 1;
        setsockopt(fd, IPPROTO_IPV6, IPV6_RXHOPOPTS, &on, sizeof(on));

    Each individual option is returned as an ancillary data object
    described by a cmsghdr structure.  The cmsg_level member will be
    IPPROTO_HOPOPTS, the cmsg_type member will be the option type, and
    the first byte of cmsg_data[] is the first byte of the option data.


6.2.  Sending Hop-by-Hop Options

    To send one or more hop-by-hop options, the application just speci-
    fies them as ancillary data in a call to sendmsg().  No socket
    option need be set.

    Each option is specified as an ancillary data object by a cmsghdr
    structure.  The cmsg_level member is set to IPPROTO_HOPOPTS, the
    cmsg_type member is set to the option type, and the first byte of
    cmsg_data[] is the first byte of the option data.

    Additional errors may be possible from sendmsg() if the specified
    option is in error.
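
    One way to build the control information for several options is
    to append one ancillary data object at a time, as in the sketch
    below.  The function name append_option() is hypothetical, and
    CMSG_LEN() is used as the current name for what this document
    calls CMSG_LENGTH.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Append one option as an ancillary data object at offset *used in
 * buf, which must be aligned for a struct cmsghdr.  On success *used
 * is advanced and 0 is returned; -1 means the object does not fit. */
int append_option(void *buf, size_t buflen, size_t *used,
                  int level, int opt_type,
                  const void *data, size_t datalen)
{
    size_t space = CMSG_SPACE(datalen);
    struct cmsghdr *c;

    if (*used > buflen || buflen - *used < space)
        return -1;
    c = (struct cmsghdr *)((unsigned char *)buf + *used);
    memset(c, 0, space);
    c->cmsg_len = CMSG_LEN(datalen);
    c->cmsg_level = level;    /* IPPROTO_HOPOPTS or IPPROTO_DSTOPTS */
    c->cmsg_type = opt_type;  /* the option type */
    if (datalen > 0)
        memcpy(CMSG_DATA(c), data, datalen);
    *used += space;           /* next object starts here */
    return 0;
}
```

    After the last call, the value of *used is what the application
    stores in msg_controllen.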


7.  Destination Options

    A variable number of destination options can appear in one or more
    destination option headers.  As stated in [1], a destination options
    header appearing before a routing header is processed by the first
    destination plus any subsequent destinations specified in the rout-
    ing header, while a destination options header appearing after a
    routing header is processed only by the final destination.  As with
    the hop-by-hop options, each option in a destination options header
    is TLV-encoded with a type, length, and value.

    Today no destination options are defined for IPv6 [1], although pro-
    posals exist to use destination options with mobility and anycast-
    ing.


7.1.  Receiving Destination Options

    To receive destination options the application must enable the
    IPV6_RXDSTOPTS socket option:

        int  on = 1;
        setsockopt(fd, IPPROTO_IPV6, IPV6_RXDSTOPTS, &on, sizeof(on));

    Each individual option is returned as an ancillary data object
    described by a cmsghdr structure.  The cmsg_level member will be
    IPPROTO_DSTOPTS, the cmsg_type member will be the option type, and
    the first byte of cmsg_data[] is the first byte of the option data.


7.2.  Sending Destination Options

    To send one or more destination options, the application just speci-
    fies them as ancillary data in a call to sendmsg().  No socket
    option need be set.

    Each option is specified as an ancillary data object by a cmsghdr
    structure.  The cmsg_level member is set to IPPROTO_DSTOPTS, the
    cmsg_type member is set to the option type, and the first byte of
    cmsg_data[] is the first byte of the option data.

    Additional errors may be possible from sendmsg() if the specified
    option is in error.


8.  Source Route Option

    Source routing in IPv6 is accomplished by specifying a routing
    header as an extension header.  There can be different types of
    routing headers, but IPv6 currently defines only the type 0 routing
    header.  This type supports up to 24 intermediate destinations, each
    of which is defined as a loose or a strict hop.

    Source routing with the IPv4 sockets API (the IP_OPTIONS socket option)
    requires the application to build the source route in the format
    that appears as the IPv4 header option, requiring intimate knowledge
    of the IPv4 options format.  This API, however, defines seven func-
    tions that the application calls to build and examine a routing
    header.  Three functions build a routing header:

      inet6_srcrt_space()    - return #bytes required for ancillary data
      inet6_srcrt_init()     - initialize ancillary data for routing header
      inet6_srcrt_add()      - add IPv6 address & flags to routing header

    Four functions deal with a returned routing header:

      inet6_srcrt_reverse()  - reverse a routing header
      inet6_srcrt_segments() - return #segments in a routing header
      inet6_srcrt_getaddr()  - fetch one address from a routing header
      inet6_srcrt_getflags() - fetch one flag from a routing header

    A routing header is passed between the application and the kernel as
    ancillary data.  The cmsg_level member has a value of
    IPPROTO_ROUTING and the cmsg_type member specifies the routing
    header type (e.g., 0 for a type 0 routing header).  The contents of
    the cmsg_data[] member are implementation dependent and should not be
    accessed directly by the application, but should be accessed using
    the seven functions that we are about to describe.  The implementa-
    tion-dependent contents of the cmsg_data[] member can maintain state
    information between successive calls to the functions below when
    building a routing header, to make these functions thread safe.  For
    example, implementations could store the "number of segments left"
    field for the type 0 routing header here, initialize it to 0 when
    inet6_srcrt_init() is called, and increment it each time
    inet6_srcrt_add() is called.

    The following constants are defined in the <netinet/in.h> header:

        #define IPV6_SRCRT_LOOSE     0 /* this hop need not be a neighbor */
        #define IPV6_SRCRT_STRICT    1 /* this hop must be a neighbor */

        #define IPV6_SRCRT_TYPE_0    0 /* IPv6 routing header type 0 */

    We note that when a routing header is specified, the destination
    address specified for connect(), sendto(), or sendmsg() is the
    address of the first hop in the source route.  The routing header
    then contains the addresses of all subsequent hops, and the last
    entry in the routing header is the address of the final destination.


8.1.  inet6_srcrt_space


        size_t inet6_srcrt_space(int type, int segments);

    This function returns the maximum number of bytes required to hold a
    routing header of the specified type containing the specified number
    of segments (addresses).  The return value includes the size of the
    cmsghdr structure that precedes the routing header.

    If the return value is 0, then either the type of the routing header
    is not supported by this implementation or the number of segments is
    invalid for this type of routing header.


8.2.  inet6_srcrt_init


        struct cmsghdr *inet6_srcrt_init(void *bp, int type);

    This function initializes the buffer pointed to by bp to contain a
    cmsghdr structure followed by a routing header of the specified
    type.  The cmsg_len member of the cmsghdr structure is initialized
    to the size of the structure plus the amount of space required by
    the routing header.  The cmsg_level and cmsg_type members are ini-
    tialized as required by the type of routing header.

    The return value is the pointer to the cmsghdr structure.  The
    caller must allocate the buffer and its size can be determined by
    calling inet6_srcrt_space().

    If the type of routing header is not supported by the implementa-
    tion, the return value is NULL.


8.3.  inet6_srcrt_add


        int inet6_srcrt_add(struct cmsghdr *cmsg,
                            const struct in6_addr *addr, unsigned int flags);

    This function adds the address pointed to by addr to the end of the
    routing header being constructed and sets the type of this hop to
    the value of flags.  For an IPv6 type 0 routing header, flags must
    be either IPV6_SRCRT_LOOSE or IPV6_SRCRT_STRICT.

    If successful, the cmsg_len member of the cmsghdr structure is
    updated to account for the new address in the routing header and the
    return value of the function is 0.

    If the address would exceed the limits of the routing header, the
    return value of the function is ENOSPC.  If flags specifies an
    invalid value for the routing header, the return value of the func-
    tion is EINVAL.


8.4.  inet6_srcrt_reverse


        int inet6_srcrt_reverse(const struct cmsghdr *in, struct cmsghdr *out);

    This function takes a routing header that was received as ancillary
    data (pointed to by the first argument) and writes a new routing
    header that sends datagrams along the reverse of that route.  Both
    arguments are allowed to point to the same buffer (that is, the
    reversal can occur in place).  The return value of the function is 0
    on success.

    If the type of routing header is not supported by the
    implementation, the return value of the function is EOPNOTSUPP.  If
    the routing header information is invalid, the return value of the
    function is EINVAL.


8.5.  inet6_srcrt_segments


        int inet6_srcrt_segments(const struct cmsghdr *cmsg);

    This function returns the number of segments (addresses) contained
    in the routing header described by cmsg.  The return value is -1 if
    the cmsghdr structure does not describe a valid routing header or is
    a routing header of an unsupported type.


8.6.  inet6_srcrt_getaddr


        struct in6_addr *inet6_srcrt_getaddr(struct cmsghdr *cmsg, int offset);

    This function returns a pointer to the IPv6 address indexed by off-
    set (which starts at 0) in the routing header described by cmsg.  An
    application should first call inet6_srcrt_segments() to obtain the
    number of segments in the routing header.

    If offset refers to an address beyond the end of the routing header,
    the return value is NULL.


8.7.  inet6_srcrt_getflags


        int inet6_srcrt_getflags(const struct cmsghdr *cmsg, int offset);

    This function returns the flags value indexed by offset (which
    starts at 0) in the routing header described by cmsg.  For an IPv6
    type 0 routing header the return value will be either
    IPV6_SRCRT_LOOSE or IPV6_SRCRT_STRICT.

    If offset refers to a segment beyond the end of the routing header,
    the return value is -1.
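
    To make the intended behavior of these functions concrete, the
    following sketch gives one possible implementation of six of them
    (the reverse function is omitted).  It is not the implementation
    of any real system.  The sk_ names are hypothetical stand-ins for
    the inet6_srcrt_ functions, and the layout chosen for cmsg_data[]
    (a count and a bitmap of strict hops, followed by the addresses)
    is purely an assumption, since this document leaves that layout to
    the implementation.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define SK_TYPE_0  0    /* routing header type 0 */
#define SK_LOOSE   0    /* hop need not be a neighbor */
#define SK_STRICT  1    /* hop must be a neighbor */
#define SK_MAXSEG  24   /* type 0 limit as stated in this document */

/* Assumed private state at the start of cmsg_data[]. */
struct sk_state {
    uint32_t nsegs;     /* addresses stored so far */
    uint32_t strict;    /* bit i set if hop i is strict */
};

size_t sk_srcrt_space(int type, int segments)
{
    if (type != SK_TYPE_0 || segments < 1 || segments > SK_MAXSEG)
        return 0;
    return CMSG_SPACE(sizeof(struct sk_state) +
                      (size_t)segments * sizeof(struct in6_addr));
}

struct cmsghdr *sk_srcrt_init(void *bp, int type)
{
    struct cmsghdr *c = bp;
    struct sk_state st = { 0, 0 };

    if (type != SK_TYPE_0)
        return NULL;
    c->cmsg_len = CMSG_LEN(sizeof(st));    /* no addresses yet */
    c->cmsg_level = IPPROTO_ROUTING;
    c->cmsg_type = type;
    memcpy(CMSG_DATA(c), &st, sizeof(st));
    return c;
}

int sk_srcrt_add(struct cmsghdr *c, const struct in6_addr *addr,
                 unsigned int flags)
{
    struct sk_state st;

    if (flags != SK_LOOSE && flags != SK_STRICT)
        return EINVAL;
    memcpy(&st, CMSG_DATA(c), sizeof(st));
    if (st.nsegs >= SK_MAXSEG)
        return ENOSPC;
    memcpy(CMSG_DATA(c) + sizeof(st) + st.nsegs * sizeof(*addr),
           addr, sizeof(*addr));
    if (flags == SK_STRICT)
        st.strict |= 1u << st.nsegs;
    st.nsegs++;
    memcpy(CMSG_DATA(c), &st, sizeof(st));
    c->cmsg_len = CMSG_LEN(sizeof(st) + st.nsegs * sizeof(*addr));
    return 0;
}

int sk_srcrt_segments(struct cmsghdr *c)
{
    struct sk_state st;

    if (c->cmsg_level != IPPROTO_ROUTING || c->cmsg_type != SK_TYPE_0
        || c->cmsg_len < CMSG_LEN(sizeof(st)))
        return -1;
    memcpy(&st, CMSG_DATA(c), sizeof(st));
    return (int)st.nsegs;
}

struct in6_addr *sk_srcrt_getaddr(struct cmsghdr *c, int offset)
{
    if (offset < 0 || offset >= sk_srcrt_segments(c))
        return NULL;
    return (struct in6_addr *)(CMSG_DATA(c) + sizeof(struct sk_state))
           + offset;
}

int sk_srcrt_getflags(struct cmsghdr *c, int offset)
{
    struct sk_state st;

    if (offset < 0 || offset >= sk_srcrt_segments(c))
        return -1;
    memcpy(&st, CMSG_DATA(c), sizeof(st));
    return ((st.strict >> offset) & 1u) ? SK_STRICT : SK_LOOSE;
}
```

    Keeping all state inside the caller's buffer, as this sketch does,
    is what allows such functions to be thread safe.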


9.  Ordering of Ancillary Data and IPv6 Extension Headers

    Three IPv6 extension headers can be specified by the application and
    returned to the application using ancillary data with sendmsg() and
    recvmsg(): hop-by-hop options, destination options, and the routing
    header.  When multiple ancillary data objects are transferred via
    sendmsg() or recvmsg() and these objects represent any of these
    three extension headers, their placement in the control buffer is
    directly tied to their location in the corresponding IPv6 datagram.
    This API imposes some ordering constraints when using multiple
    ancillary objects with sendmsg().

    When multiple IPv6 hop-by-hop options having the same option type
    are specified, these options will be inserted into the hop-by-hop
    options header in the same order as they appear in the control
    buffer.  But when multiple hop-by-hop options having different
    option types are specified, these options may be reordered by the
    kernel to reduce padding in the hop-by-hop options header.  Hop-by-
    hop options may appear anywhere in the control buffer and will
    always be collected by the kernel and placed into a single hop-by-
    hop options header that appears immediately following the IPv6
    header.

    Similar rules apply to the destination options: (1) those of the
    same type will appear in the same order as they are specified, and
    (2) those of differing types may be reordered.  But the kernel will
    build up to two destination options headers: one to precede the
    routing header and one to follow the routing header.  If the appli-
    cation specifies a routing header then all destination options that
    appear in the control buffer before the routing header will appear
    in a destination options header before the routing header and these
    options might be reordered, subject to the two rules that we just
    stated.  Similarly all destination options that appear in the con-
    trol buffer after the routing header will appear in a destination
    options header after the routing header, and these options might be
    reordered, subject to the two rules that we just stated.

    As an example, assume that an application specifies control
    information to sendmsg() containing six ancillary data objects: two
    hop-by-hop options (of different types), three destination options
    (all of different types), and a routing header.  We number these 1
    through 6 according to their order in the control buffer.  We then
    show one possible final arrangement of the options in the extension
    headers built by the kernel:

        Ancillary Data Objects   -->   IPv6 Extension Headers
            HOPOPT-1 (first)              HOPHDR(5,1)
            DSTOPT-2                      DSTHDR(3,2)
            DSTOPT-3                      RTGHDR(4)
             SRCRT-4                      DSTHDR(6)
            HOPOPT-5
            DSTOPT-6 (last)

    The two hop-by-hop options are reordered, as are the first two
    destination options.  The first two destination options must appear
    in a destination options header before the routing header, and the
    final destination option must appear in a destination options
    header after the routing header.

    If destination options are specified in the control buffer after a
    routing header, or if destination options are specified without a
    routing header, the kernel will place those destination options
    after an authentication header and/or an encapsulating security pay-
    load header, if present.
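
    The placement rules in this section can be modeled in a few lines
    of C.  The following is only a sketch: the enum, the function
    ext_hdr_layout(), and its output format are hypothetical and are
    not part of this API.  It lists the options within each header in
    control-buffer order, whereas a real kernel may reorder options of
    differing types to reduce padding.  Authentication and
    encapsulating security payload headers are not modeled.

```c
#include <stdio.h>
#include <string.h>

/* Kinds of ancillary data objects discussed in this section. */
enum anc_kind { ANC_HOPOPT, ANC_DSTOPT, ANC_RTHDR };

/* Append the object number num to a comma-separated list. */
static void append_num(char *list, size_t cap, int num)
{
    size_t used = strlen(list);

    snprintf(list + used, cap - used, "%s%d", used ? "," : "", num);
}

/* Append "NAME(list)" to out, skipping headers that hold no objects. */
static void emit(char *out, size_t cap, const char *name, const char *list)
{
    size_t used = strlen(out);

    if (list[0] != '\0')
        snprintf(out + used, cap - used, "%s%s(%s)",
                 used ? " " : "", name, list);
}

/*
 * Hypothetical model, not part of the API: given the kinds of n
 * ancillary objects in control-buffer order (numbered from 1), write
 * the resulting extension header layout into out.
 */
static void ext_hdr_layout(const enum anc_kind *obj, int n,
                           char *out, size_t cap)
{
    char hop[64] = "", dst1[64] = "", rt[64] = "", dst2[64] = "";
    int i, have_rt = 0, seen_rt = 0;

    for (i = 0; i < n; i++)
        if (obj[i] == ANC_RTHDR)
            have_rt = 1;

    for (i = 0; i < n; i++) {
        switch (obj[i]) {
        case ANC_HOPOPT:        /* always collected into one header */
            append_num(hop, sizeof hop, i + 1);
            break;
        case ANC_RTHDR:
            append_num(rt, sizeof rt, i + 1);
            seen_rt = 1;
            break;
        case ANC_DSTOPT:        /* before or after the routing header */
            if (have_rt && !seen_rt)
                append_num(dst1, sizeof dst1, i + 1);
            else
                append_num(dst2, sizeof dst2, i + 1);
            break;
        }
    }

    out[0] = '\0';
    emit(out, cap, "HOPHDR", hop);
    emit(out, cap, "DSTHDR", dst1);
    emit(out, cap, "RTGHDR", rt);
    emit(out, cap, "DSTHDR", dst2);
}
```

    For the six objects of the example in this section, the function
    produces "HOPHDR(1,5) DSTHDR(2,3) RTGHDR(4) DSTHDR(6)".  The
    grouping into headers matches the arrangement shown; only the order
    of options within a header differs.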


10.  Additional Items

    Discussion is needed on whether or not the following items should be
    included in this advanced API specification.


10.1.  Path MTU Discovery and UDP

    Should a standard method be defined for a UDP application to deter-
    mine the "maximum send transport-message size" [3; Section 5.1] to a
    given destination?  This would let the UDP application send smaller
    datagrams to the destination, avoiding fragmentation.
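
    To illustrate the arithmetic involved, the following is a minimal
    sketch that assumes the application already knows the path MTU and
    the total length of any extension headers it will send.  The helper
    udp_max_payload() and the two constants are hypothetical names, not
    part of any existing API.

```c
#include <stddef.h>

#define IP6_FIXED_HDR_LEN 40    /* fixed IPv6 header, in octets */
#define UDP_HDR_LEN        8    /* UDP header, in octets */

/*
 * Hypothetical helper: largest UDP payload that fits in one
 * unfragmented datagram, given the path MTU and the total length of
 * any IPv6 extension headers.  Returns 0 if nothing fits.
 */
static size_t udp_max_payload(size_t path_mtu, size_t ext_hdr_len)
{
    size_t overhead = IP6_FIXED_HDR_LEN + ext_hdr_len + UDP_HDR_LEN;

    return path_mtu > overhead ? path_mtu - overhead : 0;
}
```

    For example, udp_max_payload(1500, 0) yields 1452, and
    udp_max_payload(576, 0) yields 528.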


10.2.  Neighbor Reachability and UDP

    Should a standard method be defined for a UDP application to tell
    the kernel that it is making forward progress with a given peer [4;
    Section 7.3.1]?  This could save unneeded neighbor solicitations and
    neighbor advertisements.


10.3.  Reading the Routing Table

    There are currently two techniques used by advanced applications on
    Unix systems to read the kernel's routing table.

    1.  Applications can grovel through the kernel's memory (/dev/kmem
        on Unix) to read the routing table.  This requires intimate
        knowledge of the internal routing table format, requires permis-
        sion to read the kernel memory, and is nonportable.  (Note that
        the two common routing table ioctl() commands, SIOCADDRT and
        SIOCDELRT, only add and delete routing table entries.  There is
        no common ioctl() command to return routing table entries.)

    2.  4.4BSD introduced a new function, sysctl(), and one of its
        commands, NET_RT_DUMP, returns the routing table.  The caller
        can optionally specify an address family (e.g., AF_INET for
        IPv4 or AF_INET6 for IPv6) so that only routing table entries
        for that address family are returned.

    Should this API specify a higher-level set of functions that return
    the routing table and that can be implemented on a wide range of
    systems?
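
    For reference, the second technique might be used as follows.  This
    is a sketch that assumes a 4.4BSD-derived system; the function name
    dump_routes() is ours, and on systems not covered by the
    preprocessor test it simply fails with ENOSYS.  sysctl() is called
    twice: once to learn the required buffer size and once to fetch the
    table.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <stdlib.h>

#if defined(__FreeBSD__) || defined(__NetBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#include <sys/param.h>
#include <sys/sysctl.h>
#define HAVE_NET_RT_DUMP 1
#endif

/*
 * Return a malloc'ed buffer holding the routing table dump for the
 * given address family (0 for all families) and store its length in
 * *lenp.  Returns NULL with errno set on failure; errno is ENOSYS on
 * systems where this sketch does not know how to use NET_RT_DUMP.
 */
static char *dump_routes(int family, size_t *lenp)
{
#ifdef HAVE_NET_RT_DUMP
    int mib[6] = { CTL_NET, AF_ROUTE, 0, family, NET_RT_DUMP, 0 };
    char *buf;

    /* First call: ask only for the required buffer size. */
    if (sysctl(mib, 6, NULL, lenp, NULL, 0) < 0)
        return NULL;
    if ((buf = malloc(*lenp ? *lenp : 1)) == NULL)
        return NULL;
    /* Second call: fetch the routing messages themselves. */
    if (sysctl(mib, 6, buf, lenp, NULL, 0) < 0) {
        free(buf);
        return NULL;
    }
    return buf;
#else
    (void)family;
    *lenp = 0;
    errno = ENOSYS;
    return NULL;
#endif
}
```

    The returned buffer holds a sequence of routing messages, each
    beginning with a struct rt_msghdr (from <net/route.h>) whose
    rtm_msglen member gives the offset to the next message.  Because
    the table can grow between the two calls, a robust caller would
    retry when the second call fails with ENOMEM.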


10.4.  Obtaining Interface and Address Information

    Most applications that need to obtain a list of all the interfaces
    on the system call ioctl() with a command of SIOCGIFCONF after
    filling in an ifconf structure with a pointer to a buffer (ifc_buf)
    in which the information is to be returned, and the size of that
    buffer (ifc_len).  On return the buffer is filled in with the
    interface information and ifc_len is updated to indicate how much
    data is in the buffer.

    But when using this command there is no way for the application to
    know how large a buffer to allocate before calling ioctl().  Also,
    the command always returns information for all interfaces and for
    all addresses, across all address families, on those interfaces;
    the application cannot ask for a subset.
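
    A common workaround for the buffer-size problem is to call ioctl()
    repeatedly with a growing buffer until two successive calls return
    the same length.  The following sketch shows that technique; the
    function name get_ifconf(), the initial size, and the 1 MB upper
    bound are our own choices for illustration.

```c
#define _DEFAULT_SOURCE 1   /* glibc: expose struct ifconf in strict modes */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Fill *ifc using SIOCGIFCONF on socket fd, growing the buffer until
 * two successive calls return the same length.  On success returns 0
 * and the caller must free ifc->ifc_buf; on failure returns -1.
 */
static int get_ifconf(int fd, struct ifconf *ifc)
{
    const int step = 10 * (int)sizeof(struct ifreq);
    int len = step;             /* initial guess */
    int lastlen = 0;

    while (len <= (1 << 20)) {  /* arbitrary bound to end the loop */
        char *buf = malloc((size_t)len);

        if (buf == NULL)
            return -1;
        ifc->ifc_len = len;
        ifc->ifc_buf = buf;
        if (ioctl(fd, SIOCGIFCONF, ifc) < 0) {
            /* Some systems fail with EINVAL when the buffer is too small. */
            if (errno != EINVAL || lastlen != 0) {
                free(buf);
                return -1;
            }
        } else {
            if (ifc->ifc_len == lastlen)
                return 0;       /* same length twice: list is complete */
            lastlen = ifc->ifc_len;
        }
        free(buf);
        len += step;
    }
    errno = ENOMEM;
    return -1;
}
```

    Each successful list therefore costs at least two ioctl() calls and
    two allocations, which is the overhead that a size query would
    remove.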

    Should a new ioctl() command be defined (SIOCGIFINFO) with a new
    structure that is similar to the ifconf structure, but with two new
    members: ifc_index and ifc_family?  The changes from the existing
    SIOCGIFCONF would be:

    1.  If the ifc_buf member is NULL then nothing is returned other
        than setting the ifc_len member to the amount of space required
        to hold the requested interface information.

    2.  If the ifc_index member is nonzero, information is returned for
        only the interface with that index.

    3.  If the ifc_family member is not AF_UNSPEC, the only addresses
        returned are those for the specified address family (AF_INET or
        AF_INET6, for example).
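
    The three changes can be illustrated with a user-space model.
    Everything in the following sketch is hypothetical: the structure
    layout, the fixed-size addr_rec record standing in for the kernel's
    per-address information, the function ifinfo_query(), and the
    choice to fail with -1 when the buffer is too small (a case the
    list above does not address).

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>     /* AF_UNSPEC, AF_INET, AF_INET6 */

/* Hypothetical request: ifconf plus the two proposed members. */
struct ifinfo_req {
    int           ifc_len;     /* in: buffer size; out: bytes used/needed */
    char         *ifc_buf;     /* NULL asks only for the required size */
    unsigned int  ifc_index;   /* 0 selects all interfaces */
    int           ifc_family;  /* AF_UNSPEC selects all address families */
};

/* Stand-in for one per-address record kept by the kernel. */
struct addr_rec {
    unsigned int  index;       /* interface index */
    int           family;      /* address family of this address */
};

/*
 * Model of the proposed semantics over a table of n records.
 * Returns 0 on success, or -1 if a supplied buffer is too small.
 */
static int ifinfo_query(const struct addr_rec *tab, size_t n,
                        struct ifinfo_req *req)
{
    size_t have = (req->ifc_buf != NULL && req->ifc_len > 0)
                      ? (size_t)req->ifc_len : 0;
    size_t need = 0, i;

    for (i = 0; i < n; i++) {
        if (req->ifc_index != 0 && tab[i].index != req->ifc_index)
            continue;           /* change 2: filter by interface */
        if (req->ifc_family != AF_UNSPEC &&
            tab[i].family != req->ifc_family)
            continue;           /* change 3: filter by family */
        if (req->ifc_buf != NULL) {
            if (need + sizeof tab[i] > have)
                return -1;
            memcpy(req->ifc_buf + need, &tab[i], sizeof tab[i]);
        }
        need += sizeof tab[i];
    }
    req->ifc_len = (int)need;   /* change 1: size is always reported */
    return 0;
}
```

    An application would call such an interface once with ifc_buf set
    to NULL, allocate ifc_len bytes, and then call it again with the
    same ifc_index and ifc_family to fetch the records.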


11.  References


    [1]  Gilligan, R. E., Thomson, S., Bound, J., "Basic Socket
         Interface Extensions for IPv6", Internet-Draft,
         draft-ietf-ipngwg-bsd-api-05.txt, April 1996.

    [2]  Deering, S., Hinden, R., "Internet Protocol, Version 6 (IPv6),
         Specification", RFC 1883, Dec. 1995.

    [3]  McCann, J., Deering, S., Mogul, J., "Path MTU Discovery for IP
         version 6", RFC 1981, Aug. 1996.

    [4]  Narten, T., Nordmark, E., Simpson, W., "Neighbor Discovery for
         IP Version 6 (IPv6)", RFC 1970, Aug. 1996.


12.  Acknowledgments

    Matt Thomas and Jim Bound have been working on the technical details
    in this draft for over a year.  Keith Sklower is the original
    implementor of ancillary data in the BSD networking code.


13.  Authors' Addresses

        W. Richard Stevens
        1202 E. Paseo del Zorro
        Tucson, AZ  85718
        Email: rstevens@kohala.com

        Matt Thomas
        Digital Equipment Corporation
        550 King St, LKG2-2/Q5
        Littleton, MA  01460
        Email: thomas@lkg.dec.com