 Remote Direct Data Placement Work Group    R. Recio
 INTERNET DRAFT                             IBM Corporation
 draft-ietf-rddp-rdmap-06.txt               P. Culley
                                            Hewlett-Packard Company
                                            D. Garcia
                                            Hewlett-Packard Company
                                            J. Hilland
                                            Hewlett-Packard Company
                                            B. Metzler
                                            IBM Corporation
 
 Expires: January, 2007                    June 1, 2006
 
 
    A Remote Direct Memory Access Protocol Specification
 
    Status of this Memo
 
    By submitting this Internet-Draft, each author represents that any
    applicable patent or other IPR claims of which he or she is aware
    have been or will be disclosed, and any of which he or she becomes
    aware will be disclosed, in accordance with Section 6 of BCP 79.
 
    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups.  Note that
    other groups may also distribute working documents as Internet-
    Drafts.
 
    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other
    documents at any time.  It is inappropriate to use Internet-Drafts
    as reference material or to cite them other than as "work in
    progress."
 
    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/1id-abstracts.html.

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html.
 
 
 
 
 
 
 
 
 
                         Expires January, 2007                [Page 1]


 Internet-Draft        RDMA Protocol Specification       June 2006
 
    Abstract
 
    This document defines a Remote Direct Memory Access Protocol
    (RDMAP) that operates over the Direct Data Placement Protocol (DDP
    protocol).  RDMAP provides read and write services directly to
    applications and enables data to be transferred directly into Upper
    Layer Protocol (ULP) Buffers without intermediate data copies. It
    also enables a kernel bypass implementation.

    Table of Contents
 
    1    Introduction...............................................6
    1.1  Architectural Goals........................................6
    1.2  Protocol Overview..........................................7
    1.3  RDMAP Layering............................................10
    1.4  Specification Changes from the Last Version...............11
    2    Glossary..................................................14
    2.1  General...................................................14
    2.2  LLP.......................................................16
    2.3  Direct Data Placement (DDP)...............................17
    2.4  Remote Direct Memory Access (RDMA)........................19
    3    ULP and Transport Attributes..............................22
    3.1  Transport Requirements & Assumptions......................22
    3.2  RDMAP Interactions with the ULP...........................23
    4    Header Format.............................................27
    4.1  RDMAP Control and Invalidate STag Field...................27
    4.2  RDMA Message Definitions..................................30
    4.3  RDMA Write Header.........................................31
    4.4  RDMA Read Request Header..................................32
    4.5  RDMA Read Response Header.................................34
    4.6  Send Header and Send with Solicited Event Header..........34
    4.7  Send with Invalidate Header and Send with SE and Invalidate
    Header..........................................................34
    4.8  Terminate Header..........................................34
    5    Data Transfer.............................................41
    5.1  RDMA Write Message........................................41
    5.2  RDMA Read Operation.......................................42
    5.2.1  RDMA Read Request Message................................42
    5.2.2  RDMA Read Response Message...............................43
    5.3  Send Message Type.........................................44
    5.4  Terminate Message.........................................46
    5.5  Ordering and Completions..................................47
    6    RDMAP Stream Management...................................51
    6.1  Stream Initialization.....................................51
    6.2  Stream Teardown...........................................52
    6.2.1  RDMAP Abortive Termination...............................52
    7    RDMAP Error Management....................................54
    7.1  RDMAP Error Surfacing.....................................54
    7.2  Errors Detected at the Remote Peer on Incoming RDMA Messages
         55
    8    Security..................................................57
    8.1  Summary of RDMAP specific Security Requirements...........57
    8.1.1  RDMAP (RNIC) Requirements................................57
    8.1.2  Privileged Resource Manager Requirements.................59
    8.2  Security Services for RDMAP...............................60
    8.2.1  Available Security Services..............................60
    8.2.2  Requirements for IPsec Services for RDMAP................61
    9    IANA......................................................64
    10   References................................................65
    10.1   Normative References.....................................65
    10.2   Informative References...................................65
    11   Appendix..................................................67
    11.1   DDP Segment Formats for RDMA Messages....................67
    11.1.1  DDP Segment for RDMA Write.............................67
    11.1.2  DDP Segment for RDMA Read Request......................67
    11.1.3  DDP Segment for RDMA Read Response.....................69
    11.1.4  DDP Segment for Send and Send with Solicited Event.....69
    11.1.5  DDP Segment for Send with Invalidate and Send with SE and
    Invalidate......................................................70
    11.1.6  DDP Segment for Terminate..............................71
    11.2   Ordering and Completion Table............................71
    12   Author's Address..........................................75
    13   Contributors..............................................76
    14   Intellectual Property Statement...........................80
    15   IPR Disclosure Acknowledgement
    16   Full Copyright Statement..................................81
 
 
    Table of Figures
 
    Figure 1 RDMAP Layering.........................................10
    Figure 2 Example of MPA, DDP, and RDMAP Header Alignment over TCP 11
    Figure 3 DDP Control, RDMAP Control, and Invalidate STag Fields.28
    Figure 4 RDMA Usage of DDP Fields...............................29
    Figure 5 RDMA Message Definitions...............................31
    Figure 6 RDMA Read Request Header Format........................32
    Figure 7 Terminate Header Format................................35
    Figure 8 Terminate Control Field................................35
    Figure 9 Terminate Control Field Values.........................38
    Figure 10 Error Type to RDMA Message Mapping....................40
    Figure 11 RDMA Write, DDP Segment format........................67
    Figure 12 RDMA Read Request, DDP Segment format.................68
    Figure 13 RDMA Read Response, DDP Segment format................69
    Figure 14 Send and Send with Solicited Event, DDP Segment format 70
    Figure 15 Send with Invalidate and Send with SE and Invalidate, DDP
    Segment.........................................................70
    Figure 16 Terminate, DDP Segment format.........................71
    Figure 17 Operation Ordering....................................74

   1  Introduction
 
    Today, communications over TCP/IP typically require copy
    operations, which add latency and consume significant CPU and
    memory resources.  The Remote Direct Memory Access Protocol (RDMAP)
    enables removal of data copy operations and enables reduction in
    latencies by allowing a local application to read or write data on
    a remote computer's memory with minimal demands on memory bus
    bandwidth and CPU processing overhead, while preserving memory
    protection semantics.
 
    RDMAP is layered on top of Direct Data Placement (DDP) and uses the
    two Buffer Models available from DDP. DDP-related terminology is
    discussed in Section 2.3. Because RDMAP builds on DDP, the reader
    is advised to become familiar with [DDP].
 
    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
    this document are to be interpreted as described in [RFC2119].
 
 
 
 
 1.1  Architectural Goals
 
    RDMAP has been designed with the following high-level architectural
    goals:
 
    *  Provide a data transfer operation that allows a Local Peer to
       transfer up to 2^32 - 1 octets directly into a previously
       advertised buffer (i.e. Tagged buffer) located at a Remote Peer
       without requiring a copy operation. This is referred to as the
       RDMA Write data transfer operation.
 
    *  Provide a data transfer operation that allows a Local Peer to
       retrieve up to 2^32 - 1 octets directly from a previously
       advertised buffer (i.e. Tagged buffer) located at a Remote Peer
       without requiring a copy operation. This is referred to as the
       RDMA Read data transfer operation.
 
    *  Provide a data transfer operation that allows a Local Peer to
       send up to 2^32 - 1 octets directly into a buffer located at a
       Remote Peer that has not been explicitly advertised. This is
       referred to as the Send (Send with Invalidate, Send with
       Solicited Event, and Send with Solicited Event and Invalidate)
       data transfer operation.
 
    *  Enable the local ULP to use the Send Operation Type (includes
       Send, Send with Invalidate, Send with Solicited Event, and Send
       with Solicited Event and Invalidate) to signal to the remote ULP
       the Completion of all previous Messages initiated by the local
       ULP.
 
    *  Provide for all Operations on a single RDMAP Stream to be
       reliably transmitted in the order that they were submitted.
 
    *  Provide RDMAP capabilities independently for each Stream when
       the LLP supports multiple data Streams within an LLP connection.
 
 1.2  Protocol Overview
 
    RDMAP provides seven data transfer operations. Except for the RDMA
    Read operation, each operation generates exactly one RDMA Message.
    Following is a brief overview of the RDMA Operations and RDMA
    Messages:
 
    1.  Send - A Send operation uses a Send Message to transfer data
        from the Data Source into a buffer that has not been explicitly
        Advertised by the Data Sink. The Send Message uses the DDP
        Untagged Buffer Model to transfer the ULP Message into the Data
        Sink's Untagged Buffer.
 
    2.  Send with Invalidate - A Send with Invalidate operation uses a
        Send with Invalidate Message to transfer data from the Data
        Source into a buffer that has not been explicitly Advertised by
        the Data Sink. The Send with Invalidate Message includes all
        functionality of the Send Message, with one addition: an STag
        field is included in the Send With Invalidate Message and after
        the message has been Placed and Delivered at the Data Sink the
        remote peer's buffer identified by the STag can no longer be
        accessed remotely until the remote peer's ULP re-enables access
        and Advertises the buffer.
 
    3.  Send with Solicited Event (Send with SE) - A Send with
        Solicited Event operation uses a Send with Solicited Event
        Message to transfer data from the Data Source into an Untagged
        Buffer at the Data Sink. The Send with Solicited Event Message
        is similar to the Send Message, with one addition: when the
        Send with Solicited Event Message has been Placed and
        Delivered, an Event may be generated at the recipient, if the
        recipient is configured to generate such an Event.
 
    4.  Send with Solicited Event and Invalidate (Send with SE and
        Invalidate) - A Send with Solicited Event and Invalidate
        operation uses a Send with Solicited Event and Invalidate
        Message to transfer data from the Data Source into a buffer
        that has not been explicitly Advertised by the Data Sink. The
        Send with Solicited Event and Invalidate Message is similar to
        the Send with Invalidate Message, with one addition: when the
        Send with Solicited Event and Invalidate Message has been
        Placed and Delivered, an Event may be generated at the
        recipient, if the recipient is configured to generate such an
        Event.
 
    5.  Remote Direct Memory Access Write - An RDMA Write operation
        uses an RDMA Write Message to transfer data from the Data
        Source to a previously advertised buffer at the Data Sink.
 
        The ULP at the Remote Peer, which in this case is the Data
        Sink, enables the Data Sink Tagged Buffer for access and
        Advertises the buffer's size (length), location (Tagged
        Offset), and Steering Tag (STag) to the Data Source through a
        ULP specific mechanism. The ULP at the Local Peer, which in
        this case is the Data Source, initiates the RDMA Write
        operation. The RDMA Write Message uses the DDP Tagged Buffer
        Model to transfer the ULP Message into the Data Sink's Tagged
        Buffer. Note: the STag associated with the Tagged Buffer
        remains valid until the ULP at the Remote Peer invalidates it
        or the ULP at the Local Peer invalidates it through a Send with
        Invalidate or Send with Solicited Event and Invalidate.
 
    6.  Remote Direct Memory Access Read - The RDMA Read operation
        transfers data to a Tagged Buffer at the Local Peer, which in
        this case is the Data Sink, from a Tagged Buffer at the Remote
        Peer, which in this case is the Data Source. The ULP at the
        Data Source enables the Data Source Tagged Buffer for access
        and Advertises the buffer's size (length), location (Tagged
        Offset), and Steering Tag (STag) to the Data Sink through a ULP
        specific mechanism. The ULP at the Data Sink enables the Data
        Sink Tagged Buffer for access and initiates the RDMA Read
        operation. The RDMA Read operation consists of a single RDMA
        Read Request Message and a single RDMA Read Response Message,
        and the latter may be segmented into multiple DDP Segments.
 
        The RDMA Read Request Message uses the DDP Untagged Buffer
        Model to Deliver the STag, starting Tagged Offset and length
        for both the Data Source and Data Sink Tagged Buffers to the
        remote peer's RDMA Read Request Queue.
 
        The RDMA Read Response Message uses the DDP Tagged Buffer Model
        to Deliver the Data Source's Tagged Buffer to the Data Sink,
        without any involvement from the ULP at the Data Source.
 
        Note: the Data Source STag associated with the Tagged Buffer
        remains valid until the ULP at the Data Source invalidates it
        or the ULP at the Data Sink invalidates it through a Send with
        Invalidate or Send with Solicited Event and Invalidate. The
        Data Sink STag associated with the Tagged Buffer remains valid
        until the ULP at the Data Sink invalidates it.
 
    7.  Terminate - A Terminate operation uses a Terminate Message to
        transfer to the Remote Peer information associated with an
        error that occurred at the Local Peer. The Terminate Message
        uses the DDP Untagged Buffer Model to transfer the Message into
        the Data Sink's Untagged Buffer.
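
    The overview above can be condensed into a small lookup table.
    The following is an illustrative sketch only and is not part of
    the protocol. The Python names (BufferModel, MESSAGE_BUFFER_MODEL,
    OPERATION_MESSAGES, messages_for) are hypothetical, and the tables
    record only what the list above states: which RDMA Messages each
    of the seven operations generates and which DDP Buffer Model each
    Message uses.

```python
from enum import Enum


class BufferModel(Enum):
    """The two DDP Buffer Models that RDMAP uses."""
    UNTAGGED = "untagged"
    TAGGED = "tagged"


# DDP Buffer Model used by each RDMA Message, as described in the overview.
MESSAGE_BUFFER_MODEL = {
    "Send": BufferModel.UNTAGGED,
    "Send with Invalidate": BufferModel.UNTAGGED,
    "Send with Solicited Event": BufferModel.UNTAGGED,
    "Send with Solicited Event and Invalidate": BufferModel.UNTAGGED,
    "RDMA Write": BufferModel.TAGGED,
    "RDMA Read Request": BufferModel.UNTAGGED,
    "RDMA Read Response": BufferModel.TAGGED,
    "Terminate": BufferModel.UNTAGGED,
}

# RDMA Messages generated by each operation. Only RDMA Read generates
# more than one Message.
OPERATION_MESSAGES = {
    "Send": ["Send"],
    "Send with Invalidate": ["Send with Invalidate"],
    "Send with Solicited Event": ["Send with Solicited Event"],
    "Send with Solicited Event and Invalidate": [
        "Send with Solicited Event and Invalidate"
    ],
    "RDMA Write": ["RDMA Write"],
    "RDMA Read": ["RDMA Read Request", "RDMA Read Response"],
    "Terminate": ["Terminate"],
}


def messages_for(operation):
    """Return (message name, buffer model) pairs for one operation."""
    return [(m, MESSAGE_BUFFER_MODEL[m]) for m in OPERATION_MESSAGES[operation]]
```

    For example, messages_for("RDMA Read") yields the Untagged RDMA
    Read Request followed by the Tagged RDMA Read Response, while
    every other operation yields exactly one Message.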
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 1.3  RDMAP Layering
 
    RDMAP is dependent on DDP, subject to the requirements defined in
    section 3.1 Transport Requirements & Assumptions.  Figure 1 RDMAP
    Layering depicts the relationship between Upper Layer Protocols
    (ULPs), RDMAP, DDP protocol, the framing layer, and the transport.
    For the protocol definition of each LLP, see [MPA], [TCP], and
    [SCTP].
 
                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                  |                                     |
                  |     Upper Layer Protocol (ULP)      |
                  |                                     |
                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                  |                                     |
                  |              RDMAP                  |
                  |                                     |
                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                  |                                     |
                  |           DDP protocol              |
                  |                                     |
                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                  |                 |                   |
                  |       MPA       |                   |
                  |                 |                   |
                  +-+-+-+-+-+-+-+-+-+       SCTP        |
                  |                 |                   |
                  |       TCP       |                   |
                  |                 |                   |
                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    Figure 1 RDMAP Layering
 
    If RDMAP is layered over DDP/MPA/TCP, then the respective headers
    and ULP Payload are arranged as follows (Note: For clarity, MPA
    header and CRC fields are included but MPA markers are not shown):

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     //                           TCP Header                        //
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |         MPA Header            |                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
     |                                                               |
     //                        DDP Header                           //
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     //                        RDMA Header                          //
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     //                        ULP Payload                          //
     |                  (shown with no pad bytes)                    |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                           MPA CRC                             |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    Figure 2 Example of MPA, DDP, and RDMAP Header Alignment over TCP
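
    The arrangement in Figure 2 can also be expressed as octet offsets
    within the TCP payload. The following is a minimal sketch under
    stated assumptions, not a framing implementation. The function
    frame_layout and its parameters are hypothetical. The 2-octet MPA
    header and 4-octet MPA CRC are read off the figure. The DDP and
    RDMA header lengths are passed in because they depend on the
    Message type. As in the figure, no MPA markers or pad bytes are
    modeled.

```python
MPA_HEADER_LEN = 2  # 16-bit MPA header, as drawn in Figure 2
MPA_CRC_LEN = 4     # 32-bit MPA CRC, as drawn in Figure 2


def frame_layout(ddp_header_len, rdma_header_len, payload_len):
    """Return (name, offset, length) for each region after the TCP header.

    Offsets are in octets from the start of the MPA header. Markers and
    pad bytes are deliberately not modeled (assumption of this sketch).
    """
    regions = [
        ("MPA Header", MPA_HEADER_LEN),
        ("DDP Header", ddp_header_len),
        ("RDMA Header", rdma_header_len),
        ("ULP Payload", payload_len),
        ("MPA CRC", MPA_CRC_LEN),
    ]
    layout = []
    offset = 0
    for name, length in regions:
        layout.append((name, offset, length))
        offset += length
    return layout


if __name__ == "__main__":
    # Header and payload sizes here are placeholder values for illustration.
    for name, offset, length in frame_layout(14, 0, 64):
        print(f"{name:12s} offset={offset:3d} length={length}")
```

    The sizes passed in the usage example are placeholders chosen
    only to make the offsets easy to follow.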
 
 1.4  Specification Changes from the Last Version
 
    This section is to be removed before RFC publication.
 
 
 
    The following major changes (vs typos) were made to the -05
    version:
 
    *  To pass the IETF checklist tool, modified heading of Security
       Section 8 to "Security" and added "Security Considerations"
       below it.
 
    *  Added IANA Section 9 and to pass the IETF checklist tool added
       "IANA Considerations" line below Section 9 header.
 
    *  Added Intellectual Property Statement Section 14 and IPR
       Disclosure Acknowledgement Section 15.

    *  Added Disclaimer Section 16.
 
    *  Section 6.8 - Acknowledged that the Reserved field size for the
       Terminate Message is 13 bits. The fix was made to the -04
       version, but was not listed in this section.
 
    *  Rewrite of the "Security" section to refer to Security document
       rather than summarize.
 
    *  Update to the "Contributors" section.
 
    *  Changed boilerplate reference from 3667 to 3979.
 
    *  Removed references to company names in the disclaimer section.
 
    *  Added "Key Words" Disclaimer to the Introduction.
 
 
 
    The following major changes (vs typos) were made to the -04
    version:
 
    *  Section 10 - Expanded IPsec requirements sentence in section
       10.3.2 to say what is required in addition to cross-referencing
       RFC 3723.
 
    *  Section 6.8 - Fixed text after Figure 9 to reflect the correct
       size (13 bits) of the Reserved field in the Terminate Message.
 
    The following major changes (vs typos) were made to the -03
    version:
 
    *  Section 6.1 - Added normative text describing downward
       compatibility with version 0.
 
    *  Section 6.8 - Changed the description of the reserved field size
       to match the size in the figure, which is 13 bits.
 
    *  Section 10 - Aligned security section closely to [RDMASEC] and
       added normative text for security requirements.
 
    The following major changes (vs typos) were made to the -02
    version:

    *  Section 6.8 - Explicitly defined the bit numbers for the three
       header control bits.
 
    *  Section 8.1 - Stated the typical Stream initialization to be:
       RDMA mode is entered some time after the LLP Stream is
       initialized.
 
    *  Section 10 - Update reference to security document.
 
    *  Section 10 - Fixed Send with Solicited Event and Invalidate
       reference.
 
    *  Section 12.1 - MPA and DDP references were changed to reflect
       the released specifications and accurate titles.
 
    *  Section 12.1 - Reference for RDMA Protocol Verbs was changed to
       reflect the released specification and accurate title.

   2  Glossary
 
 2.1 General
 
    Advertisement (Advertised, Advertise, Advertisements, Advertises) -
        the act of informing a Remote Peer that a local RDMA Buffer is
        available to it. A Node makes available an RDMA Buffer for
        incoming RDMA Read or RDMA Write access by informing its
        RDMA/DDP peer of the Tagged Buffer identifiers (STag, base
        address, and buffer length). This Advertisement of Tagged
        Buffer information is not defined by RDMA/DDP and is left to
        the ULP. A typical method would be for the Local Peer to embed
        the Tagged Buffer's Steering Tag, base address, and length in a
        Send Message destined for the Remote Peer.
 
    Completion - Refer to "RDMA Completion" in Section 2.4.
 
    Completed - See "RDMA Completion" in Section 2.4.
 
    Complete - See "RDMA Completion" in Section 2.4.
 
    Completes - See "RDMA Completion" in Section 2.4.
 
    Data Sink - The peer receiving a data payload. Note that the Data
        Sink can be required to both send and receive RDMA/DDP Messages
        to transfer a data payload.
 
    Data Source - The peer sending a data payload. Note that the Data
        Source can be required to both send and receive RDMA/DDP
        Messages to transfer a data payload.
 
    Data Delivery (Delivery, Delivered, Delivers) - Delivery is defined
        as the process of informing the ULP or consumer that a
        particular Message is available for use.  This is specifically
        different from "Placement", which may generally occur in any
        order, while the order of "Delivery" is strictly defined. See
        "Data Placement" in Section 2.3.
 
    Delivery - See Data Delivery in Section 2.1.
 
    Delivered - See Data Delivery in Section 2.1.
 
    Delivers - See Data Delivery in Section 2.1.

    Fabric - The collection of links, switches, and routers that
        connect a set of Nodes with RDMA/DDP protocol implementations.
 
    Fence (Fenced, Fences) - To block the current RDMA Operation from
        executing until prior RDMA Operations have Completed.
 
    iWARP - A suite of wire protocols comprised of RDMAP, DDP, and MPA.
        The iWARP protocol suite may be layered above TCP, SCTP, or
        other transport protocols.
 
    Local Peer - The RDMA/DDP protocol implementation on the local end
        of the connection. Used to refer to the local entity when
        describing a protocol exchange or other interaction between two
        Nodes.
 
    Node - A computing device attached to one or more links of a Fabric
        (network). A Node in this context does not refer to a specific
        application or protocol instantiation running on the computer.
        A Node may consist of one or more RNICs installed in a host
        computer.
 
    Placement - See "Data Placement" in Section 2.3
 
    Placed - See "Data Placement" in Section 2.3
 
    Places - See "Data Placement" in Section 2.3
 
    Remote Peer - The RDMA/DDP protocol implementation on the opposite
        end of the connection. Used to refer to the remote entity when
        describing protocol exchanges or other interactions between two
        Nodes.
 
    RNIC - RDMA Network Interface Controller. In this context, this
        would be a network I/O adapter or embedded controller with
        iWARP and Verbs functionality.
 
    RNIC Interface (RI) - The presentation of the RNIC to the Verbs
        Consumer as implemented through the combination of the RNIC and
        the RNIC driver.
 
    Termination - See "RDMAP Abortive Termination" in Section 2.4.
 
    Terminated - See "RDMAP Abortive Termination" in Section 2.4.

    Terminate - See "RDMAP Abortive Termination" in Section 2.4
 
    Terminates - See "RDMAP Abortive Termination" in Section 2.4
 
    ULP - Upper Layer Protocol. The protocol layer above the protocol
        layer currently being referenced. The ULP for RDMA/DDP is
        expected to be an OS, Application, adaptation layer, or
        proprietary device.  The RDMA/DDP documents do not specify a
        ULP - they provide a set of semantics that allow a ULP to be
        designed to utilize RDMA/DDP.
 
    ULP Payload - The ULP data that is contained within a single
        protocol segment or packet (e.g. a DDP Segment).
 
    Verbs - An abstract description of the functionality of an RNIC
        Interface. The OS may expose some or all of this functionality
        via one or more APIs to applications. The OS will also use some
        of the functionality to manage the RNIC Interface.
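
    The distinction drawn above between Data Placement (which may
    occur in any order) and Data Delivery (whose order is strictly
    defined) can be illustrated with a toy receiver. This is a sketch
    under stated assumptions, not a description of any implementation.
    The function receive and its arguments are hypothetical. Segments
    are modeled as (MSN, Message Offset, data) tuples in the sense of
    the Untagged Buffer Model terms in Section 2.3. Message lengths
    are assumed to be known in advance, MSNs are assumed contiguous,
    and segments are assumed well-formed and never duplicated.

```python
def receive(segments, message_lengths):
    """Place segments as they arrive; Deliver Messages in MSN order.

    segments: iterable of (msn, offset, data) in arrival order.
    message_lengths: dict mapping msn -> total Message length in octets.
    Returns the list of (msn, message bytes) in Delivery order.
    """
    buffers = {msn: bytearray(n) for msn, n in message_lengths.items()}
    placed = {msn: 0 for msn in message_lengths}
    delivered = []
    next_msn = min(message_lengths, default=1)

    for msn, offset, data in segments:
        # Placement: write the payload as soon as the segment arrives,
        # regardless of the order in which segments show up.
        buffers[msn][offset:offset + len(data)] = data
        placed[msn] += len(data)

        # Delivery: inform the ULP only in strict Message order, and only
        # once every octet of the next Message has been Placed.
        while next_msn in placed and placed[next_msn] == message_lengths[next_msn]:
            delivered.append((next_msn, bytes(buffers[next_msn])))
            next_msn += 1

    return delivered
```

    In this sketch, a segment of Message 2 that arrives first is
    Placed immediately, but Message 2 is not Delivered until Message 1
    has been completely Placed and Delivered.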
 
 2.2 LLP
 
    LLP - Lower Layer Protocol. The protocol layer beneath the protocol
        layer currently being referenced. For example, for DDP the LLP
        is SCTP, MPA, or other transport protocols. For RDMA, the LLP
        is DDP.
 
    LLP Connection - Corresponds to an LLP transport-level connection
        between the peer LLP layers on two nodes.
 
    LLP Stream - Corresponds to a single LLP transport-level Stream
        between the peer LLP layers on two Nodes. One or more LLP
        Streams may map to a single transport-level LLP connection. For
        transport protocols that support multiple Streams per
        connection (e.g. SCTP), an LLP Stream corresponds to one
        transport-level Stream.
 
    MULPDU - Maximum ULPDU. The current maximum size of the record that
        is acceptable for DDP to pass to the LLP for transmission.
 
    ULPDU - Upper Layer Protocol Data Unit.  The data record defined by
        the layer above MPA.
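
    The MULPDU bounds the size of each record that DDP passes to the
    LLP, so a DDP Message larger than that bound is carried in several
    DDP Segments (see "DDP Message" and "DDP Segment" in Section 2.3).
    The following sketch shows one way the split could be computed. It
    is illustrative only: segment_message and its parameters are
    hypothetical, the DDP Header length is supplied by the caller, and
    the choice to emit a single empty segment for a zero-length
    Message is an assumption of this sketch.

```python
def segment_message(payload, mulpdu, ddp_header_len):
    """Split a Message payload into (offset, chunk) pairs.

    Each chunk is sized so that DDP Header + chunk fits within mulpdu.
    """
    max_payload = mulpdu - ddp_header_len
    if max_payload <= 0:
        raise ValueError("MULPDU leaves no room for ULP Payload")
    if not payload:
        # Assumption: a zero-length Message is still carried by one segment.
        return [(0, b"")]
    return [
        (offset, payload[offset:offset + max_payload])
        for offset in range(0, len(payload), max_payload)
    ]
```

    With a 10-octet payload, a MULPDU of 8, and a 4-octet header, the
    sketch produces segments at offsets 0, 4, and 8 carrying 4, 4, and
    2 octets of payload respectively.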
 
 
 
 2.3 Direct Data Placement (DDP)
 
    Data Placement (Placement, Placed, Places) - For DDP, this term is
        specifically used to indicate the process of writing to a data
        buffer by a DDP implementation.  DDP Segments carry Placement
        information, which may be used by the receiving DDP
        implementation to perform Data Placement of the DDP Segment ULP
        Payload. See "Data Delivery".
 
    DDP Abortive Teardown - The act of closing a DDP Stream without
        attempting to Complete in-progress and pending DDP Messages.
 
    DDP Graceful Teardown - The act of closing a DDP Stream such that
        all in-progress and pending DDP Messages are allowed to
        Complete successfully.
 
    DDP Control Field - a fixed 16-bit field in the DDP Header. The DDP
        Control Field contains an 8-bit field whose contents are
        reserved for use by the ULP.
 
    DDP Header - The header present in all DDP segments. The DDP Header
        contains control and Placement fields that are used to define
        the final Placement location for the ULP payload carried in a
        DDP Segment.
 
    DDP Message - A ULP defined unit of data interchange, which is
        subdivided into one or more DDP segments. This segmentation may
        occur for a variety of reasons, including segmentation to
        respect the maximum segment size of the underlying transport
        protocol.
 
    DDP Segment - The smallest unit of data transfer for the DDP
        protocol. It includes a DDP Header and ULP Payload (if
        present). A DDP Segment should be sized to fit within the
        underlying transport protocol MULPDU.
 
    DDP Stream - A sequence of DDP Messages whose ordering is defined
        by the LLP. For SCTP, a DDP Stream maps directly to an SCTP
        Stream. For MPA, a DDP Stream maps directly to a TCP connection
        and a single DDP Stream is supported.  Note that DDP has no
        ordering guarantees between DDP Streams.
 
    Direct Data Placement  - A mechanism whereby ULP data contained
        within DDP Segments may be Placed directly into its final
        destination in memory without processing of the ULP. This may
        occur even when the DDP Segments arrive out of order. Out of
        order Placement support may require the Data Sink to implement
        the LLP and DDP as one functional block.
 
    Direct Data Placement Protocol (DDP) - A wire protocol that
        supports Direct Data Placement by associating explicit memory
        buffer placement information with the LLP payload units.
 
    Message Offset (MO) - For the DDP Untagged Buffer Model, specifies
        the offset, in bytes, from the start of a DDP Message.
 
    Message Sequence Number (MSN) - For the DDP Untagged Buffer Model,
        specifies a sequence number that is increasing with each DDP
        Message.
 
    Queue Number (QN) - For the DDP Untagged Buffer Model, identifies a
        destination Data Sink queue for a DDP Segment.
 
    Steering Tag - An identifier of a Tagged Buffer on a Node, valid as
        defined within a protocol specification.
 
    STag - Steering Tag
 
    Tagged Buffer - A buffer that is explicitly Advertised to the
        Remote Peer through exchange of an STag, Tagged Offset, and
        length.
 
    Tagged Buffer Model - A DDP data transfer model used to transfer
        Tagged Buffers from the Local Peer to the Remote Peer.
 
    Tagged DDP Message - A DDP Message that targets a Tagged Buffer.
 
    Tagged Offset (TO) - The offset within a Tagged Buffer on a Node.
 
    Untagged Buffer - A buffer that is not explicitly Advertised to the
        Remote Peer. Untagged buffers support one of the two available
        data transfer mechanisms called the Untagged Buffer Model. An
        untagged buffer is used to send asynchronous control messages
        to the Remote Peer for RDMA Read, Send, and Terminate requests.
        Untagged Buffers handle Untagged DDP Messages.
 
 
    Untagged Buffer Model - A DDP data transfer model used to transfer
        Untagged Buffers from the Local Peer to the Remote Peer.
 
    Untagged DDP Message - A DDP Message that targets an Untagged
        Buffer.
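
    The Untagged Buffer Model terms above can be illustrated with a
    short, non-normative Python sketch that splits one Untagged DDP
    Message into DDP Segments. The function name, the tuple layout,
    and the max_payload parameter (a stand-in for the ULP Payload room
    left in one MULPDU after the DDP Header) are assumptions made for
    illustration only; none of them is defined by this specification.

```python
def segment_untagged_message(payload, qn, msn, max_payload):
    """Split one Untagged DDP Message into DDP Segments (sketch).

    Returns a list of (QN, MSN, MO, last, data) tuples. max_payload is
    a hypothetical stand-in for the ULP Payload space that remains in
    one MULPDU after the DDP Header.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    segments = []
    mo = 0  # Message Offset: bytes from the start of the DDP Message
    while True:
        chunk = payload[mo:mo + max_payload]
        # True only for the final segment of the DDP Message.
        last = mo + len(chunk) >= len(payload)
        # Every segment of one Message carries the same QN and MSN.
        segments.append((qn, msn, mo, last, chunk))
        if last:
            break
        mo += len(chunk)
    return segments
```

    For a 10-byte Message and a max_payload of 4, this sketch yields
    three segments with MO values 0, 4, and 8. A zero-length Message
    still produces one segment with an empty payload.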
 
 2.4 Remote Direct Memory Access (RDMA)
 
    Event - An indication provided by the RDMAP Layer to the ULP to
        indicate a Completion or other condition requiring immediate
        attention.
 
    Invalidate STag - A mechanism used to prevent the Remote Peer from
        reusing a previously Advertised STag until the Local Peer
        makes it available again through a subsequent explicit
        Advertisement. The STag cannot be accessed remotely until it
        is explicitly Advertised again.
 
    RDMA Completion (Completion, Completed, Complete, Completes) - For
        RDMA, Completion is defined as the process of informing the ULP
        that a particular RDMA Operation has performed all functions
        specified for that RDMA Operation, including Placement and
        Delivery.  The Completion semantic of each RDMA Operation is
        distinctly defined.
 
    RDMA Message - A data transfer mechanism used to fulfill an RDMA
        Operation.
 
    RDMA Operation - A sequence of RDMA Messages, including control
        Messages, to transfer data from a Data Source to a Data Sink.
        The following RDMA Operations are defined: RDMA Write, RDMA
        Read, Send, Send with Invalidate, Send with Solicited Event,
        Send with Solicited Event and Invalidate, and Terminate.
 
    RDMA Protocol (RDMAP) - A wire protocol that supports RDMA
        Operations to transfer ULP data between a Local Peer and the
        Remote Peer.
 
    RDMAP Abortive Termination (Termination, Terminated, Terminate,
        Terminates) - The act of closing an RDMAP Stream without
        attempting to Complete in-progress and pending RDMA Operations.
 
    RDMAP Graceful Termination - The act of closing an RDMAP Stream
        such that all in-progress and pending RDMA Operations are
        allowed to Complete successfully.
 
    RDMA Read - An RDMA Operation used by the Data Sink to transfer the
        contents of a source RDMA buffer from the Remote Peer to the
        Local Peer. An RDMA Read operation consists of a single RDMA
        Read Request Message and a single RDMA Read Response Message.
 
    RDMA Read Request - An RDMA Message used by the Data Sink to
        request the Data Source to transfer the contents of an RDMA
        buffer. The RDMA Read Request Message describes both the Data
        Source and Data Sink RDMA buffers.
 
    RDMA Read Request Queue - The queue used for processing RDMA Read
        Requests. The RDMA Read Request Queue has a DDP Queue Number of
        1.
 
    RDMA Read Response - An RDMA Message used by the Data Source to
        transfer the contents of an RDMA buffer to the Data Sink, in
        response to an RDMA Read Request. The RDMA Read Response
        Message only describes the data sink RDMA buffer.
 
    RDMAP Stream - An association between a pair of RDMAP
        implementations, possibly on different Nodes, which transfer
        ULP data using RDMA Operations. There may be multiple RDMAP
        Streams on a single Node. An RDMAP Stream maps directly to a
        single DDP Stream.
 
    RDMA Write - An RDMA Operation that transfers the contents of a
        source RDMA Buffer from the Local Peer to a destination RDMA
        Buffer at the Remote Peer using RDMA. The RDMA Write Message
        only describes the Data Sink RDMA buffer.
 
    Remote Direct Memory Access (RDMA) - A method of accessing memory
        on a remote system in which the local system specifies the
        remote location of the data to be transferred. Employing an RNIC
        in the remote system allows the access to take place without
        interrupting the processing of the CPU(s) on the system.
 
    Send - An RDMA Operation that transfers the contents of a ULP
        Buffer from the Local Peer to an Untagged Buffer at the Remote
        Peer.
 
 
    Send Message Type - A Send Message, Send with Invalidate Message,
        Send with Solicited Event Message, or Send with Solicited Event
        and Invalidate Message.
 
    Send Operation Type - A Send Operation, Send with Invalidate
        Operation, Send with Solicited Event Operation, or Send with
        Solicited Event and Invalidate Operation.
 
    Solicited Event (SE) - A facility by which an RDMA Operation sender
        may cause an Event to be generated at the recipient, if the
        recipient is configured to generate such an Event, when a Send
        with Solicited Event or Send with Solicited Event and
        Invalidate Message is received.  Note: The Local Peer's ULP can
        use the Solicited Event mechanism to ensure that Messages
        designated as important to the ULP are handled in an
        expeditious manner by the Remote Peer's ULP. The ULP at the
        Local Peer can indicate a given Send Message Type is important
        by using the Send with Solicited Event Message or Send with
        Solicited Event and Invalidate Message. The ULP at the Remote
        Peer can choose to only be notified when valid Send with
        Solicited Event Messages and/or Send with Solicited Event and
        Invalidate Messages arrive and handle other valid incoming Send
        Messages or Send with Invalidate Messages at its leisure.
 
    Terminate - An RDMA Message used by a Node to pass an error
        indication to the peer Node on an RDMAP Stream. This operation
        is for RDMAP use only.
 
    ULP Buffer - A buffer owned above the RDMAP Layer and advertised to
        the RDMAP Layer either as a Tagged Buffer or an Untagged ULP
        Buffer.
 
    ULP Message - The ULP data that is handed to a specific protocol
        layer for transmission. Data boundaries are preserved as they
        are transmitted through iWARP.
 
   3  ULP and Transport Attributes
 
 3.1  Transport Requirements & Assumptions
 
    RDMAP MUST be layered on top of the Direct Data Placement Protocol
    [DDP].
 
    RDMAP requires the following DDP support:
 
    *  RDMAP uses three queues for Untagged Buffers:
 
        *   Queue Number 0 (used by RDMAP for Send, Send with
            Invalidate, Send with Solicited Event, and Send with
            Solicited Event and Invalidate operations).
 
        *   Queue Number 1 (used by RDMAP for RDMA Read operations).
 
        *   Queue Number 2 (used by RDMAP for Terminate operations).
 
    *  DDP maps a single RDMA Message to a single DDP Message.
 
    *  DDP uses the STag and Tagged Offset provided by the RDMAP for
       Tagged Buffer Messages (i.e. RDMA Write and RDMA Read Response).
 
    *  When the DDP layer Delivers an Untagged DDP Message to the RDMAP
       layer, DDP provides the length of the DDP Message. This ensures
       that RDMAP does not have to carry a length field in its header.
 
    *  When the RDMAP layer provides an RDMA Message to the DDP Layer,
       DDP must insert the RsvdULP field value provided by the RDMAP
       Layer into the associated DDP Message.
 
    *  When the DDP layer Delivers a DDP Message to the RDMAP layer,
       DDP provides the RsvdULP field.
 
    *  The RsvdULP field must be 1 octet for DDP Tagged Messages and 5
       octets for DDP Untagged Messages.
 
    *  DDP propagates to RDMAP all operation or protection errors (used
       by RDMAP Terminate) and, when appropriate, the DDP Header fields
       of the DDP Segment that encountered the error.
 
    *  If an RDMA Operation is aborted by DDP or a lower layer, the
       contents of the Data Sink buffers associated with the operation
       are considered indeterminate.
 
    *  DDP, in conjunction with the lower layers, provides reliable,
       in-order Delivery.
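
    As a non-normative illustration of the RsvdULP sizing requirement
    in the list above, the following Python sketch builds the value
    that RDMAP would hand to DDP: 1 octet for a Tagged DDP Message and
    5 octets for an Untagged DDP Message. The function name and the
    choice of network byte order for the trailing 4 octets are
    assumptions of this sketch, not text from this specification.

```python
def build_rsvd_ulp(control_octet, tagged, invalidate_stag=0):
    """Return the RsvdULP value handed to DDP (hypothetical helper).

    Tagged DDP Messages get 1 octet. Untagged DDP Messages get 5
    octets: the control octet followed by a 32-bit value.
    """
    if not 0 <= control_octet <= 0xFF:
        raise ValueError("control octet must fit in 8 bits")
    if tagged:
        if invalidate_stag:
            raise ValueError("Tagged Messages carry 1 RsvdULP octet")
        return bytes([control_octet])
    # Assumption: the 32-bit value is sent most significant octet first.
    return bytes([control_octet]) + invalidate_stag.to_bytes(4, "big")
```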
 
 3.2  RDMAP Interactions with the ULP
 
    RDMAP provides the ULP with access to the following RDMA Operations
    as defined in this specification:
 
    *  Send
 
    *  Send with Solicited Event
 
    *  Send with Invalidate
 
    *  Send with Solicited Event and Invalidate
 
    *  RDMA Write
 
    *  RDMA Read
 
    For Send Operation Types, the following are the interactions
    between the RDMAP Layer and the ULP:
 
    *  At the Data Source:
 
        *   The ULP passes to the RDMAP Layer the following:
 
            *   ULP Message Length
 
            *   ULP Message
 
            *   An indication of the Send Operation Type, where the
                valid types are: Send, Send with Solicited Event, Send
                with Invalidate, or Send with Solicited Event and
                Invalidate.
 
            *   An Invalidate STag, if the Send Operation Type was Send
                with Invalidate or Send with Solicited Event and
                Invalidate.
 
 
        *   When the Send Operation Type Completes, an indication of
            the Completion results.
 
    *  At the Data Sink:
 
        *   If the Send Operation Type Completed successfully, the
            RDMAP Layer passes the following information to the ULP
            Layer:
 
            *   ULP Message Length
 
            *   ULP Message
 
            *   An Event, if the Data Sink is configured to generate an
                Event.
 
            *   An Invalidated STag, if the Send Operation Type was
                Send with Invalidate or Send with Solicited Event and
                Invalidate.
 
        *   If the Send Operation Type Completed in error, the Data
            Sink RDMAP Layer will pass up the corresponding error
            information to the Data Sink ULP and send a Terminate
            Message to the Data Source RDMAP Layer. The Data Source
            RDMAP Layer will then pass up the Terminate Message to the
            ULP.
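
    The Data Source inputs listed above for Send Operation Types can
    be sketched as a small Python record. The class name, field names,
    and the validate() method are hypothetical and only illustrate the
    rule that an Invalidate STag accompanies exactly the two
    invalidating Send Operation Types; no ULP interface is mandated by
    this specification.

```python
from dataclasses import dataclass
from typing import Optional

INVALIDATING_TYPES = {
    "Send with Invalidate",
    "Send with Solicited Event and Invalidate",
}
SEND_OPERATION_TYPES = INVALIDATING_TYPES | {
    "Send",
    "Send with Solicited Event",
}


@dataclass
class SendRequest:
    """What the ULP passes to the RDMAP Layer for a Send (sketch)."""
    ulp_message: bytes
    send_type: str = "Send"
    invalidate_stag: Optional[int] = None

    @property
    def ulp_message_length(self):
        return len(self.ulp_message)

    def validate(self):
        if self.send_type not in SEND_OPERATION_TYPES:
            raise ValueError("unknown Send Operation Type")
        needs_stag = self.send_type in INVALIDATING_TYPES
        if needs_stag and self.invalidate_stag is None:
            raise ValueError("Invalidate STag is required")
        if not needs_stag and self.invalidate_stag is not None:
            raise ValueError("Invalidate STag is not applicable")
```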
 
    For RDMA Write Operations, the following are the interactions
    between the RDMAP Layer and the ULP:
 
    *  At the Data Source:
 
        *   The ULP passes to the RDMAP Layer the following:
 
            *   ULP Message Length
 
            *   ULP Message
 
            *   Data Sink STag
 
            *   Data Sink Tagged Offset
 
        *   When the RDMA Write Operation Completes, an indication of
            the Completion results.
 
    *  At the Data Sink:
 
        *   If the RDMA Write completed successfully, the RDMAP Layer
            does not Deliver the RDMA Write to the ULP. It does Place
            the ULP Message transferred through the RDMA Write Message
            into the ULP Buffer.
 
        *   If the RDMA Write completed in error, the Data Sink RDMAP
            Layer will pass up the corresponding error information to
            the Data Sink ULP and send a Terminate Message to the Data
            Source RDMAP Layer. The Data Source RDMAP Layer will then
            pass up the Terminate Message to the ULP.
 
    For RDMA Read Operations, the following are the interactions
    between the RDMAP Layer and the ULP:
 
    *  At the Data Sink:
 
        *   The ULP passes to the RDMAP Layer the following:
 
            *   ULP Message Length
 
            *   Data Source STag
 
            *   Data Sink STag
 
            *   Data Source Tagged Offset
 
            *   Data Sink Tagged Offset
 
        *   When the RDMA Read Operation Completes, an indication of
            the Completion results.
 
    *  At the Data Source:
 
        *   If no error occurred while processing the RDMA Read
            Request, the Data Source will not pass up any information
            to the ULP.
 
        *   If an error occurred while processing the RDMA Read
            Request, the Data Source RDMAP Layer will pass up the
            corresponding error information to the Data Source ULP and
            send a Terminate Message to the Data Sink RDMAP Layer. The
            Data Sink RDMAP Layer will then pass up the Terminate
            Message to the ULP.
 
    For STags made available to the RDMAP Layer, the following are the
    interactions between the RDMAP Layer and the ULP:
 
    *  If the ULP enables an STag, the ULP passes to the RDMAP Layer
       the:
 
        *   STag;
 
        *   range of Tagged Offsets that are associated with a given
            STag;
 
        *   remote access rights (read, write, or read and write)
            associated with a given, valid STag; and
 
        *   association between a given STag and a given RDMAP Stream.
 
    *  If the ULP disables an STag, the ULP passes to the RDMAP Layer
       the STag.
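
    One plausible way to hold the per-STag information listed above is
    sketched below in Python. The StagTable class, its method names,
    and the check() logic are assumptions made for illustration; this
    section only states what the ULP passes to the RDMAP Layer, not
    how an implementation stores or checks it.

```python
class StagTable:
    """Hypothetical store of the STags a ULP has enabled."""

    def __init__(self):
        self._entries = {}

    def enable(self, stag, base_to, length, rights, stream_id):
        # rights is a set drawn from {"read", "write"}.
        if not set(rights) <= {"read", "write"}:
            raise ValueError("unknown access right")
        self._entries[stag] = (base_to, length, frozenset(rights),
                               stream_id)

    def disable(self, stag):
        self._entries.pop(stag, None)

    def check(self, stag, to, length, access, stream_id):
        """Return True if the remote access fits the enabled STag."""
        entry = self._entries.get(stag)
        if entry is None:
            return False
        base_to, size, rights, stream = entry
        if stream != stream_id or access not in rights:
            return False
        # The whole [to, to + length) range must lie inside the buffer.
        return base_to <= to and to + length <= base_to + size
```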
 
    If an error occurs at the RDMAP Layer, the RDMAP Layer may pass
    back error information (e.g. the content of a Terminate Message) to
    the ULP.
 
   4  Header Format
 
    The control information of RDMA Messages is included in DDP
    protocol defined header fields, with the following exceptions:
 
    *  The first octet reserved for ULP usage on all DDP Messages in
       the DDP Protocol (i.e. the RsvdULP Field) is used by RDMAP to
       carry the RDMA Message Opcode and the RDMAP version. This octet
       is known as the RDMAP Control Field in this specification. For
       Send with Invalidate and Send with Solicited Event and
       Invalidate, RDMAP uses the second through fifth octets provided
       by DDP on Untagged DDP Messages to carry the STag that will be
       Invalidated.
 
    *  The RDMA Message length is passed by the RDMAP layer to the DDP
       layer on all outbound transfers.
 
    *  For RDMA Read Request Messages, the RDMA Read Message Size is
       included in the RDMA Read Request Header.
 
    *  The RDMA Message length is passed to the RDMAP Layer by the DDP
       layer on inbound Untagged Buffer transfers.
 
    *  Two RDMA Messages carry additional RDMAP headers. The RDMA Read
       Request carries the Data Sink and Data Source buffer
       descriptions, including buffer length. The Terminate carries
       additional information associated with the error that caused the
       Terminate.
 
 4.1  RDMAP Control and Invalidate STag Field
 
    The version of RDMAP defined by this specification uses all 8 bits
    of the RDMAP Control Field. The first octet reserved for ULP use in
    the DDP Protocol MUST be used by the RDMAP to carry the RDMAP
    Control Field. The ordering of the bits in the first octet MUST be
    as defined in Figure 3 DDP Control, RDMAP Control, and Invalidate
    STag Field. For Send with Invalidate and Send with Solicited Event
    and Invalidate, the second through fifth octets of the DDP RsvdULP
    field MUST be used by RDMAP to carry the Invalidate STag. Figure 3
    DDP Control, RDMAP Control, and Invalidate STag Field depicts the
    format of the DDP Control and RDMAP Control fields. (Note: In
    Figure 3 DDP Control, RDMAP Control, and Invalidate STag Field, the
    DDP Header is offset by 16 bits to accommodate the MPA header
    defined in [MPA]. The MPA header is only present if DDP is layered
    on top of MPA.)
 
 
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                    |T|L| Resrv | DV| RV|Rsv| Opcode|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     Invalidate STag                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    Figure 3 DDP Control, RDMAP Control, and Invalidate STag Fields
 
    All RDMA Messages handed by the RDMAP Layer to the DDP layer MUST
    define the value of the Tagged flag in the DDP Header. Figure 4
    RDMA Usage of DDP Fields MUST be used to define the value of the
    Tagged flag that is handed to the DDP Layer for each RDMA Message.
 
    Figure 4 RDMA Usage of DDP Fields defines the value of the RDMA
    Opcode field that MUST be used for each RDMA Message.
 
    Figure 4 RDMA Usage of DDP Fields defines when the STag, Queue
    Number, and Tagged Offset fields MUST be provided for each RDMA
    Message.
 
    For this version of the RDMAP, all RDMA Messages MUST have:
 
    *  Bits 24-25; RDMA Version field: 01b for IETF RNICs, and 00b for
       RDMAC RNICs. Both version numbers are valid. Interoperability is
       dependent on MPA protocol version negotiation (e.g. MPA marker
       and MPA CRC).
 
    *  Bits 26-27; Reserved. MUST be set to zero by sender, ignored by
       the receiver.
 
    *  Bits 28-31; OpCode field: see Figure 4 RDMA Usage of DDP Fields.
 
    *  Bits 32-63; Invalidate STag. However, this field is only valid
       for Send with Invalidate and Send with Solicited Event and
       Invalidate Messages (see Figure 4 RDMA Usage of DDP Fields).
       For Send, Send with Solicited Event, RDMA Read Request, and
       Terminate, the Invalidate STag field MUST be set to zero on
       transmit and ignored by the receiver.
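
    The bit assignments above can be exercised with the following
    non-normative Python sketch, which packs and unpacks the RDMAP
    Control Field octet (bits 24-31 of the figure). The function and
    constant names are hypothetical. The sketch assumes the usual
    convention of the bit diagrams in this document, in which the
    lowest-numbered bit is the most significant, so that bit 24 is the
    high-order bit of the octet.

```python
RDMAP_VERSION_RDMAC = 0b00
RDMAP_VERSION_IETF = 0b01


def encode_rdmap_control(opcode, version=RDMAP_VERSION_IETF):
    """Build the RDMAP Control Field octet (sketch)."""
    if not 0 <= opcode <= 0b1111:
        raise ValueError("OpCode is a 4-bit field")
    if version not in (RDMAP_VERSION_RDMAC, RDMAP_VERSION_IETF):
        raise ValueError("unsupported RDMA Version value")
    # Bits 24-25: version. Bits 26-27: reserved, sent as zero.
    # Bits 28-31: OpCode.
    return (version << 6) | opcode


def decode_rdmap_control(octet):
    """Return (version, opcode); the reserved bits are ignored."""
    return (octet >> 6) & 0b11, octet & 0b1111
```

    With these assumptions, a Send with Invalidate (OpCode 0100b) from
    an IETF RNIC encodes as 0x44, and a receiver decoding 0x74 still
    obtains version 01b and OpCode 0100b because the reserved bits are
    ignored.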
 
 
 -------+-----------+-------+------+-------+-----------+--------------
 RDMA   | Message   | Tagged| STag | Queue | Invalidate| Message
 Message| Type      | Flag  | and  | Number| STag      | Length
 OpCode |           |       | TO   |       |           | Communicated
        |           |       |      |       |           | between DDP
        |           |       |      |       |           | and RDMAP
 -------+-----------+-------+------+-------+-----------+--------------
 0000b  | RDMA Write| 1     | Valid| N/A   | N/A       | Yes
        |           |       |      |       |           |
 -------+-----------+-------+------+-------+-----------+--------------
 0001b  | RDMA Read | 0     | N/A  | 1     | N/A       | Yes
        | Request   |       |      |       |           |
 -------+-----------+-------+------+-------+-----------+--------------
 0010b  | RDMA Read | 1     | Valid| N/A   | N/A       | Yes
        | Response  |       |      |       |           |
 -------+-----------+-------+------+-------+-----------+--------------
 0011b  | Send      | 0     | N/A  | 0     | N/A       | Yes
        |           |       |      |       |           |
 -------+-----------+-------+------+-------+-----------+--------------
 0100b  | Send with | 0     | N/A  | 0     | Valid     | Yes
        | Invalidate|       |      |       |           |
 -------+-----------+-------+------+-------+-----------+--------------
 0101b  | Send with | 0     | N/A  | 0     | N/A       | Yes
        | SE        |       |      |       |           |
 -------+-----------+-------+------+-------+-----------+--------------
 0110b  | Send with | 0     | N/A  | 0     | Valid     | Yes
        | SE and    |       |      |       |           |
        | Invalidate|       |      |       |           |
 -------+-----------+-------+------+-------+-----------+--------------
 0111b  | Terminate | 0     | N/A  | 2     | N/A       | Yes
        |           |       |      |       |           |
 -------+-----------+-------+------+-------+-----------+--------------
 1000b  |           |
 to     | Reserved  |               Not Specified
 1111b  |           |
 -------+-----------+-------------------------------------------------
    Figure 4 RDMA Usage of DDP Fields
 
    Note:  N/A means Not Applicable.
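
    For implementers who prefer a lookup table, the rows of the figure
    above can be transcribed as follows. This Python sketch is
    non-normative; the DdpUsage type and the ddp_usage() helper are
    hypothetical names, and None is used where the figure says N/A.

```python
from typing import NamedTuple, Optional


class DdpUsage(NamedTuple):
    message_type: str
    tagged: bool                  # value of the DDP Tagged flag
    queue_number: Optional[int]   # None where the figure says N/A
    invalidate_stag_valid: bool


RDMA_MESSAGES = {
    0b0000: DdpUsage("RDMA Write", True, None, False),
    0b0001: DdpUsage("RDMA Read Request", False, 1, False),
    0b0010: DdpUsage("RDMA Read Response", True, None, False),
    0b0011: DdpUsage("Send", False, 0, False),
    0b0100: DdpUsage("Send with Invalidate", False, 0, True),
    0b0101: DdpUsage("Send with SE", False, 0, False),
    0b0110: DdpUsage("Send with SE and Invalidate", False, 0, True),
    0b0111: DdpUsage("Terminate", False, 2, False),
}


def ddp_usage(opcode):
    """Look up the DDP field usage for an RDMA Message OpCode."""
    try:
        return RDMA_MESSAGES[opcode]
    except KeyError:
        raise ValueError("reserved OpCode: %r" % (opcode,)) from None
```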
 
 
 
 
 
 
 4.2  RDMA Message Definitions
 
    The following figure defines which RDMA Headers MUST be used on
    each RDMA Message and which RDMA Messages are allowed to carry ULP
    payload:
 
 -------+-----------+-------------------+-------------------------
 RDMA   | Message   | RDMA Header Used  | ULP Message allowed in
 Message| Type      |                   | the RDMA Message
 OpCode |           |                   |
        |           |                   |
 -------+-----------+-------------------+-------------------------
 0000b  | RDMA Write| None              | Yes
        |           |                   |
 -------+-----------+-------------------+-------------------------
 0001b  | RDMA Read | RDMA Read Request | No
        | Request   | Header            |
 -------+-----------+-------------------+-------------------------
 0010b  | RDMA Read | None              | Yes
        | Response  |                   |
 -------+-----------+-------------------+-------------------------
 0011b  | Send      | None              | Yes
        |           |                   |
 -------+-----------+-------------------+-------------------------
 0100b  | Send with | None              | Yes
        | Invalidate|                   |
 -------+-----------+-------------------+-------------------------
 0101b  | Send with | None              | Yes
        | SE        |                   |
 -------+-----------+-------------------+-------------------------
 0110b  | Send with | None              | Yes
        | SE and    |                   |
        | Invalidate|                   |
 -------+-----------+-------------------+-------------------------
 0111b  | Terminate | Terminate Header  | No
        |           |                   |
 -------+-----------+-------------------+-------------------------
 1000b  |           |
 to     | Reserved  |            Not Specified
 1111b  |           |
 -------+-----------+-------------------+-------------------------
    Figure 5 RDMA Message Definitions
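
    A receiver or test harness could use the figure above to reject
    malformed combinations. The following Python sketch is one
    non-normative way to express that check; check_message_shape() and
    its parameters are hypothetical names introduced only for this
    example.

```python
# OpCodes that carry an additional RDMAP header, per the figure.
RDMAP_HEADER_USED = {
    0b0001: "RDMA Read Request Header",
    0b0111: "Terminate Header",
}
# OpCodes whose RDMA Message is allowed to carry a ULP Message.
ULP_MESSAGE_ALLOWED = frozenset(
    {0b0000, 0b0010, 0b0011, 0b0100, 0b0101, 0b0110})


def check_message_shape(opcode, has_rdmap_header, ulp_length):
    """Raise ValueError if a message contradicts the figure."""
    if not 0b0000 <= opcode <= 0b0111:
        raise ValueError("reserved or out-of-range OpCode")
    if has_rdmap_header != (opcode in RDMAP_HEADER_USED):
        raise ValueError("RDMAP header presence does not match OpCode")
    if ulp_length > 0 and opcode not in ULP_MESSAGE_ALLOWED:
        raise ValueError("this RDMA Message cannot carry a ULP Message")
```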
 
 4.3  RDMA Write Header
 
    The RDMA Write Message does not include an RDMAP header. The RDMAP
    layer passes to the DDP layer an RDMAP Control Field. The RDMA
    Write Message is fully described by the DDP Headers of the DDP
    Segments associated with the Message.
 
 
    See Section 11 Appendix for a description of the DDP Segment format
    associated with RDMA Write Messages.
 
 4.4  RDMA Read Request Header
 
    The RDMA Read Request Message carries an RDMA Read Request Header
    that describes the Data Sink and Data Source Buffers used by the
    RDMA Read operation. The RDMA Read Request Header immediately
    follows the DDP header. The RDMAP layer passes to the DDP layer an
    RDMAP Control Field. The following figure depicts the RDMA Read
    Request Header that MUST be used for all RDMA Read Request
    Messages:
 
      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                     Data Sink STag (SinkSTag)                 |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     +                  Data Sink Tagged Offset (SinkTO)             +
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                  RDMA Read Message Size (RDMARDSZ)            |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                     Data Source STag (SrcSTag)                |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     +                 Data Source Tagged Offset (SrcTO)             +
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    Figure 6 RDMA Read Request Header Format
 
    Data Sink Steering Tag: 32 bits.
 
         The Data Sink Steering Tag identifies the Data Sink's Tagged
         Buffer. This field MUST be copied, without interpretation,
         from the RDMA Read Request into the corresponding RDMA Read
         Response and allows the Data Sink to place the returning data.
         The STag is associated with the RDMAP Stream through a
         mechanism that is outside the scope of the RDMAP
         specification.
 
    Data Sink Tagged Offset: 64 bits.
 
         The Data Sink Tagged Offset specifies the starting offset, in
         octets, from the base of the Data Sink's Tagged Buffer, where
         the data is to be written by the Data Source. This field is
         copied from the RDMA Read Request into the corresponding RDMA
         Read Response and allows the Data Sink to place the returning
         data. The Data Sink Tagged Offset MAY start at an arbitrary
         offset.
 
         The Data Sink STag and Data Sink Tagged Offset fields describe
         the buffer to which the RDMA Read data is written.
 
         Note: the DDP Layer protects against a wrap of the Data Sink
         Tagged Offset.
 
    RDMA Read Message Size: 32 bits.
 
         The RDMA Read Message Size is the amount of data, in octets,
         read from the Data Source. A single RDMA Read Request Message
         can retrieve from 0 to 2^32-1 data octets from the Data
         Source.
 
    Data Source Steering Tag: 32 bits.
 
         The Data Source Steering Tag identifies the Data Source's
         Tagged Buffer. The STag is associated with the RDMAP Stream
         through a mechanism that is outside the scope of the RDMAP
         specification.
 
    Data Source Tagged Offset: 64 bits.
 
         The Tagged Offset specifies the starting offset, in octets,
         that is to be read from the Data Source's Tagged Buffer. The
         Data Source Tagged Offset MAY start at an arbitrary offset.
 
         The Data Source STag and Data Source Tagged Offset fields
         describe the buffer from which the RDMA Read data is read.
 
    See Section 7.2 Errors Detected at the Remote Peer on Incoming RDMA
    Messages for a description of error checking required upon
    processing of an RDMA Read Request at the Data Source.
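
    The field widths given above (32, 64, 32, 32, and 64 bits) add up
    to a 28-octet RDMA Read Request Header. The following Python
    sketch packs and unpacks that layout with the standard struct
    module. It is non-normative: the function names and dictionary
    keys are hypothetical, and network byte order is assumed for all
    fields.

```python
import struct

# SinkSTag, SinkTO, RDMARDSZ, SrcSTag, SrcTO; "!" selects network
# byte order with no padding between fields.
_READ_REQUEST = struct.Struct("!IQIIQ")


def pack_read_request(sink_stag, sink_to, rdmardsz, src_stag, src_to):
    """Serialize an RDMA Read Request Header (sketch)."""
    return _READ_REQUEST.pack(sink_stag, sink_to, rdmardsz,
                              src_stag, src_to)


def unpack_read_request(data):
    """Parse a 28-octet RDMA Read Request Header into a dict."""
    fields = _READ_REQUEST.unpack(data)
    keys = ("sink_stag", "sink_to", "rdmardsz", "src_stag", "src_to")
    return dict(zip(keys, fields))


def read_response_target(request):
    """STag and TO for the RDMA Read Response.

    The Data Source copies the Data Sink STag and Tagged Offset from
    the request without interpreting them.
    """
    return request["sink_stag"], request["sink_to"]
```

    Because struct rejects out-of-range values, a Read Message Size
    above 2^32-1 cannot be packed by this sketch, which matches the
    32-bit limit of the field.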
 
 
 
 
 
 4.5  RDMA Read Response Header
 
    The RDMA Read Response Message does not include an RDMAP header.
    The RDMAP layer passes to the DDP layer an RDMAP Control Field. The
    RDMA Read Response Message is fully described by the DDP Headers of
    the DDP Segments associated with the Message.
 
    See Section 11 Appendix for a description of the DDP Segment format
    associated with RDMA Read Response Messages.
 
 4.6  Send Header and Send with Solicited Event Header
 
    The Send and Send with Solicited Event Messages do not include an
    RDMAP header. The RDMAP layer passes to the DDP layer an RDMAP
    Control Field. The Send and Send with Solicited Event Messages are
    fully described by the DDP Headers of the DDP Segments associated
    with the Message.
 
    See Section 11 Appendix for a description of the DDP Segment format
    associated with Send and Send with Solicited Event Messages.
 
 4.7 Send with Invalidate Header and Send with SE and Invalidate Header
 
    The Send with Invalidate and Send with Solicited Event and
    Invalidate Messages do not include an RDMAP header. The RDMAP layer
    passes to the DDP layer an RDMAP Control Field and the Invalidate
    STag field (see section 4.1 RDMAP Control and Invalidate STag
    Field). The Send with Invalidate and Send with Solicited Event and
    Invalidate Messages are fully described by the DDP Headers of the
    DDP Segments associated with the Message.
 
    See Section 11 Appendix for a description of the DDP Segment format
    associated with Send and Send with Solicited Event Messages.
 
 4.8  Terminate Header
 
    The Terminate Message carries a Terminate Header that contains
    additional information associated with the cause of the Terminate.
    The Terminate Header immediately follows the DDP header. The RDMAP
    layer passes to the DDP layer an RDMAP Control Field. The following
    figure depicts a Terminate Header that MUST be used for the
    Terminate Message:
 
      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |       Terminate Control             |      Reserved           |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |  DDP Segment Length  (if any) |                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
     |                                                               |
     //                                                             //
     |                  Terminated DDP Header (if any)               |
     +                                                               +
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     //                                                             //
     |                 Terminated RDMA Header (if any)               |
     +                                                               +
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    Figure 7 Terminate Header Format
 
 
 
    Terminate Control: 19 bits.
 
        The Terminate Control field MUST have the format defined in
        Figure 8 Terminate Control Field.
 
 
      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | Layer | EType |   Error Code  |HdrCt|
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    Figure 8 Terminate Control Field
 
        *   Figure 9 Terminate Control Field Values defines the valid
            values that MUST be used for this field.
 
            *   Layer: 4 bits.
 
                Identifies the layer that encountered the error.
 
            *   EType (RDMA Error Type): 4 bits.
 
                Identifies the type of error that caused the Terminate.
                When the error is detected at the RDMAP Layer, the
                RDMAP Layer inserts the Error Type into this field.
                When the error is detected at an LLP layer, that LLP
                layer creates the Error Type, the DDP layer passes it
                up to the RDMAP Layer, and the RDMAP Layer inserts it
                into this field.
 
            *   Error Code: 8 bits.
 
                This field identifies the specific error that caused
                the Terminate. When the error is detected at the RDMAP
                Layer, the RDMAP Layer creates the Error Code. When the
                error is detected at an LLP layer, that LLP layer
                creates the Error Code, the DDP layer passes it up to
                the RDMAP Layer, and the RDMAP Layer inserts it into
                this field.
 
            *   HdrCt: 3 bits.
 
                Header control bits:
 
                *   M: bit 16. DDP Segment Length valid. See Figure 10
                    for when this bit SHOULD be set.
 
                *   D: bit 17. DDP Header Included. See Figure 10 for
                    when this bit SHOULD be set.
 
                *   R: bit 18. RDMAP Header Included. See Figure 10 for
                    when this bit SHOULD be set.
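 
    As a non-normative illustration, the sketch below packs and unpacks
    the first 32-bit word of the Terminate Header: the 19-bit Terminate
    Control field followed by the 13 Reserved bits shown in Figure 7.
    Bit 0 is taken to be the most significant bit, as in the figures.
    The function names are hypothetical and are not defined by RDMAP.

```python
import struct


def pack_terminate_control(layer, etype, error_code, m=False, d=False, r=False):
    """Return the first 32-bit Terminate Header word in network byte order.

    Layer and EType are 4 bits each, Error Code is 8 bits, and M, D, R are
    bits 16, 17 and 18. The 13 Reserved bits are transmitted as zero.
    """
    if not (0 <= layer <= 0xF and 0 <= etype <= 0xF and 0 <= error_code <= 0xFF):
        raise ValueError("field out of range")
    word = (layer << 28) | (etype << 24) | (error_code << 16)
    word |= (int(m) << 15) | (int(d) << 14) | (int(r) << 13)
    return struct.pack("!I", word)


def unpack_terminate_control(data):
    """Decode the first 32-bit word; the Reserved bits are ignored."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "layer": word >> 28,
        "etype": (word >> 24) & 0xF,
        "error_code": (word >> 16) & 0xFF,
        "m": bool(word & (1 << 15)),
        "d": bool(word & (1 << 14)),
        "r": bool(word & (1 << 13)),
    }
```

    For example, an RDMA-layer (0000b) Remote Protection Error (0001b)
    with Error Code 02X and all three header control bits set would be
    encoded by this sketch as the octets 01 02 E0 00.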
 
 -------+----------+-------+-------------+------+--------------------
 Layer  | Layer    | Error | Error Type  | Error| Error Code Name
        | Name     | Type  | Name        | Code |
 -------+----------+-------+-------------+------+--------------------
        |          | 0000b | Local       | None | None
        |          |       | Catastrophic|      |
        |          |       | Error       |      |
        |          +-------+-------------+------+--------------------
        |          |       |             | 00X  | Invalid STag
        |          |       |             +------+--------------------
        |          |       |             | 01X  | Base or bounds
        |          |       |             |      | violation
        |          |       | Remote      +------+--------------------
        |          | 0001b | Protection  | 02X  | Access rights
        |          |       | Error       |      | violation
        |          |       |             +------+--------------------
 0000b  | RDMA     |       |             | 03X  | STag not associated
        |          |       |             |      | with RDMAP Stream
        |          |       |             +------+--------------------
        |          |       |             | 04X  | TO wrap
        |          |       |             +------+--------------------
        |          |       |             | 09X  | STag cannot be
        |          |       |             |      | Invalidated
        |          |       |             +------+--------------------
        |          |       |             | FFX  | Unspecified Error
        |          +-------+-------------+------+--------------------
        |          |       |             | 05X  | Invalid RDMAP
        |          |       |             |      | version
        |          |       |             +------+--------------------
        |          |       |             | 06X  | Unexpected OpCode
        |          |       | Remote      +------+--------------------
        |          | 0010b | Operation   | 07X  | Catastrophic error,
        |          |       | Error       |      | localized to RDMAP
        |          |       |             |      | Stream
        |          |       |             +------+--------------------
        |          |       |             | 08X  | Catastrophic error,
        |          |       |             |      | global
        |          |       |             +------+--------------------
        |          |       |             | 09X  | STag cannot be
        |          |       |             |      | Invalidated
        |          |       |             +------+--------------------
        |          |       |             | FFX  | Unspecified Error
 -------+----------+-------+-------------+------+--------------------
 0001b  | DDP      | See DDP Specification [DDP] for a description of
        |          | the values and names.
 -------+----------+-------+-----------------------------------------
 0010b  | LLP      | For MPA, see MPA Specification [MPA] for a
        | (eg MPA) | description of the values and names.
 -------+----------+-------+-----------------------------------------
    Figure 9 Terminate Control Field Values
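 
    The values in Figure 9 lend themselves to a simple lookup table.
    The following non-normative sketch transcribes the RDMA-layer rows
    of Figure 9 and returns a readable description for a decoded
    Terminate Control field. The identifiers are hypothetical; DDP and
    LLP values are deliberately left to their own specifications.

```python
# Values transcribed from Figure 9; all identifiers are illustrative only.
LAYER_NAMES = {0b0000: "RDMA", 0b0001: "DDP", 0b0010: "LLP"}

RDMA_ETYPE_NAMES = {
    0b0000: "Local Catastrophic Error",
    0b0001: "Remote Protection Error",
    0b0010: "Remote Operation Error",
}

RDMA_ERROR_CODES = {
    (0b0001, 0x00): "Invalid STag",
    (0b0001, 0x01): "Base or bounds violation",
    (0b0001, 0x02): "Access rights violation",
    (0b0001, 0x03): "STag not associated with RDMAP Stream",
    (0b0001, 0x04): "TO wrap",
    (0b0001, 0x09): "STag cannot be Invalidated",
    (0b0001, 0xFF): "Unspecified Error",
    (0b0010, 0x05): "Invalid RDMAP version",
    (0b0010, 0x06): "Unexpected OpCode",
    (0b0010, 0x07): "Catastrophic error, localized to RDMAP Stream",
    (0b0010, 0x08): "Catastrophic error, global",
    (0b0010, 0x09): "STag cannot be Invalidated",
    (0b0010, 0xFF): "Unspecified Error",
}


def describe_terminate(layer, etype, error_code):
    """Return a human-readable description of a Terminate Control field."""
    if layer not in LAYER_NAMES:
        return "unknown layer"
    if layer != 0b0000:
        # DDP and LLP values are defined by those layers' specifications.
        return f"{LAYER_NAMES[layer]}: values defined by that layer's specification"
    etype_name = RDMA_ETYPE_NAMES.get(etype, "unknown error type")
    if etype == 0b0000:
        # Figure 9 defines no Error Code for Local Catastrophic Error.
        return f"RDMA: {etype_name}"
    code_name = RDMA_ERROR_CODES.get((etype, error_code), "unknown error code")
    return f"RDMA: {etype_name}: {code_name}"
```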
 
    Reserved: 13 bits. This field MUST be set to zero on transmit,
    ignored on receive.
 
    DDP Segment Length: 16 bits
 
         The length handed up by the DDP Layer when the error was
         detected. It MUST be valid if the M bit is set. It MUST be
         present when the D bit is set.
 
    Terminated DDP Header: 112 bits for Tagged Messages and 144 bits
    for Untagged Messages.
 
         The DDP Header of the incoming Message that is associated with
         the Terminate. The DDP Header is not present if the Terminate
         Error Type is a Local Catastrophic Error. It MUST be present
         if the D bit is set.
 
    Terminated RDMA Header: 224 bits.
 
         The Terminated RDMA Header is only sent back if the terminate
         is associated with an RDMA Read Request Message. It MUST be
         present if the R bit is set.
 
         If the terminate occurs before the first RDMA Read Request
         byte is processed, the original RDMA Read Request Header is
         sent back.
 
         If the terminate occurs after the first RDMA Read Request byte
         is processed, the RDMA Read Request Header is updated to
         reflect the current location of the RDMA Read operation that
         is in process:
 
            *   Data Sink STag = Data Sink STag originally sent in the
                RDMA Read Request.
 
            *   Data Sink Tagged Offset = Current offset into the Data
                Sink Tagged Buffer. For example if the RDMA Read
                Request was terminated after 2048 octets were sent,
                then the Data Sink Tagged Offset = the original Data
                Sink Tagged Offset + 2048.
 
            *   RDMA Read Message Size = Number of octets left to
                transfer.
 
            *   Data Source STag = Data Source STag in the RDMA Read
                Request.
 
            *   Data Source Tagged Offset = Current offset into the
                Data Source Tagged Buffer. For example if the RDMA Read
                Request was terminated after 2048 octets were sent,
                then the Data Source Tagged Offset = the original Data
                Source Tagged Offset + 2048.
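 
    The update rules above amount to advancing both Tagged Offsets by
    the number of octets already sent and reducing the size by the same
    amount. The sketch below is non-normative: the class and function
    names are hypothetical, the field widths (32-bit STags and size,
    64-bit Tagged Offsets) are assumed from the 224-bit total, and
    Tagged Offset wrap is not handled.

```python
import struct
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ReadRequestHeader:
    sink_stag: int   # Data Sink STag
    sink_to: int     # Data Sink Tagged Offset
    size: int        # RDMA Read Message Size
    src_stag: int    # Data Source STag
    src_to: int      # Data Source Tagged Offset


def terminated_header(original, octets_sent):
    """Return the header to send back in a Terminate Message."""
    if not 0 <= octets_sent <= original.size:
        raise ValueError("octets_sent outside the requested range")
    # With octets_sent == 0 the original header is returned unchanged.
    return replace(
        original,
        sink_to=original.sink_to + octets_sent,
        size=original.size - octets_sent,
        src_to=original.src_to + octets_sent,
    )


def pack_header(header):
    """Serialize to 28 octets (224 bits) in network byte order."""
    return struct.pack("!IQIIQ", header.sink_stag, header.sink_to,
                       header.size, header.src_stag, header.src_to)
```

    With an original request for 8192 octets at Data Sink Tagged Offset
    4096 and Data Source Tagged Offset 0, terminating after 2048 octets
    gives offsets of 6144 and 2048 and a remaining size of 6144.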
 
    Note: if a given LLP does not define any termination codes for the
    RDMAP Terminate Message to use, then none would be used for that
    LLP.
 
    Figure 10 Error Type to RDMA Message Mapping maps layer name and
    error types to each RDMA Message type:
 
 ---------+-------------+------------+------------+-----------------
 Layer    | Error Type  | Terminate  | Terminate  | What type of
 Name     | Name        | Includes   | Includes   | RDMA Message can
          |             | DDP Header | RDMA Header| cause the error
          |             | and DDP    |            |
          |             | Segment    |            |
          |             | Length     |            |
 ---------+-------------+------------+------------+-----------------
          | Local       | No         | No         | Any
          | Catastrophic|            |            |
          | Error       |            |            |
          +-------------+------------+------------+-----------------
          | Remote      | Yes, if    | Yes        | Only RDMA Read
 RDMA     | Protection  | possible   |            | Request, Send
          | Error       |            |            | with Invalidate,
          |             |            |            | and Send with SE
          |             |            |            | and Invalidate
          +-------------+------------+------------+-----------------
          | Remote      | Yes, if    | No         | Any
          | Operation   | possible   |            |
          | Error       |            |            |
 ---------+-------------+------------+------------+-----------------
 DDP      | See DDP Spec| Yes        | No         | Any
          | [DDP]       |            |            |
 ---------+-------------+------------+------------+-----------------
 LLP      | See LLP Spec| No         | No         | Any
          | [e.g. MPA]  |            |            |
    Figure 10 Error Type to RDMA Message Mapping
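 
    One possible reading of Figure 10 is sketched below as a function
    that selects the M, D and R header control bits. This is a
    non-normative illustration under two stated assumptions: the M and
    D bits are set together, because Figure 10 groups the DDP Header
    and the DDP Segment Length in one column, and the R bit is set only
    for an RDMA Read Request, following the Terminated RDMA Header
    description. "Yes, if possible" is modeled by a caller-supplied
    flag. All names are hypothetical.

```python
def terminate_header_bits(layer, error_type=None,
                          ddp_header_available=True, is_read_request=False):
    """Return the (M, D, R) bits for the HdrCt field, per Figure 10."""
    if layer == "LLP":
        include_ddp, include_rdma = False, False
    elif layer == "DDP":
        include_ddp, include_rdma = True, False
    elif layer == "RDMA":
        if error_type == "Local Catastrophic Error":
            include_ddp, include_rdma = False, False
        elif error_type == "Remote Protection Error":
            include_ddp = ddp_header_available   # "Yes, if possible"
            include_rdma = is_read_request       # assumption, see lead-in
        elif error_type == "Remote Operation Error":
            include_ddp = ddp_header_available   # "Yes, if possible"
            include_rdma = False
        else:
            raise ValueError("unknown RDMA error type")
    else:
        raise ValueError("unknown layer")
    # Assumption: M (length valid) and D (header included) travel together.
    return (include_ddp, include_ddp, include_rdma)
```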
 
   5  Data Transfer
 
 5.1  RDMA Write Message
 
    An RDMA Write is used by the Data Source to transfer data to a
    previously Advertised Tagged Buffer at the Data Sink. The RDMA
    Write Message has the following semantics:
 
    *  An RDMA Write Message MUST reference a Tagged Buffer. That is,
       the Data Source RDMAP Layer MUST request that the DDP layer mark
       the Message as Tagged.
 
    *  A valid RDMA Write Message MUST NOT be delivered to the Data
       Sink's ULP (i.e. it is placed by the DDP layer).
 
    *  At the Remote Peer, when an invalid RDMA Write Message is
       delivered to the Remote Peer's RDMAP Layer, an error is surfaced
       (see section 7.1 RDMAP Error Surfacing).
 
    *  The Tagged Offset of a Tagged Buffer MAY start at a non-zero
       value.
 
    *  An RDMA Write Message MAY target all or part of a previously
       Advertised buffer.
 
    *  The RDMAP does not define how the buffer(s) used by an outbound
       RDMA Write are defined or addressed. For example, an
       implementation of RDMA may choose to allow a gather-list of non-
       contiguous data blocks to be the source of an RDMA Write. In
       this case, the data blocks would be combined by the Data Source
       and sent as a single RDMA Write Message to the Data Sink.
 
    *  The Data Source RDMAP Layer MUST issue RDMA Write Messages to
       the DDP layer in the order they were submitted by the ULP.
 
    *  At the Data Source, a subsequent Send (Send with Invalidate,
       Send with Solicited Event, or Send with Solicited Event and
       Invalidate) Message MAY be used to signal Delivery of previous
       RDMA Write Messages to the Data Sink, if desired by the ULP.
 
    *  If the Local Peer wishes to write to multiple Tagged Buffers on
       the Remote Peer, the Local Peer MUST use multiple RDMA Write
       Messages. That is, a single RDMA Write Message can only write to
       one remote Tagged Buffer.
 
    *  The Data Source MAY issue a zero length RDMA Write Message.
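 
    The gather-list example and the one-buffer-per-Message rule above
    can be illustrated with the following non-normative sketch. RDMAP
    does not define a local buffer interface, so the dictionary layout
    and function names here are purely hypothetical.

```python
def build_rdma_write(gather_list, sink_stag, sink_to):
    """Combine non-contiguous source blocks into one RDMA Write Message.

    The Message is Tagged and targets exactly one remote Tagged Buffer,
    identified by sink_stag, starting at Tagged Offset sink_to. An empty
    gather list yields a zero length RDMA Write.
    """
    payload = b"".join(gather_list)
    return {
        "tagged": True,
        "stag": sink_stag,
        "tagged_offset": sink_to,
        "payload": payload,
    }


def build_rdma_writes(targets):
    """Build one Message per remote Tagged Buffer, in submission order.

    targets is a list of (gather_list, sink_stag, sink_to) tuples.
    """
    return [build_rdma_write(g, stag, to) for g, stag, to in targets]
```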
 
 
 
 5.2  RDMA Read Operation
 
    The RDMA Read operation MUST consist of a single RDMA Read Request
    Message and a single RDMA Read Response Message.
 
 5.2.1  RDMA Read Request Message
 
    An RDMA Read Request is used by the Data Sink to transfer data from
    a previously Advertised Tagged Buffer at the Data Source to a
    Tagged Buffer at the Data Sink. The RDMA Read Request Message has
    the following semantics:
 
    *  An RDMA Read Request Message MUST reference an Untagged Buffer.
       That is, the Local Peer's RDMAP Layer MUST request that the DDP
       layer mark the Message as Untagged.
 
    *  One RDMA Read Request Message MUST consume one Untagged Buffer.
 
    *  The Remote Peer's RDMAP Layer MUST process an RDMA Read Request
       Message. A valid RDMA Read Request Message MUST NOT be delivered
       to the Data Source's ULP (i.e. it is processed by the RDMAP
       layer).
 
    *  At the Remote Peer, when an invalid RDMA Read Request Message is
       delivered to the Remote Peer's RDMAP Layer, an error is surfaced
       (see section 7.1 RDMAP Error Surfacing).
 
    *  An RDMA Read Request Message MUST reference the RDMA Read
       Request Queue. That is, the Local Peer's RDMAP Layer MUST
       request that the DDP layer set the Queue Number field to one.
 
    *  The Local Peer MUST pass to the DDP Layer RDMA Read Request
       Messages in the order they were submitted by the ULP.
 
    *  The Remote Peer MUST process the RDMA Read Request Messages in
       the order they were sent.
 
    *  If the Local Peer wishes to read from multiple Tagged Buffers on
       the Remote Peer, the Local Peer MUST use multiple RDMA Read
       Request Messages. That is, a single RDMA Read Request Message
       MUST only read from one remote Tagged Buffer.
 
    *  An RDMA Read Request Message MAY target all or part of a
       previously Advertised buffer.
 
    *  If the Data Source receives a valid RDMA Read Request Message it
       MUST respond with a valid RDMA Read Response Message.
 
    *  The Data Sink MAY issue a zero length RDMA Read Request Message,
       by setting the RDMA Read Message Size field to zero in the RDMA
       Read Request Header.
 
    *  If the Data Source receives a non-zero length RDMA Read Message
       Size, the Data Source RDMAP MUST validate the Data Source STag
       and Data Source Tagged Offset contained in the RDMA Read Request
       Header.
 
    *  If the Data Source receives an RDMA Read Request Header with the
       RDMA Read Message Size set to zero, the Data Source RDMAP:
 
        *   MUST NOT validate the Data Source STag and Data Source
            Tagged Offset contained in the RDMA Read Request Header,
            and
 
        *   MUST respond with a zero length RDMA Read Response Message.
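 
    The zero length rules above can be sketched as follows. This is a
    non-normative illustration: the buffer registry (a dictionary from
    STag to buffer contents, with each buffer assumed to start at
    Tagged Offset zero) and the ProtectionError class are hypothetical,
    and access rights and stream association checks are omitted.

```python
class ProtectionError(Exception):
    """Hypothetical error carrying a Figure 9 Remote Protection Error Code."""

    def __init__(self, error_code, name):
        super().__init__(name)
        self.error_code = error_code


def handle_read_request(size, src_stag, src_to, buffers):
    """Return the payload of the RDMA Read Response Message."""
    if size == 0:
        # Zero length: the STag and Tagged Offset MUST NOT be validated,
        # and a zero length RDMA Read Response MUST be returned.
        return b""
    buf = buffers.get(src_stag)
    if buf is None:
        raise ProtectionError(0x00, "Invalid STag")
    if src_to < 0 or src_to + size > len(buf):
        raise ProtectionError(0x01, "Base or bounds violation")
    return buf[src_to:src_to + size]
```

    Note that in this sketch a zero length request naming an unknown
    STag still succeeds, which matches the MUST NOT validate rule.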
 
 5.2.2  RDMA Read Response Message
 
    The RDMA Read Response Message uses the DDP Tagged Buffer Model to
    Deliver the contents of a previously requested Data Source Tagged
    Buffer to the Data Sink, without any involvement from the ULP at
    the Remote Peer. The RDMA Read Response Message has the following
    semantics:
 
    *  The RDMA Read Response Message for the associated RDMA Read
       Request Message travels in the opposite direction.
 
    *  An RDMA Read Response Message MUST reference a Tagged Buffer.
       That is, the Data Source RDMAP Layer MUST request that the DDP
       layer mark the Message as Tagged.
 
    *  The Data Source MUST ensure that a sufficient number of Untagged
       Buffers are available on the RDMA Read Request Queue (Queue with
       DDP Queue Number 1) to support the maximum number of RDMA Read
       Requests negotiated by the ULP.
 
    *  The RDMAP Layer MUST Deliver the RDMA Read Response Message to
       the ULP.
 
    *  At the Remote Peer, when an invalid RDMA Read Response Message
       is delivered to the Remote Peer's RDMAP Layer, an error is
       surfaced (see section 7.1 RDMAP Error Surfacing).
 
    *  The Tagged Offset of a Tagged Buffer MAY start at a non-zero
       value.
 
    *  The Data Source RDMAP Layer MUST pass RDMA Read Response
       Messages to the DDP layer in the order that the RDMA Read
       Request Messages were received by the RDMAP Layer at the Data
       Source.
 
    *  The Data Sink MAY validate that the STag, Tagged Offset, and
       length of the RDMA Read Response Message are the same as the
       STag, Tagged Offset, and length included in the corresponding
       RDMA Read Request Message.
 
    *  A single RDMA Read Response Message MUST write to one remote
       Tagged Buffer. If the Data Sink wishes to Read multiple Tagged
       Buffers, the Data Sink can use multiple RDMA Read Request
       Messages.
 
 5.3  Send Message Type
 
    The Send Message Type uses the DDP Untagged Buffer Model to
    transfer data from the Data Source into an Untagged Buffer at the
    Data Sink.
 
    *  A Send Message Type MUST reference an Untagged Buffer. That is,
       the Local Peer's RDMAP Layer MUST request that the DDP layer
       mark the Message as Untagged.
 
    *  One Send Message Type MUST consume one Untagged Buffer.
 
        *   The ULP Message sent using a Send Message Type MAY be less
            than or equal to the size of the consumed Untagged Buffer.
            The RDMAP Layer communicates to the ULP the size of the
            data written into the Untagged Buffer.
 
        *   If the ULP Message sent via Send Message Type is larger
            than the Data Sink's Untagged Buffer, it is an error (see
            section 7.1 RDMAP Error Surfacing).
 
    *  At the Remote Peer, Send Message Type Messages MUST be Delivered
       to the Remote Peer's ULP in the order they were sent.
 
    *  After the Send with Solicited Event or Send with Solicited Event
       and Invalidate Message is Delivered to the ULP, the RDMAP MAY
       generate an Event, if the Data Sink is configured to generate
       such an Event.
 
    *  At the Remote Peer, when an invalid Send Message Type is
       Delivered to the Remote Peer's RDMAP Layer, an error is surfaced
       (see section 7.1 RDMAP Error Surfacing).
 
    *  The RDMAP does not define how the buffer(s) used by an outbound
       Send Message Type are defined or addressed. For
       example, an implementation of RDMA may choose to allow a gather-
       list of non-contiguous data blocks to be the source of a Send
       Message Type. In this case, the data blocks would be combined by
       the Data Source and sent as a single Send Message Type to the
       Data Sink.
 
    *  For a Send Message Type, the Local Peer's RDMAP Layer MUST
       request that the DDP layer set the Queue Number field to zero.
 
    *  The Local Peer MUST issue Send Message Type Messages in the
       order they were submitted by the ULP.
 
    *  The Data Source MAY pass a zero length Send Message Type. A zero
       length Send Message Type MUST consume an Untagged Buffer at the
       Data Sink.
 
    *  A Send with Invalidate or Send with Solicited Event and
       Invalidate Message MUST reference an STag. That is, the Local
       Peer's RDMAP Layer MUST pass the RDMA control field and the STag
       that will be Invalidated to the DDP layer.
 
    *  When a Send with Invalidate or Send with Solicited Event and
       Invalidate Message is Delivered to the Remote Peer's RDMAP
       Layer, the RDMAP Layer MUST:
 
        *   Verify that the STag is associated with the RDMAP Stream;
            and
 
        *   Invalidate the STag if it is associated with the RDMAP
            Stream; or Issue a Terminate Message with the STag Cannot
            be Invalidated Terminate Error Code, if the STag is not
            associated with the RDMAP Stream.
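 
    The invalidation rule above can be sketched as a small decision
    function. This is a non-normative illustration: the set of STags
    associated with the RDMAP Stream is a hypothetical local data
    structure, and only the Error Code (09X in Figure 9) is returned,
    without choosing an Error Type.

```python
STAG_CANNOT_BE_INVALIDATED = 0x09  # Error Code from Figure 9


def process_invalidate(stream_stags, stag):
    """Handle the Invalidate STag of an incoming Send with Invalidate.

    stream_stags is a mutable set of STags associated with this RDMAP
    Stream. Returns ("invalidated", None) on success, or
    ("terminate", error_code) when a Terminate Message is required.
    """
    if stag in stream_stags:
        stream_stags.discard(stag)   # the STag is no longer usable
        return ("invalidated", None)
    return ("terminate", STAG_CANNOT_BE_INVALIDATED)
```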
 
 5.4  Terminate Message
 
    The Terminate Message uses the DDP Untagged Buffer Model to
    transfer error-related information from the Data Source into an
    Untagged Buffer at the Data Sink and then ceases all further
    communications on the underlying DDP Stream. The Terminate Message
    has the following semantics:
 
    *  A Terminate Message MUST reference an Untagged Buffer. That is,
       the Local Peer's RDMAP Layer MUST request that the DDP layer
       mark the Message as Untagged.
 
    *  A Terminate Message references the Terminate Queue. That is, the
       Local Peer's RDMAP Layer MUST request that the DDP layer set the
       Queue Number field to two.
 
    *  One Terminate Message MUST consume one Untagged Buffer.
 
    *  On a single RDMAP Stream, the RDMAP layer MUST guarantee
       placement of a single Terminate Message.
 
    *  A Terminate Message MUST be Delivered to the Remote Peer's RDMAP
       Layer. The RDMAP Layer MUST Deliver the Terminate Message to the
       ULP.
 
    *  At the Remote Peer, when an invalid Terminate Message is
       delivered to the Remote Peer's RDMAP Layer, an error is surfaced
       (see section 7.1 RDMAP Error Surfacing).
 
    *  The RDMAP Layer Completes in error all ULP Operations that have
       not been provided to the DDP layer.
 
    *  After sending a Terminate Message on an RDMAP Stream, the Local
       Peer MUST NOT send any more Messages on that specific RDMAP
       Stream.
 
    *  After receiving a Terminate Message on an RDMAP Stream, the
       Remote Peer MAY stop sending Messages on that specific RDMAP
       Stream.
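 
    The buffer model and Queue Number requirements stated in Sections
    5.1 through 5.4 can be collected into a single table, sketched
    below. This is a non-normative summary; the identifiers are
    hypothetical, and the four Send variants are assumed to share the
    Send Message Type rules of Section 5.3.

```python
# (buffer model, DDP Queue Number); Tagged Messages carry no Queue Number.
MESSAGE_MODEL = {
    "RDMA Write": ("tagged", None),
    "RDMA Read Request": ("untagged", 1),
    "RDMA Read Response": ("tagged", None),
    "Send": ("untagged", 0),
    "Send with Invalidate": ("untagged", 0),
    "Send with Solicited Event": ("untagged", 0),
    "Send with Solicited Event and Invalidate": ("untagged", 0),
    "Terminate": ("untagged", 2),
}


def ddp_parameters(message_type):
    """Return what the RDMAP Layer asks of the DDP layer for a Message."""
    model, queue_number = MESSAGE_MODEL[message_type]
    return {"tagged": model == "tagged", "queue_number": queue_number}
```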
 
 5.5  Ordering and Completions
 
    It is important to understand the difference between Placement and
    Delivery ordering since RDMAP provides quite different semantics
    for the two.
 
    Note that many current protocols, both as used in the Internet and
    elsewhere, assume that data is both Placed and Delivered in order.
    This allows applications to take a variety of shortcuts that rely
    on in-order behavior.  For RDMAP, many of these shortcuts are no
    longer safe to use, and could cause application failure.
 
    The following rules apply to implementations of the RDMAP protocol.
    Note, in these rules Send includes Send, Send with Invalidate, Send
    with Solicited Event, and Send with Solicited Event and Invalidate:
 
    1.  RDMAP does not provide ordering among Messages on different
        RDMAP Streams.
 
    2.  RDMAP does not provide ordering between operations that are
        generated from the two ends of an RDMAP Stream.
 
    3.  RDMA Messages that use Tagged and Untagged Buffers MAY be
        Placed in any order.  If an application uses overlapping
        buffers (points different Messages or portions of a single
        Message at the same buffer), then it is possible that the last
        incoming write to the Data Sink buffer will not be the last
        outgoing data sent from the Data Source.
 
    4.  For a Send operation, the contents of an Untagged Buffer at the
        Data Sink MAY be indeterminate until the Send is Delivered to
        the ULP at the Data Sink.
 
    5.  For an RDMA Write operation, the contents of the Tagged Buffer
        at the Data Sink MAY be indeterminate until a subsequent Send
        is Delivered to the ULP at the Data Sink.
 
    6.  For an RDMA Read operation, the contents of the Tagged Buffer
        at the Data Sink MAY be indeterminate until the RDMA Read
        Response Message has been Delivered at the Local Peer.
 
         Statements 4, 5, and 6 imply "no peeking" at the data to see
         if it is done.  It is possible for some data to arrive before
         logically earlier data does, and peeking may cause
         unpredictable application failure.
 
    7.  If the ULP or Application modifies the contents of Tagged or
        Untagged Buffers being modified by an RDMA Operation while the
        RDMAP is processing the RDMA Operation, the state of the
        Buffers is indeterminate.
 
    8.  If the ULP or Application modifies the contents of Tagged or
        Untagged Buffers read by an RDMA Operation while the RDMAP is
        processing the RDMA Operation, the results of the read are
        indeterminate.
 
    9.  The Completion of an RDMA Write or Send Operation at the Local
        Peer does not guarantee that the ULP Message has yet reached
        the Remote Peer ULP Buffer or been examined by the Remote ULP.
 
    10. Send Messages MUST be Delivered to the ULP at the Remote Peer
        after they are Delivered to RDMAP by DDP and in the order that
        they were Delivered to RDMAP.
 
        Note that DDP ordering rules ensure that this will be the same
        order that they were submitted at the Local Peer and that any
        prior RDMA Writes have been submitted for ordered Placement at
        the Remote Peer. This means that when the ULP sees the Delivery
        of the Send, the memory buffers targeted by any preceding RDMA
        Writes and Sends are available to be accessed locally or
        remotely as authorized. If the ULP overlaps its buffers for
        different operations, the data from the RDMA Write or Send may
        be overwritten by subsequent RDMA Operations before the ULP
        receives and processes the Delivery.
 
    11. RDMA Read Response Messages MUST be Delivered to the ULP at the
        Remote Peer after they are Delivered to RDMAP by DDP and in the
        order that they were Delivered to RDMAP.
 
        DDP ordering rules ensure that this will be the same order that
        they were submitted at the Local Peer. This means that when the
        ULP sees the Delivery of the RDMA Read Response, the memory
        buffers targeted by the RDMA Read Response are available to be
        accessed locally or remotely as authorized. If the ULP overlaps
        its buffers for different operations, the data from the RDMA
        Read Response may be overwritten by subsequent RDMA Operations
        before the ULP receives and processes the Delivery.
 
    12. RDMA Read Request Messages, including zero-length RDMA Read
        Requests, MUST NOT start processing at the Remote Peer until
        they have been Delivered to RDMAP by DDP.
 
        Note: the ULP is assured that data written can be read back.
        For example, if an RDMA Read Request is issued by the local
        peer, targeting the same ULP Buffer as a preceding Send or RDMA
        Write (in the same direction as the RDMA Read Request), and
        there are no other sources of update for the ULP Buffer, then
        the remote peer will send back the data written by the Send or
        RDMA Write. That is, for this example the ULP Buffer: is
        Advertised for use on a series of RDMA Messages, is only valid
        on the RDMAP Stream for which it is advertised, and is not
        locally updated while the series of RDMAP Messages are
        performed. For this example, order rule (12) assures that
        subsequent local or remote accesses to the ULP Buffer contain
        the data written by the Send or RDMA Write.
 
        RDMA Read Response Messages MAY be generated at the Remote Peer
        after subsequent RDMA Write Messages or Send Messages have been
        Placed or Delivered. Therefore, when an application does an
        RDMA Read Request followed by an RDMA Write (or Send) to the
        same buffer, it may get the data from the later RDMA Write (or
        Send) in the RDMA Read Response Message, even though the
        operations completed in order at the Local Peer.  If this
        behavior is not desired, the Local Peer ULP must Fence the
        later RDMA Write (or Send) by withholding that Message until
        all outstanding RDMA Read Responses have been
        Delivered.
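 
    The Fence described above can be sketched as a submission queue
    that withholds a fenced Message while any RDMA Read Response is
    still outstanding. This is a non-normative illustration of one way
    a Local Peer ULP might do this; the class and its interface are
    hypothetical. Messages queued behind a withheld Message are also
    held back so that submission order is preserved.

```python
from collections import deque


class FencingQueue:
    def __init__(self):
        self.outstanding_reads = 0
        self.pending = deque()   # (kind, fence) not yet handed to RDMAP
        self.submitted = []      # kinds handed to RDMAP, in order

    def submit(self, kind, fence=False):
        """Queue a Message of kind "read", "write" or "send"."""
        self.pending.append((kind, fence))
        self._drain()

    def read_response_delivered(self):
        """Call when an RDMA Read Response has been Delivered."""
        self.outstanding_reads -= 1
        self._drain()

    def _drain(self):
        while self.pending:
            kind, fence = self.pending[0]
            if fence and self.outstanding_reads > 0:
                return   # withhold this Message and all behind it
            self.pending.popleft()
            if kind == "read":
                self.outstanding_reads += 1
            self.submitted.append(kind)
```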
 
    13. The RDMAP Layer MUST submit RDMA Messages to the DDP layer in
        the order the RDMA Operations are submitted to the RDMAP Layer
        by the ULP.
 
    14. A Send or RDMA Write Message MUST NOT be considered Complete at
        the Local Peer (Data Source) until it has been successfully
        completed at the DDP layer.
 
    15. RDMA Operations MUST be Completed at the Local Peer in the
        order that they were submitted by the ULP.
 
    16. At the Data Sink, an incoming Send Message MUST be Delivered to
        the ULP only after the DDP Message has been Delivered to the
        RDMAP Layer by the DDP layer.
 
    17. RDMA Read Response Message processing at the Remote Peer
        (reading the specified Tagged Buffer) MUST be started only
        after the RDMA Read Request Message has been Delivered by the
        DDP layer (thus all previous RDMA Messages have been properly
        submitted for ordered Placement).
 
    18. Send Messages MAY be Completed at the Remote Peer (Data Sink)
        before prior incoming RDMA Read Request Messages have completed
        their response processing.
 
    19. An RDMA Read operation MUST NOT be Completed at the Local Peer
        until the DDP layer Delivers the associated incoming RDMA Read
        Response Message.
 
    20. If more than one outstanding RDMA Read Request Message is
        supported by both peers, the RDMA Read Response Messages MUST
        be submitted to the DDP layer on the Remote Peer in the order
        the RDMA Read Request Messages were Delivered by DDP, but the
        actual read of the buffer contents MAY take place in any order
        at the Remote Peer.
 
         This simplifies Local Peer Completion processing for RDMA
         Reads in that a Delivered RDMA Read Response MUST be
         sufficient to Complete the RDMA Read Operation.
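
    The Fence described in the note on RDMA Read Responses above can
    be sketched as follows. This is a minimal illustration only: the
    class LocalPeerQueue, its methods, and the string labels are
    hypothetical names that do not appear in this specification. The
    sketch holds back a fenced operation (and everything submitted
    behind it, so that submission order is preserved) until all
    outstanding RDMA Read Responses have been Delivered.

```python
from collections import deque


class LocalPeerQueue:
    """Toy model of a Local Peer send queue (hypothetical, for illustration)."""

    def __init__(self):
        self.wire = []              # names of Messages handed to DDP, in order
        self.outstanding_reads = 0  # Read Requests awaiting a Delivered Response
        self.held = deque()         # operations held back behind a Fence

    def submit(self, kind, name, fence=False):
        # A fenced operation waits while any RDMA Read is outstanding.
        # Operations submitted behind a held operation also wait, so the
        # order of submission to DDP matches the order given by the ULP.
        if self.held or (fence and self.outstanding_reads > 0):
            self.held.append((kind, name, fence))
            return
        self._send(kind, name)

    def _send(self, kind, name):
        self.wire.append(name)
        if kind == "read":
            self.outstanding_reads += 1

    def read_response_delivered(self):
        self.outstanding_reads -= 1
        # Release held operations in order, stopping at the next fenced
        # operation that still has RDMA Reads outstanding.
        while self.held:
            kind, name, fence = self.held[0]
            if fence and self.outstanding_reads > 0:
                break
            self.held.popleft()
            self._send(kind, name)
```

    In this sketch a ULP that issues an RDMA Read followed by a
    fenced RDMA Write to the same buffer sees the Write leave the
    queue only after read_response_delivered() has been called for
    the Read.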
 
 
 
 
 
 
   6  RDMAP Stream Management
 
    RDMAP Stream management consists of RDMAP Stream Initialization and
    RDMAP Stream Termination.
 
 6.1  Stream Initialization
 
    RDMAP Stream initialization occurs after the LLP Stream has been
    created (e.g., for DDP/MPA over TCP, the first TCP Segment after
    the SYN, SYN/ACK exchange). The ULP is responsible for transitioning
    the LLP Stream into RDMA enabled mode. The switch to RDMA mode
    typically occurs sometime after LLP Stream setup. Once in RDMA
    enabled mode, an implementation MUST send only RDMA Messages across
    the transport Stream until the RDMAP Stream is torn down.
 
    For each direction of an RDMAP Stream:
 
    *  For a given RDMAP Stream, the number of outstanding RDMA Read
       Requests is limited per RDMAP Stream direction.
 
    *  It is the ULP's responsibility to set the maximum number of
       outstanding, inbound RDMA Read Requests per RDMAP Stream
       direction.
 
    *  The RDMAP Layer MUST provide the maximum number of outstanding,
       inbound RDMA Read Requests per RDMAP Stream direction that was
       negotiated between the ULP and the Local Peer's RDMAP Layer. The
       negotiation mechanism is outside the scope of this
       specification.
 
    *  It is the ULP's responsibility to set the maximum number of
       outstanding, outbound RDMA Read Requests per RDMAP Stream
       direction.
 
    *  The RDMAP Layer MUST provide the maximum number of outstanding,
       outbound RDMA Read Requests for the RDMAP Stream direction that
       was negotiated between the ULP and the Local Peer's RDMAP
       Layer. The negotiation mechanism is outside the scope of this
       specification.
 
    *  The Local Peer's ULP is responsible for negotiating with the
       Remote Peer's ULP the maximum number of outstanding RDMA Read
       Requests for the RDMAP Stream direction. It is recommended that
       the ULP set the maximum number of outstanding, inbound RDMA Read
       Requests equal to the maximum number of outstanding, outbound
       RDMA Read Requests for a given RDMAP Stream direction.
 
    *  For outbound RDMA Read Requests, the RDMAP Layer MUST NOT exceed
       the maximum number of outstanding, outbound RDMA Read Requests
       that was negotiated between the ULP and the Local Peer's RDMAP
       Layer.
 
    *  For inbound RDMA Read Requests, the RDMAP Layer MUST NOT exceed
       the maximum number of outstanding, inbound RDMA Read Requests
       that was negotiated between the ULP and the Local Peer's RDMAP
       Layer.
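
    The per-direction limits listed above can be pictured as two
    counters. The following is a minimal sketch, assuming the limits
    have already been negotiated by the ULP; the class
    ReadRequestLimiter and its method names are hypothetical and are
    not defined by this specification. What a caller does when a
    method returns False (queue the request, or treat the excess
    inbound request as an error) is deliberately not modelled.

```python
class ReadRequestLimiter:
    """Hypothetical per-direction limits on outstanding RDMA Read Requests."""

    def __init__(self, max_outbound, max_inbound):
        self.max_outbound = max_outbound  # limit set by the ULP (outbound)
        self.max_inbound = max_inbound    # limit set by the ULP (inbound)
        self.outbound = 0                 # Read Requests sent, not yet Completed
        self.inbound = 0                  # Read Requests received, not yet answered

    def try_issue_outbound(self):
        """Return True if one more outbound Read Request may be sent now."""
        if self.outbound >= self.max_outbound:
            return False
        self.outbound += 1
        return True

    def outbound_completed(self):
        """Call when an outbound RDMA Read has Completed."""
        self.outbound -= 1

    def accept_inbound(self):
        """Return True if one more inbound Read Request is within the limit."""
        if self.inbound >= self.max_inbound:
            return False
        self.inbound += 1
        return True

    def inbound_responded(self):
        """Call when the Read Response for an inbound request has been sent."""
        self.inbound -= 1
```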
 
 
 
 6.2  Stream Teardown
 
    There are three methods for terminating an RDMAP Stream: ULP
    Graceful Termination, RDMAP Abortive Termination, and LLP Abortive
    Termination.
 
    The ULP is responsible for performing ULP Graceful Termination.
    After a ULP Graceful Termination, either side of the Stream can
    initiate LLP Graceful Termination, using the graceful termination
    mechanism provided by the LLP.
 
    RDMAP Abortive Termination allows the RDMAP to issue a Terminate
    Message describing the reason the RDMAP Stream was terminated. The
    next section (6.2.1 RDMAP Abortive Termination) describes the RDMAP
    Abortive Termination in detail.
 
    LLP Abortive Termination results from an LLP error and causes the
    RDMAP Stream to be torn down midstream, without an RDMAP Terminate
    Message.  While this last method is highly undesirable, it is
    possible and the ULP should take this into consideration.
 
 6.2.1  RDMAP Abortive Termination
 
    RDMAP defines a Terminate operation that SHOULD be invoked when
    either an RDMAP error is encountered or an LLP error is surfaced to
    the RDMAP layer by the LLP.
 
    It is not always possible to send the Terminate Message. For
    example, certain LLP errors may occur that cause the LLP Stream to
    be torn down a) before RDMAP is aware of the error, b) before RDMAP
    is able to send the Terminate Message, or c) after RDMAP has posted
    the Terminate Message to the LLP, but before the LLP has
    transmitted it.
 
    Note that an RDMAP Abortive Termination may entail loss of data. In
    general, when a Terminate Message is received it is impossible to
    tell for sure what unacknowledged RDMA Messages were Completed
    successfully at the Remote Peer. Thus the state of all outstanding
    RDMA Messages is indeterminate and the Messages SHOULD be
    considered Completed in error.
 
    When a peer sends or receives a Terminate Message, it MAY
    immediately tear down the LLP Stream. The peer SHOULD perform a
    graceful LLP teardown to ensure the Terminate Message is
    successfully Delivered.
 
    See section 4.8 Terminate Header for a description of the Terminate
    Message and its contents. See section 5.4 Terminate Message for a
    description of the Terminate Message semantics.
 
   7  RDMAP Error Management
 
    The RDMAP protocol does not have RDMAP or DDP layer error recovery
    operations built in.  If everything is working, the LLP guarantees
    ensure that the Messages arrive at the destination.
 
    If errors are detected at the RDMAP or DDP layer, then the RDMAP,
    DDP and LLP Streams are Abortively Terminated (see section 4.8
    Terminate Header on page 34).
 
    In general, poor implementations or improper ULP programming cause
    the errors detected at the RDMAP and DDP layers.  In these cases,
    returning a diagnostic termination error Message and closing the
    RDMAP Stream is far simpler than attempting to maintain the RDMAP
    Stream, particularly when the cause of the error is not known.
 
    If an LLP does not support teardown of a Stream independent of
    other Streams and an RDMAP error results in the Termination of a
    specific Stream, then the LLP MUST label the Stream as an erroneous
    Stream and MUST NOT allow any further data transfer on that Stream
    after RDMAP requests the Stream to be torn down.
 
    For a specific LLP connection, when all Streams are either
    gracefully torn down or are labeled as erroneous Streams, the LLP
    connection MUST be torn down.
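
    The two preceding rules for an LLP that cannot tear down Streams
    independently can be sketched as a small state table. This is an
    illustration under stated assumptions: the class LlpConnection,
    its method names, and the state labels "active", "closed", and
    "erroneous" are hypothetical and are not part of this
    specification.

```python
class LlpConnection:
    """Hypothetical model of one LLP connection carrying several Streams."""

    def __init__(self, stream_ids):
        self.streams = {sid: "active" for sid in stream_ids}
        self.torn_down = False

    def graceful_teardown(self, sid):
        self._retire(sid, "closed")

    def rdmap_error(self, sid):
        # The Stream cannot be torn down on its own, so it is labeled as
        # erroneous and no further data transfer is allowed on it.
        self._retire(sid, "erroneous")

    def may_transfer(self, sid):
        return not self.torn_down and self.streams[sid] == "active"

    def _retire(self, sid, state):
        self.streams[sid] = state
        # Once every Stream is closed or erroneous, the connection goes.
        if all(s != "active" for s in self.streams.values()):
            self.torn_down = True
```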
 
    Since errors are detected at the Remote Peer (possibly long) after
    RDMA Messages are passed to DDP and the LLP at the Local Peer and
    Completed, the sender cannot easily determine which of its Messages
    have been received. (RDMA Reads are an exception to this rule).
 
    For a list of errors returned to the Remote Peer as a result of an
    Abortive Termination, see section 4.8 Terminate Header on page 34.
 
 7.1  RDMAP Error Surfacing
 
    If an error occurs at the Local Peer, the RDMAP layer MUST attempt
    to inform the local ULP that the error has occurred.
 
    The Local Peer MUST send a Terminate Message for each of the
    following cases:
 
    1.  For Errors detected while creating RDMA Write, Send, Send with
        Invalidate, Send with Solicited Event, Send with Solicited
        Event and Invalidate, or RDMA Read Requests, or other reasons
        not directly associated with an incoming Message, the Terminate
        Message and Error code are sent instead of the request.  In
        this case, the Error Type and Error Code fields are included in
        the Terminate Message, but the Terminated DDP Header and
        Terminated RDMA Header fields are set to zero.
 
    2.  For errors detected on an incoming RDMA Write, Send, Send with
        Invalidate, Send with Solicited Event, Send with Solicited
        Event and Invalidate, or Read Response Message (after the
        Message has been Delivered by DDP), the Terminate Message is
        sent at the earliest possible opportunity, preferably in the
        next outgoing RDMA Message. In this case, the Error Type, Error
        Code, ULP PDU Length, and Terminated DDP Header fields are
        included in the Terminate Message, but the Terminated RDMA
        Header field is set to zero.
 
    3.  For errors detected on an incoming RDMA Read Request Message
        (after the Message has been Delivered by DDP), the Terminate
        Message is sent at the earliest possible opportunity,
        preferably in the next outgoing RDMA Message. In this case, the
        Error Type, Error Code, ULP PDU Length, Terminated DDP Header,
        and Terminated RDMA Header fields are included in the Terminate
        Message.
 
    4.  If more than one error is detected on incoming RDMA Messages,
        before the Terminate Message can be sent, then the first RDMA
        Message (and its associated DDP Segment) that experienced an
        error MUST be captured by the Terminate Message in accordance
        with rules 2 and 3 above.
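
    The cases above differ mainly in which Terminate Message fields
    are included. The following minimal sketch tabulates that
    mapping. The case labels (LOCAL, INCOMING_OTHER,
    INCOMING_READ_REQUEST) and the function terminate_contents are
    hypothetical names introduced for illustration. As a
    simplification, the "first error wins" rule is applied to
    whatever list of cases the caller passes in.

```python
# Hypothetical case labels; the field names follow the text above.
LOCAL = "local"                                  # not tied to an incoming Message
INCOMING_OTHER = "incoming_other"                # Write, Send variants, Read Response
INCOMING_READ_REQUEST = "incoming_read_request"  # RDMA Read Request

INCLUDED_FIELDS = {
    LOCAL: ("Error Type", "Error Code"),
    INCOMING_OTHER: (
        "Error Type", "Error Code", "ULP PDU Length",
        "Terminated DDP Header",
    ),
    INCOMING_READ_REQUEST: (
        "Error Type", "Error Code", "ULP PDU Length",
        "Terminated DDP Header", "Terminated RDMA Header",
    ),
}


def terminate_contents(detected):
    """Return (case, included fields) for the Terminate Message to send.

    `detected` lists the error cases in the order they were detected;
    only the first one is reported.
    """
    if not detected:
        raise ValueError("no error to report")
    first = detected[0]
    return first, INCLUDED_FIELDS[first]
```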
 
 7.2  Errors Detected at the Remote Peer on Incoming RDMA Messages
 
    On incoming RDMA Write, RDMA Read Response, Send, Send with
    Invalidate, Send with Solicited Event, Send with Solicited Event
    and Invalidate, and Terminate Messages, the following must be
    validated:
 
    1.  The DDP Layer MUST validate all DDP Segment fields.
 
    2.  The RDMA OpCode MUST be valid.
 
 
    3.  The RDMA Version MUST be valid.
 
    Additionally, on incoming Send with Invalidate and Send with
    Solicited Event and Invalidate Messages, the following must also
    be validated:
 
    4.  The Invalidate STag MUST be valid.
 
    5.  The STag MUST be associated with this RDMAP Stream.
 
    On incoming RDMA Read Request Messages, the following must be
    validated:
 
    1.  The DDP Layer MUST validate all Untagged DDP Segment fields.
 
    2.  The RDMA OpCode MUST be valid.
 
    3.  The RDMA Version MUST be valid.
 
    4.  For non-zero length RDMA Read Request Messages:
 
        a.  The Data Source STag MUST be valid.
 
        b.  The Data Source STag MUST be associated with this RDMAP
            Stream.
 
        c.  The Data Source Tagged Offset MUST fall in the range of
            legal offsets associated with the Data Source STag.
 
        d.  The sum of the Data Source Tagged Offset and the RDMA Read
            Message Size MUST fall in the range of legal offsets
            associated with the Data Source STag.
 
        e.  The sum of the Data Source Tagged Offset and the RDMA Read
            Message Size MUST NOT cause the Data Source Tagged Offset
            to wrap.
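
    Checks a through e for a non-zero length RDMA Read Request can
    be sketched as below. This is an illustration only. The
    StagEntry record, the function validate_read_request, and the
    returned reason strings are hypothetical; they are not the error
    codes of the Terminate Message. The sketch assumes a 64-bit
    Tagged Offset (as drawn in the appendix figures) and interprets
    "wrap" as the sum of offset and size exceeding 2**64.

```python
from dataclasses import dataclass

TO_LIMIT = 2**64  # assumption: the Tagged Offset field is 64 bits wide


@dataclass
class StagEntry:
    """Hypothetical record kept for each Data Source STag."""
    stream_id: int  # RDMAP Stream the STag is associated with
    base_to: int    # lowest legal Tagged Offset
    length: int     # number of bytes covered by the STag


def validate_read_request(stags, stream_id, src_stag, src_to, size):
    """Return None if checks a-e pass, otherwise a short reason string."""
    if size == 0:
        return None  # checks a-e apply only to non-zero length requests
    entry = stags.get(src_stag)
    if entry is None:
        return "invalid STag"                     # check a
    if entry.stream_id != stream_id:
        return "STag not associated with Stream"  # check b
    if src_to + size > TO_LIMIT:
        return "Tagged Offset wrap"               # check e
    end = entry.base_to + entry.length
    if not entry.base_to <= src_to < end:
        return "offset out of range"              # check c
    if src_to + size > end:
        return "offset plus size out of range"    # check d
    return None
```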
 
 
 
 
 
 
 
 
 
 
   8  Security Considerations
 
    This section references the resources that discuss protocol-
    specific security considerations and implications of using RDMAP
    with existing security services. A detailed analysis of the
    security issues around implementation and use of the RDMAP can be
    found in [RDMASEC].
 
    [RDMASEC] introduces the RDMA reference model and discusses how the
    resources of this model are vulnerable to attacks and the types of
    attack these vulnerabilities are subject to. It also details the
    levels of Trust available in this peer-to-peer model and how this
    defines the nature of resource sharing.
 
 8.1  Summary of RDMAP specific Security Requirements
 
    [RDMASEC] defines the security requirements for the implementation
    of the components of the RDMA reference model, namely the RDMA
    enabled NIC (RNIC) and the Privileged Resource Manager. An RDMAP
    implementation conforming to this specification MUST conform to
    these requirements.
 
 8.1.1  RDMAP (RNIC) Requirements
 
    RDMAP provides several countermeasures for all types of attacks as
    introduced in [RDMASEC]. In the following, this specification lists
    all security requirements which MUST be implemented by the RNIC. A
    more detailed discussion of RNIC security requirements can be found
    in Section 5 of [RDMASEC].
 
 
 
    1.  An RNIC MUST ensure that a specific Stream in a specific
        Protection Domain cannot access an STag in a different
        Protection Domain.
 
    2.  An RNIC MUST ensure that if an STag is limited in scope to a
        single Stream, no other Stream can use the STag.
 
    3.  An RNIC MUST ensure that a Remote Peer is not able to access
        memory outside of the buffer specified when the STag was
        enabled for remote access.
 
    4.  An RNIC MUST provide a mechanism for the ULP to establish and
        revoke the association of a ULP Buffer to an STag and TO range.
 
    5.  An RNIC MUST provide a mechanism for the ULP to establish and
        revoke read, write, or read and write access to the ULP Buffer
        referenced by an STag.
 
 
    6.  An RNIC MUST ensure that the network interface can no longer
        modify an advertised buffer after the ULP revokes remote access
        rights for an STag.
 
 
    7.  An RNIC MUST ensure that a Remote Peer is not able to
        invalidate an STag enabled for remote access, if the STag is
        shared on multiple streams.
 
 
    8.  An RNIC MUST choose the value of STags in a way difficult to
        predict. It is RECOMMENDED to sparsely populate them over the
        full range available.
 
 
    9.  An RNIC MUST NOT enable sharing a CQ across ULPs that do not
        share partial mutual trust.
 
 
    10. An RNIC MUST ensure that if a CQ overflows, any Streams that
        do not use the CQ remain unaffected.
 
 
    11. An RNIC implementation SHOULD provide a mechanism to cap the
        number of outstanding RDMA Read Requests.
 
 
    12. An RNIC MUST NOT enable firmware to be loaded on the RNIC
        directly from an untrusted Local Peer or Remote Peer, unless
        the Peer is properly authenticated and the update is done via
        a secure protocol, such as IPsec. The authentication mechanism
        is outside the scope of this specification; it presumably
        entails authenticating that the remote ULP has the right to
        perform the update.
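
    Requirements 1, 2, and 8 above (Protection Domain isolation,
    single-Stream scope, and hard-to-predict STag values) can be
    illustrated with a small table. This is a sketch under stated
    assumptions: the class StagTable and its methods are
    hypothetical, an STag is taken to be a 32-bit value (as drawn in
    the appendix figures), and drawing values from Python's secrets
    module is just one way to make them hard to predict and sparsely
    spread over the full range.

```python
import secrets


class StagTable:
    """Hypothetical STag table enforcing PD and Stream scope."""

    def __init__(self):
        self._entries = {}  # stag -> (protection domain, stream or None)

    def allocate(self, pd, stream=None):
        """Pick an unused 32-bit STag at random.

        stream=None means the STag may be used by any Stream in the
        same Protection Domain.
        """
        while True:
            stag = secrets.randbits(32)
            if stag not in self._entries:
                self._entries[stag] = (pd, stream)
                return stag

    def revoke(self, stag):
        """Remove the STag so that no Stream can use it any longer."""
        self._entries.pop(stag, None)

    def may_use(self, stag, pd, stream):
        """Return True if this Stream in this PD may use the STag."""
        entry = self._entries.get(stag)
        if entry is None:
            return False
        owner_pd, owner_stream = entry
        if owner_pd != pd:
            return False  # different Protection Domain
        return owner_stream is None or owner_stream == stream
```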
 
 
 8.1.2  Privileged Resource Manager Requirements
 
    With RDMAP, all reservations of local resources are initiated from
    local ULPs. To protect against local attacks, including unfair
    resource distribution and gaining unauthorized access to RNIC
    resources, a Privileged Resource Manager (PRM) must be
    implemented, which manages all local resource allocation. Note
    that the PRM need not be provided as an independent component; its
    functionality can also be implemented as part of the privileged
    ULP or as part of the RNIC itself.
 
    A PRM implementation must meet the following security
    requirements (a more detailed discussion of PRM security
    requirements can be found in Section 5 of [RDMASEC]):
 
    1.  All Non-Privileged ULP interactions with the RNIC Engine that
        could affect other ULPs MUST be done using the Resource Manager
        as a proxy.
 
    2.  All ULP resource allocation requests for scarce resources MUST
        also be done using a Privileged Resource Manager.
 
    3.  The Privileged Resource Manager MUST NOT assume different ULPs
        share Partial Mutual Trust unless there is a mechanism to
        ensure that the ULPs do indeed share partial mutual trust.
 
    4.  If Non-Privileged ULPs are supported, the Privileged Resource
        Manager MUST verify that the Non-Privileged ULP has the right
        to access a specific Data Buffer before allowing an STag for
        which the ULP has access rights to be associated with a
        specific Data Buffer.
 
    5.  The Privileged Resource Manager MUST control the allocation of
        CQ entries.
 
    6.  The Privileged Resource Manager SHOULD prevent a Local Peer
        from allocating more than its fair share of resources.
 
    7.  RDMA Read Request Queue resource consumption MUST be controlled
        by the Privileged Resource Manager such that RDMAP/DDP Streams
        which do not share Partial Mutual Trust do not share RDMA Read
        Request Queue resources.
 
    8.  If an RNIC provides the ability to share receive buffers across
        multiple Streams, the combination of the RNIC and the
        Privileged Resource Manager MUST be able to detect if the
        Remote Peer is attempting to consume more than its fair share
        of resources so that the Local Peer can apply countermeasures
        to detect and prevent the attack.
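
    Requirements 5 and 6 above (control of CQ entry allocation and
    fair sharing) can be sketched with a simple quota. This
    specification does not define what a "fair share" is; the fixed
    per-ULP quota used here is an assumption made for illustration,
    and the class and method names are hypothetical.

```python
class PrivilegedResourceManager:
    """Hypothetical PRM that hands out CQ entries under a per-ULP quota."""

    def __init__(self, total_cq_entries, per_ulp_quota):
        self.free = total_cq_entries  # entries not yet allocated
        self.quota = per_ulp_quota    # assumed notion of a fair share
        self.used = {}                # ULP name -> entries currently held

    def allocate_cq_entries(self, ulp, count):
        """Grant the request only if it fits the pool and the quota."""
        held = self.used.get(ulp, 0)
        if count <= 0 or count > self.free or held + count > self.quota:
            return False
        self.used[ulp] = held + count
        self.free -= count
        return True

    def release_cq_entries(self, ulp, count):
        """Return up to `count` entries held by the ULP to the pool."""
        held = self.used.get(ulp, 0)
        count = min(count, held)
        self.used[ulp] = held - count
        self.free += count
```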
 
 
 8.2  Security Services for RDMAP
 
    RDMAP uses IP-based network services to control, read, and write
    data buffers over the network. Therefore, all exchanged control
    and data packets are vulnerable to spoofing, tampering, and
    information disclosure attacks.
 
    RDMAP Streams that are subject to impersonation attacks, or Stream
    hijacking attacks, can be authenticated, have their integrity
    protected, and be protected from replay attacks. Furthermore,
    confidentiality protection can be used to protect from
    eavesdropping.
 
 
 8.2.1  Available Security Services
 
 
    The IPsec protocol suite [RFC2401] defines strong countermeasures
    to protect an IP stream from those attacks. Several levels of
    protection can guarantee session confidentiality, per-packet source
    authentication, per-packet integrity and correct packet sequencing.
 
 
 
    RDMAP security may also profit from SSL or TLS security services
    provided for TCP based ULPs [RFC2246]. Used underneath RDMAP, these
    security services also provide stream authentication, data
    integrity, and confidentiality. As discussed in [RDMASEC],
    limitations on the maximum packet length to be carried over the
    network and potentially inefficient out-of-order packet processing
    at the data sink make SSL and TLS less appropriate for RDMAP than
    IPsec.
 
    If SSL is layered on top of RDMAP, SSL does not protect the RDMAP
    headers. Thus, a man-in-the-middle attacker can still modify the
    RDMAP header so that the data is placed into the wrong buffer,
    effectively corrupting the data stream.
 
    By remaining independent of ULP and LLP security protocols, RDMAP
    will benefit from continuing improvements at those layers. Users
    are provided flexibility to adapt to their specific security
    requirements and the ability to adapt to future security
    challenges. Given this, the vulnerabilities of RDMAP to active
    third-party interference are no greater than those of any other
    protocol running over an LLP such as TCP or SCTP.
 
 
 8.2.2  Requirements for IPsec Services for RDMAP
 
 
    Because IPsec is designed to secure arbitrary IP packet streams,
    including streams where packets are lost, RDMAP can run on top of
    IPsec without any change. IPsec packets are processed (e.g.,
    integrity checked and possibly decrypted) in the order they are
    received, and an RDMAP Data Sink will process the decrypted RDMA
    Messages contained in these packets in the same manner as RDMA
    Messages contained in unsecured IP packets.
 
 
 
    The IP Storage working group has defined the normative IPsec
    requirements for IP Storage [RFC3723]. Portions of this
    specification are applicable to the RDMAP. In particular, a
    compliant implementation of IPsec services for RDMAP MUST meet the
    requirements as outlined in Section 2.3 of [RFC3723]. Without
    replicating the detailed discussion in [RFC3723], this includes
    the following requirements:
 
 
    1.  The implementation MUST support IPsec ESP [RFC2406], as well as
        the replay protection mechanisms of IPsec. When ESP is
        utilized, per-packet data origin authentication, integrity and
        replay protection MUST be used.
 
    2.  It MUST support ESP in tunnel mode and MAY implement ESP in
        transport mode.
 
    3.  It MUST support IKE [RFC2409] for peer authentication,
        negotiation of security associations, and key management, using
        the IPsec DOI [RFC2407].
 
    4.  It MUST NOT interpret the receipt of an IKE Phase 2 delete
        message as a reason for tearing down the RDMAP stream. Since
        IPsec acceleration hardware may only be able to handle a
        limited number of active IKE Phase 2 SAs, idle SAs may be
        dynamically brought down and a new SA brought up again if
        activity resumes.
 
    5.  It MUST support peer authentication using a pre-shared key, and
        MAY support certificate-based peer authentication using digital
        signatures. Peer authentication using the public key
        encryption methods [RFC2409] SHOULD NOT be used.
 
    6.  It MUST support IKE Main Mode and SHOULD support Aggressive
        Mode. IKE Main Mode with pre-shared key authentication SHOULD
        NOT be used when either of the peers uses a dynamically
        assigned IP address.
 
    7.  When digital signatures are used to achieve authentication,
        either IKE Main Mode or IKE Aggressive Mode MAY be used. In
        these cases, an IKE negotiator SHOULD use IKE Certificate
        Request Payload(s) to specify the certificate authority (or
        authorities) that are trusted in accordance with its local
        policy. IKE negotiators SHOULD check the pertinent Certificate
        Revocation List (CRL) before accepting a PKI certificate for
        use in IKE's authentication procedures.
 
 
    8.  Access to locally stored secret information (a pre-shared key
        or a private key for digital signing) must be suitably
        restricted, since compromise of the secret information
        nullifies the security properties of the IKE/IPsec protocols.
 
    9.  It MUST follow the guidelines of Section 2.3.4 of [RFC3723] on
        the setting of IKE parameters to achieve a high level of
        interoperability without requiring extensive configuration.
 
    Furthermore, implementation and deployment of the IPsec services
    for RDDP should follow the Security Considerations outlined in
    Section 5 of [RFC3723].
 
   9  IANA Considerations
 
    This document requests no direct action from IANA.  The following
    consideration is listed here as commentary.
 
    If RDMAP were enabled a priori for a ULP by connecting to a well-
    known port, this well-known port would be registered for the RDMAP
    with IANA. The registration of the well-known port would be the
    responsibility of the ULP specification.
 
   10 References
 
 10.1 Normative References
 
    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
        Requirement Levels", BCP 14, RFC 2119, March 1997.
 
    [RFC2406] Kent, S. and R. Atkinson, "IP Encapsulating Security
        Payload (ESP)", RFC 2406, November 1998.
 
    [RFC2407] Piper, D., "The Internet IP Security Domain of
        Interpretation of ISAKMP", RFC 2407, November 1998.
 
    [RFC2409] Harkins, D. and D. Carrel, "The Internet Key Exchange
        (IKE)", RFC 2409, November 1998.
 
    [RFC3723] Aboba, B. et al., "Secure Block Storage Protocols over
        IP", RFC 3723, April 2004.
 
    [VERBS] Hilland, J., "RDMA Protocol Verbs Specification", draft-
        hilland-iwarp-verbs-v1.0, RDMA Consortium, April 2003.
 
    [DDP] Shah, H. et al., "Direct Data Placement over Reliable
        Transports", draft-ietf-rddp-ddp-05.txt, February 2005.
 
    [MPA] Culley, P. et al., "Marker PDU Aligned Framing for TCP
        Specification", draft-ietf-rddp-mpa-04.txt, January 2005.
 
    [SCTP] Stewart, R. et al., "Stream Control Transmission Protocol",
        RFC 2960, October 2000.
 
    [TCP] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
        September 1981.
 
    [RDMASEC] Pinkerton, J. et al., "DDP/RDMAP Security", draft-ietf-
        rddp-security-09.txt, March 2005.
 
 10.2 Informative References
 
    [RFC2401] Kent, S. and R. Atkinson, "Security Architecture for the
        Internet Protocol", RFC 2401, November 1998.
 
    [RFC2246] Dierks, T. and C. Allen, "The TLS Protocol Version 1.0",
        RFC 2246, November 1998.
 
   11 Appendix
 
 11.1 DDP Segment Formats for RDMA Messages
 
    This appendix is for information only and is NOT part of the
    standard. It simply depicts the DDP Segment format for the various
    RDMA Messages.
 
 11.1.1 DDP Segment for RDMA Write
 
    The following figure depicts an RDMA Write, DDP Segment:
 
      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                     |   DDP Control | RDMA Control  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                       Data Sink STag                          |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                   Data Sink Tagged Offset                     |
     +                                                               +
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                   RDMA Write ULP Payload                      |
     //                                                             //
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    Figure 11 RDMA Write, DDP Segment format
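
    The layout in the figure above can be exercised with a short
    packing sketch. It assumes network byte order and treats the
    8-bit DDP Control and RDMA Control fields as opaque integers,
    since their bit layout is defined elsewhere. The function names
    and the example control values are hypothetical.

```python
import struct

# DDP Control (8 bits), RDMA Control (8 bits), Data Sink STag (32 bits),
# Data Sink Tagged Offset (64 bits); "!" selects network byte order
# with no padding, so the header is 14 bytes long.
TAGGED_HEADER = struct.Struct("!BBIQ")


def pack_rdma_write(ddp_ctrl, rdma_ctrl, sink_stag, sink_to, payload):
    """Build an RDMA Write DDP Segment: header followed by ULP payload."""
    header = TAGGED_HEADER.pack(ddp_ctrl, rdma_ctrl, sink_stag, sink_to)
    return header + payload


def unpack_rdma_write(segment):
    """Split a segment into its header fields and the ULP payload."""
    ddp_ctrl, rdma_ctrl, stag, to = TAGGED_HEADER.unpack_from(segment)
    return ddp_ctrl, rdma_ctrl, stag, to, segment[TAGGED_HEADER.size:]
```

    The same 14-byte header shape applies to any segment drawn with
    a control pair, a 32-bit STag, and a 64-bit Tagged Offset.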
 
 11.1.2 DDP Segment for RDMA Read Request
 
    The following figure depicts an RDMA Read Request, DDP Segment:
 
      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                     |  DDP Control  | RDMA Control  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                      Reserved (Not Used)                      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |              DDP (RDMA Read Request) Queue Number             |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |        DDP (RDMA Read Request) Message Sequence Number        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |             DDP (RDMA Read Request) Message Offset            |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                     Data Sink STag (SinkSTag)                 |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     +                  Data Sink Tagged Offset (SinkTO)             +
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                  RDMA Read Message Size (RDMARDSZ)            |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                     Data Source STag (SrcSTag)                |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     +                 Data Source Tagged Offset (SrcTO)             +
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    Figure 12 RDMA Read Request, DDP Segment format
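
    As a cross-check of the figure above, the following sketch packs
    and unpacks the RDMA Read Request fields in the order drawn. It
    assumes network byte order and treats the two control fields as
    opaque 8-bit integers. The ReadRequest tuple, its field names,
    and the helper functions are hypothetical names chosen for this
    illustration.

```python
import struct
from typing import NamedTuple

# Two 8-bit control fields, five 32-bit words (Reserved, Queue Number,
# Message Sequence Number, Message Offset, SinkSTag), the 64-bit SinkTO,
# two 32-bit words (RDMARDSZ, SrcSTag), and the 64-bit SrcTO: 46 bytes.
READ_REQUEST = struct.Struct("!BBIIIIIQIIQ")


class ReadRequest(NamedTuple):
    ddp_ctrl: int
    rdma_ctrl: int
    reserved: int
    queue_number: int
    msn: int
    message_offset: int
    sink_stag: int
    sink_to: int
    read_size: int
    src_stag: int
    src_to: int


def pack_read_request(req):
    """Serialize the fields in the order shown in the figure."""
    return READ_REQUEST.pack(*req)


def unpack_read_request(data):
    """Parse the leading 46 bytes of `data` into a ReadRequest."""
    return ReadRequest(*READ_REQUEST.unpack_from(data))
```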
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 11.1.3 DDP Segment for RDMA Read Response
 
    The following figure depicts an RDMA Read Response, DDP Segment:
 
      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                     |  DDP Control  | RDMA Control  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                       Data Sink STag                          |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                   Data Sink Tagged Offset                     |
     +                                                               +
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                RDMA Read Response ULP Payload                 |
     //                                                             //
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    Figure 13 RDMA Read Response, DDP Segment format
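
    A receiver-side view of Figure 13 might look like the following
    sketch, which separates the fixed header from the ULP Payload.
    The names READ_RESP_HDR and split_read_response are hypothetical,
    and the control octets are again treated as opaque values.

```python
import struct

# DDP Control (8), RDMA Control (8), Data Sink STag (32),
# Data Sink Tagged Offset (64): 14 octets in network byte order.
READ_RESP_HDR = struct.Struct(">BBIQ")


def split_read_response(segment):
    """Return (ddp_ctrl, rdma_ctrl, sink_stag, sink_to, payload)."""
    if len(segment) < READ_RESP_HDR.size:
        raise ValueError("segment shorter than the fixed header")
    ddp_ctrl, rdma_ctrl, sink_stag, sink_to = READ_RESP_HDR.unpack_from(segment)
    # Everything after the fixed header is RDMA Read Response ULP Payload.
    payload = segment[READ_RESP_HDR.size:]
    return ddp_ctrl, rdma_ctrl, sink_stag, sink_to, payload
```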
 
 11.1.4 DDP Segment for Send and Send with Solicited Event
 
    The following figure depicts a Send and Send with Solicited Event,
    DDP Segment:
 
      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                     |  DDP Control  | RDMA Control  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                      Reserved (Not Used)                      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                       (Send) Queue Number                     |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                 (Send) Message Sequence Number                |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                      (Send) Message Offset                    |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                       Send ULP Payload                        |
     //                                                             //
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    Figure 14 Send and Send with Solicited Event, DDP Segment format
 
 11.1.5 DDP Segment for Send with Invalidate and Send with SE and
        Invalidate
 
    The following figure depicts a Send with Invalidate and Send with
    Solicited Event and Invalidate, DDP Segment:
 
      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                     |   DDP Control | RDMA Control  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                         Invalidate STag                       |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                       (Send) Queue Number                     |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                 (Send) Message Sequence Number                |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                      (Send) Message Offset                    |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                       Send ULP Payload                        |
     //                                                             //
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    Figure 15 Send with Invalidate and Send with SE and Invalidate, DDP
    Segment format
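
    Figures 14 and 15 share one layout and differ only in the first
    32-bit word after the control octets (Reserved versus Invalidate
    STag). The sketch below makes that explicit. The names
    UNTAGGED_HDR and pack_send are hypothetical, and writing zero into
    the Reserved word is an assumption made for this example.

```python
import struct

# DDP Control (8), RDMA Control (8), first word (32), Queue Number (32),
# Message Sequence Number (32), Message Offset (32): 18 octets.
UNTAGGED_HDR = struct.Struct(">BBIIII")


def pack_send(ddp_ctrl, rdma_ctrl, qn, msn, mo, payload, invalidate_stag=None):
    """Build a Send-type segment.

    With invalidate_stag=None the first word is the Reserved field of
    Figure 14 (written as zero here, an assumption of this sketch).
    Otherwise it carries the Invalidate STag of Figure 15.
    """
    first_word = 0 if invalidate_stag is None else invalidate_stag
    header = UNTAGGED_HDR.pack(ddp_ctrl, rdma_ctrl, first_word, qn, msn, mo)
    return header + payload
```

    The control octets are passed through unchanged; selecting the
    correct opcode for each Send variant is left to the caller.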
 
 11.1.6 DDP Segment for Terminate
 
    The following figure depicts a Terminate, DDP Segment:
 
      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                     |   DDP Control | RDMA Control  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                      Reserved (Not Used)                      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                   DDP (Terminate) Queue Number                |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |             DDP (Terminate) Message Sequence Number           |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                  DDP (Terminate) Message Offset               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |       Terminate Control             |      Reserved           |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |  DDP Segment Length (if any)  |                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
     |                                                               |
     +                                                               +
     |                 Terminated DDP Header (if any)                |
     +                                                               +
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     //                                                             //
     |                 Terminated RDMA Header (if any)               |
     +                                                               +
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    Figure 16 Terminate, DDP Segment format
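
    The one word in Figure 16 that is not octet-aligned is the 19-bit
    Terminate Control field followed by 13 Reserved bits. A minimal
    sketch of handling it with shifts is shown below. The names
    TERM_FIXED, pack_terminate_fixed and unpack_terminate are
    hypothetical, Terminate Control is treated as an opaque 19-bit
    value, and the optional fields that follow are returned unparsed.

```python
import struct

# DDP Control (8), RDMA Control (8), Reserved (32), Queue Number (32),
# Message Sequence Number (32), Message Offset (32), and one 32-bit word
# holding Terminate Control (19 bits) plus Reserved (13 bits): 22 octets.
TERM_FIXED = struct.Struct(">BBIIIII")
TERM_CTRL_BITS = 19
TERM_RSVD_BITS = 13


def pack_terminate_fixed(ddp_ctrl, rdma_ctrl, qn, msn, mo, term_ctrl):
    """Build the fixed part of a Terminate segment (no optional fields)."""
    if not 0 <= term_ctrl < (1 << TERM_CTRL_BITS):
        raise ValueError("Terminate Control must fit in 19 bits")
    # Terminate Control occupies the most significant 19 bits of the word.
    word = term_ctrl << TERM_RSVD_BITS
    return TERM_FIXED.pack(ddp_ctrl, rdma_ctrl, 0, qn, msn, mo, word)


def unpack_terminate(segment):
    """Return (ddp_ctrl, rdma_ctrl, qn, msn, mo, term_ctrl, trailer)."""
    ddp_ctrl, rdma_ctrl, _rsvd, qn, msn, mo, word = TERM_FIXED.unpack_from(segment)
    term_ctrl = word >> TERM_RSVD_BITS
    # DDP Segment Length and the terminated headers, if present.
    trailer = segment[TERM_FIXED.size:]
    return ddp_ctrl, rdma_ctrl, qn, msn, mo, term_ctrl, trailer
```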
 
 11.2 Ordering and Completion Table
 
    The following table summarizes the ordering relationships that are
    defined in section 5.5 Ordering and Completions from the standpoint
    of the local peer issuing the two Operations. Note that in the
    table that follows, Send includes Send, Send with Invalidate, Send
    with Solicited Event, and Send with Solicited Event and Invalidate.
 
 ------+-------+----------------+----------------+----------------
 First | Later | Placement      | Placement      | Ordering
  Op   | Op    | guarantee at   | guarantee at   | guarantee at
       |       | Remote Peer    | Local Peer     | Remote Peer
       |       |                |                |
 ------+-------+----------------+----------------+----------------
 Send  | Send  | No placement   | Not applicable | Completed in
       |       | guarantee. If  |                | order.
       |       | guarantee is   |                |
       |       | necessary, see |                |
       |       | footnote 1.    |                |
 ------+-------+----------------+----------------+----------------
 Send  | RDMA  | No placement   | Not applicable | Not applicable
       | Write | guarantee. If  |                |
       |       | guarantee is   |                |
       |       | necessary, see |                |
       |       | footnote 1.    |                |
 ------+-------+----------------+----------------+----------------
 Send  | RDMA  | No placement   | RDMA Read      | RDMA Read
       | Read  | guarantee      | Response       | Response
       |       | between Send   | Payload will   | Message will
       |       | Payload and    | not be placed  | not be
       |       | RDMA Read      | at the local   | generated until
       |       | Request Header | peer until the | Send has been
       |       |                | Send Payload is| Completed
       |       |                | placed at the  |
       |       |                | remote peer    |
 ------+-------+----------------+----------------+----------------
 RDMA  | Send  | No placement   | Not applicable | Not applicable
 Write |       | guarantee. If  |                |
       |       | guarantee is   |                |
       |       | necessary, see |                |
       |       | footnote 1.    |                |
 ------+-------+----------------+----------------+----------------
 RDMA  | RDMA  | No placement   | Not applicable | Not applicable
 Write | Write | guarantee. If  |                |
       |       | guarantee is   |                |
       |       | necessary, see |                |
       |       | footnote 1.    |                |
 ------+-------+----------------+----------------+----------------
 RDMA  | RDMA  | No placement   | RDMA Read      | Not applicable
 Write | Read  | guarantee      | Response       |
       |       | between RDMA   | Payload will   |
       |       | Write Payload  | not be placed  |
       |       | and RDMA Read  | at the local   |
       |       | Request Header | peer until the |
       |       |                | RDMA Write     |
       |       |                | Payload is     |
       |       |                | placed at the  |
       |       |                | remote peer    |
 ------+-------+----------------+----------------+----------------
 RDMA  | Send  | No placement   | Send Payload   | Not applicable
 Read  |       | guarantee      | may be placed  |
       |       | between RDMA   | at the remote  |
       |       | Read Request   | peer before the|
       |       | Header and Send| RDMA Read      |
       |       | payload        | Response is    |
       |       |                | generated.     |
       |       |                | If guarantee is|
       |       |                | necessary, see |
       |       |                | footnote 2.    |
 ------+-------+----------------+----------------+----------------
 RDMA  | RDMA  | No placement   | RDMA Write     | Not applicable
 Read  | Write | guarantee      | Payload may be |
       |       | between RDMA   | placed at the  |
       |       | Read Request   | remote peer    |
       |       | Header and RDMA| before the RDMA|
       |       | Write payload  | Read Response  |
       |       |                | is generated.  |
       |       |                | If guarantee is|
       |       |                | necessary, see |
       |       |                | footnote 2.    |
 ------+-------+----------------+----------------+----------------
 RDMA  | RDMA  | No placement   | No placement   | Second RDMA
 Read  | Read  | guarantee of   | guarantee of   | Read Response
       |       | the two RDMA   | the two RDMA   | will not be
       |       | Read Request   | Read Response  | generated until
       |       | Headers        | Payloads.      | first RDMA Read
       |       | Additionally,  |                | Response is
       |       | there is no    |                | generated.
       |       | guarantee that |                |
       |       | the Tagged     |                |
       |       | Buffers        |                |
       |       | referenced in  |                |
       |       | the RDMA Read  |                |
       |       | will be read in|                |
       |       | order          |                |
 ------+-------+----------------+----------------+----------------
    Figure 17 Operation Ordering
 
    Footnote 1:  If the guarantee is necessary, a ULP may insert an
    RDMA Read Operation and wait for it to complete to act as a Fence.
 
    Footnote 2:  If the guarantee is necessary, a ULP may wait for the
    RDMA Read Operation to complete before performing the Send or RDMA
    Write.
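
    One way a ULP might consult Figure 17 is sketched below. The
    table FOOTNOTE records which footnote, if any, each row points to,
    and issue_plan expands the two Operations into the sequence the
    footnotes describe. Both names, and the step strings, are
    hypothetical and serve only to illustrate the table.

```python
# Footnote referenced by each (first Op, later Op) row of Figure 17.
# None means the row references no footnote.
FOOTNOTE = {
    ("Send", "Send"): 1,
    ("Send", "RDMA Write"): 1,
    ("Send", "RDMA Read"): None,
    ("RDMA Write", "Send"): 1,
    ("RDMA Write", "RDMA Write"): 1,
    ("RDMA Write", "RDMA Read"): None,
    ("RDMA Read", "Send"): 2,
    ("RDMA Read", "RDMA Write"): 2,
    ("RDMA Read", "RDMA Read"): None,
}

WAIT = "wait for RDMA Read completion"


def issue_plan(first, later, need_guarantee=False):
    """Return the ordered steps a ULP would take for two Operations."""
    note = FOOTNOTE[(first, later)]
    if not need_guarantee or note is None:
        return [first, later]
    if note == 1:
        # Footnote 1: insert an RDMA Read and wait for it, as a Fence.
        return [first, "RDMA Read", WAIT, later]
    # Footnote 2: the first Op is itself the RDMA Read; wait for it.
    return [first, WAIT, later]
```

    For example, issue_plan("Send", "RDMA Write", need_guarantee=True)
    yields the Send, a fencing RDMA Read, the wait, and then the RDMA
    Write.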
 
   12 Authors' Addresses
 
 Paul R. Culley
 Hewlett-Packard Company
 20555 SH 249
 Houston, Tx. USA 77070-2698
 Phone:  281-514-5543
 Email:  paul.culley@hp.com
 
 
 Dave Garcia
 Hewlett-Packard Company
 19333 Vallco Parkway
 Cupertino, Ca. USA 95014
 Phone:  408.285.6116
 Email:  dave.garcia@hp.com
 
 
 Jeff Hilland
 Hewlett-Packard Company
 20555 SH 249
 Houston, Tx. USA 77070-2698
 Phone:  281-514-9489
 Email:  jeff.hilland@hp.com
 
 
 Bernard Metzler
 IBM Research GmbH
 Zurich Research Laboratory
 Saeumerstrasse 4
 CH-8803 Rueschlikon, Switzerland
 Phone: +41 44 724 8605
 Email:  bmt@zurich.ibm.com
 
 
 Renato J. Recio
 IBM Corp.
 11501 Burnet Road
 Austin, Tx. USA 78758
 Phone:  512-838-3685
 Email:  recio@us.ibm.com
 
   13 Contributors
 
    Dwight Barron
        Hewlett-Packard Company
        20555 SH 249
        Houston, Tx. USA 77070-2698
        Phone:  281-514-2769
        Email:  dwight.barron@hp.com
 
    Caitlin Bestler
        Broadcom Corporation
        16215 Alton Parkway
        Irvine, CA.  USA 92619-7013
        Phone:  949-926-6383
        Email:  caitlinb@broadcom.com
 
    John Carrier
        Cray, Inc.
        411 First Avenue S, Suite 600
        Seattle, WA 98104-2860 USA
        Phone: 206-701-2090
        Email: carrier@cray.com
 
    Ted Compton
        EMC Corporation
        Research Triangle Park, NC 27709, USA
        Phone: 919-248-6075
        Email: compton_ted@emc.com
 
    Uri Elzur
        Broadcom Corporation
        16215 Alton Parkway
        Irvine, California 92619-7013 USA
        Phone: +1 (949) 585-6432
        Email: Uri@Broadcom.com
 
    Hari Ghadia
        Adaptec, Inc.
        691 S. Milpitas Blvd.,
        Milpitas, CA 95035  USA
        Phone: +1 (408) 957-5608
        Email: hari_ghadia@adaptec.com
 
    Howard C. Herbert
        Intel Corporation
        MS CH7-404
        5000 West Chandler Blvd.
        Chandler, Arizona 85226
        Phone: 480-554-3116
        Email: howard.c.herbert@intel.com
 
    Mike Ko
        IBM
        650 Harry Rd.
        San Jose, CA 95120
        Phone: (408) 927-2085
        Email: mako@us.ibm.com
 
    Mike Krause
        Hewlett-Packard Company
        43LN
        19410 Homestead Road
        Cupertino, CA  95014 USA
        Phone: 408-447-3191
        Email: krause@cup.hp.com
 
    Dave Minturn
        Intel Corporation
        MS JF1-210
        5200 North East Elam Young Parkway
        Hillsboro, Oregon  97124
        Phone: 503-712-4106
        Email: dave.b.minturn@intel.com
 
    Mike Penna
        Broadcom Corporation
        16215 Alton Parkway
        Irvine, California 92619-7013 USA
        Phone: +1 (949) 926-7149
        Email: MPenna@Broadcom.com
 
    Jim Pinkerton
        Microsoft, Inc.
        One Microsoft Way
        Redmond, WA, USA 98052
        Email:  jpink@microsoft.com
 
    Hemal Shah
        Broadcom Corporation
        16215 Alton Parkway
        Irvine, CA. USA 92619-7013
        Phone: 949-926-6941
        Email:
 
    Allyn Romanow
        Cisco Systems
        170 W Tasman Drive
        San Jose, CA 95134 USA
        Phone: +1 408 525 8836
        Email: allyn@cisco.com
 
    Tom Talpey
        Network Appliance
        375 Totten Pond Road
        Waltham, MA 02451 USA
        Phone: +1 (781) 768-5329
        Email: thomas.talpey@netapp.com
 
    Patricia Thaler
        Broadcom Corporation
        16215 Alton Parkway
        Irvine, CA. USA 92619-7013
        Phone: +1-916-570-2707
        Email: pthaler@broadcom.com
 
    Jim Wendt
        Hewlett-Packard Company
        8000 Foothills Boulevard MS 5668
        Roseville, CA 95747-5668 USA
        Phone: +1 916 785 5198
        Email: jim_wendt@hp.com
 
    Madeline Vega
        IBM
        11400 Burnet Rd. Bld.45-2L-007
        Austin, TX.  USA 78758
        Phone:  512-838-7739
        Email:  mvega1@us.ibm.com
 
    Claudia Salzberg
        IBM
        11501 Burnet Rd. Bld.902-5B-014
        Austin, TX.  USA 78758
        Phone:  512-838-5156
        Email:  salzberg@us.ibm.com
 
   14 Intellectual Property Statement
 
    The IETF takes no position regarding the validity or scope of any
    Intellectual Property Rights or other rights that might be claimed
    to pertain to the implementation or use of the technology described
    in this document or the extent to which any license under such
    rights might or might not be available; nor does it represent that
    it has made any independent effort to identify any such rights.
    Information on the procedures with respect to rights in RFC
    documents can be found in BCP 78 and BCP 79.
 
    Copies of IPR disclosures made to the IETF Secretariat and any
    assurances of licenses to be made available, or the result of an
    attempt made to obtain a general license or permission for the use
    of such proprietary rights by implementers or users of this
    specification can be obtained from the IETF on-line IPR repository
    at http://www.ietf.org/ipr.
 
    The IETF invites any interested party to bring to its attention any
    copyrights, patents or patent applications, or other proprietary
    rights that may cover technology that may be required to implement
    this standard. Please address the information to the IETF at
    ietf-ipr@ietf.org.
 
   15 Full Copyright Statement
 
    Copyright (C) The Internet Society (2006).
 
    This document is subject to the rights, licenses and restrictions
    contained in BCP 78, and except as set forth therein, the authors
    retain all their rights.
 
    This document and the information contained herein are provided on
    an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
    REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND
    THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
    THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR
    ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
    PARTICULAR PURPOSE.