Internet Draft                                Tom Talpey        (NetApp)
May, 2002                                     David Robinson       (Sun)
                                              Robert Teisberg       (HP)
                                              Jim Wendt             (HP)

Document: draft-talpey-rdma-over-ip-requirements-00.txt
Expires:  November 2002

                    RDMA over IP (ROI) Requirements


Status of this Memo

     This document is an Internet-Draft and is in full conformance with
     all provisions of Section 10 of RFC2026.

     Internet-Drafts are working documents of the Internet Engineering
     Task Force (IETF), its areas, and its working groups.  Note that
     other groups may also distribute working documents as Internet-
     Drafts.

     Internet-Drafts are draft documents valid for a maximum of six
     months and may be updated, replaced, or obsoleted by other
     documents at any time.  It is inappropriate to use Internet-Drafts
     as reference material or to cite them other than as "work in
     progress."

     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.

Copyright Notice

     Copyright (C) The Internet Society (2002). All Rights Reserved.

Abstract

     This draft defines terminology and requirements to be used in
     conjunction with the RDMA over IP (ROI) effort.










Talpey, et al             Expires November 2002                 [Page 1]


Internet-Draft          RDMA over IP Requirements               May 2002


Table Of Contents

     1.   Introduction . . . . . . . . . . . . . . . . . . . . . . .   2
             Overview  . . . . . . . . . . . . . . . . . . . . . . .   3
             Authors' Note . . . . . . . . . . . . . . . . . . . . .   5
             Document Conventions  . . . . . . . . . . . . . . . . .   5
     2.   Terminology  . . . . . . . . . . . . . . . . . . . . . . .   6
             General Terms . . . . . . . . . . . . . . . . . . . . .   6
             Direct Data Placement Terms . . . . . . . . . . . . . .   7
             Remote Direct Memory Access Terms . . . . . . . . . . .   9
             Memory Management Terms . . . . . . . . . . . . . . . .  10
             Terminology Note  . . . . . . . . . . . . . . . . . . .  11
     3.   Requirements . . . . . . . . . . . . . . . . . . . . . . .  13
             Implementation Goals  . . . . . . . . . . . . . . . . .  13
             Transport Requirements  . . . . . . . . . . . . . . . .  13
             Direct Data Placement Requirements  . . . . . . . . . .  16
             Remote Direct Memory Access Requirements  . . . . . . .  17
             Upper Layer Protocol Requirements . . . . . . . . . . .  18
             Security Requirements . . . . . . . . . . . . . . . . .  19
     4.   Acknowledgements . . . . . . . . . . . . . . . . . . . . .  20
     5.   References . . . . . . . . . . . . . . . . . . . . . . . .  20
          Authors' Addresses . . . . . . . . . . . . . . . . . . . .  21
          Full Copyright Statement . . . . . . . . . . . . . . . . .  22


1.  Introduction

     This document defines terminology and presents requirements for
     Remote Direct Memory Access over the Internet Protocol Suite.

     The concept is subdivided into two primary components, referred to
     as Remote Direct Memory Access, or RDMA, and Direct Data Placement,
     or DDP.  This document considers the specification of both RDMA and
     DDP protocols for the Internet Protocol family, to be called "RDMA
     over IP", or ROI.  Section 2 defines many important terms.

     The goal of the DDP protocol is to allow the efficient placement of
     data into buffers designated by Upper Layer Protocols (ULP).
     Efficiency may be characterized by the minimization of the number
     of transfers of the data over the receiver's system buses.

     The goal of the RDMA protocol is to provide the semantics to enable
     Remote Direct Memory Access between ROI peers in a way consistent
     with application requirements.  The RDMA protocol is not an
     application protocol in itself, but provides facilities immediately
     useful to existing and future networking, storage, and other
     application protocols.  [SDP] [DAFS] [VI] [IB] [MYR] [SVR] [FIBRE]

     The DDP and RDMA protocols work together to achieve their
     respective goals.  RDMA provides facilities to a ULP for
     identifying buffers, controlling the transfer of data between ULP
     peers, and providing completion notifications to the ULP.  RDMA
     uses the features of DDP to steer payloads to specific buffers at
     the Data Sink.  ULPs that do not require the features of RDMA may
     be layered directly on top of DDP.

     The DDP and RDMA protocols are transport independent.  The
     following figure shows the relationship between RDMA, DDP, Upper
     Layer Protocols and Transport.

          +-------------------+--------------+----------------+
          |       ULP         |     ULP      |      ULP       |
          +-----+-------------+-------------------------------+
          |     |             |             RDMA              |
          |     |             +-------------------------------+
          |     |                     DDP                     |
          |     +--------------------+------------------------+
          |        Transport         |      Transport         |
          +--------------------------+------------------------+

1.1.  Overview

     Several performance trends are at work in networked systems.
     Moore's law describing CPU performance trends is well known.  The
     nearly parallel trend in network link bandwidth differs from
     Moore's law primarily in lacking a catchy name.  Today's
     inexpensive network adapters running at 1Gbps succeed the 100Mbps
     adapters of the 1990s and the 10Mbps adapters of the 1980s in a so
     far unbroken sequence.  [ROM] [HP97] [STREAM]

     Less remarked but painfully familiar to CPU and system designers is
     the trend in memory performance.  Memory speeds have been improving
     along with CPU and network speeds, but at a much slower pace.
     [HP97]

     In a conventional implementation of an Internet protocol stack,
     incoming link-layer frames are deposited by hardware into buffers
     owned by the operating system.  Software in the host CPU parses
     headers until the data can be associated with a specific
     application buffer, at which time the payload is copied to the
     buffer.  This means that each byte of incoming payload crosses the
     memory bus at least three times: once when the containing frame is
     received, and twice when the payload is copied to the application's
     buffer.  Furthermore, the copy indirectly causes additional memory
     traffic as cache lines are flushed and reloaded.

     Network Interface Controllers (NICs) that offload protocol
     processing only up through the transport layer cannot address the
     memory bandwidth problem caused by copying because the information
     needed to place the payload is not known to the transport layer.
     While the problem can be solved one Upper Layer Protocol (ULP) at a
     time by implementing the ULP in the NIC (this is being done now by
     several vendors for iSCSI [ISCSI1] [ISCSI2], for example), there
     are so many ULPs affected by the memory bandwidth problem that
     migrating ULP implementations into the NIC is economically
     infeasible.  Neither a multitude of specialized NICs each
     implementing one ULP, nor a large, complex, expensive, multipurpose
     NIC implementing many ULPs is attractive to either vendors or end
     users.

     The problem of memory bandwidth consumption due to copying of
     network payload can be solved by a common protocol that identifies
     the final destination of the payload.  Such a protocol has come to
     be known as Direct Data Placement (DDP).  Just as a network layer
     protocol such as IP can be thought of as steering data from a
     source node to a destination node, and a transport layer protocol
     such as SCTP [SCTP] as steering data from a source process to a
     destination process, so DDP steers data from a source buffer to a
     destination buffer.  A protocol stack residing in a NIC and
     containing all layers up through DDP can place incoming payloads
     directly in the ULP's buffer with only one memory bus crossing and
     can do so for any ULP.
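
     As a rough illustration of the bus-crossing argument above, the
     following sketch compares the memory bus traffic generated by
     received payload on the conventional path (three crossings per
     byte) and with direct placement (one crossing per byte).  It
     assumes, purely for illustration, a fully utilized 1Gbps link
     carrying only payload, and it ignores header bytes and cache
     effects.  The function and constant names are hypothetical.

```python
# Illustrative arithmetic only: memory bus traffic generated by received
# payload under the two receive paths described in the Overview.

CONVENTIONAL_CROSSINGS = 3  # frame received (1) + copy read and write (2)
DDP_CROSSINGS = 1           # payload placed directly in the ULP's buffer


def memory_traffic_bytes_per_sec(link_bits_per_sec, crossings):
    """Return the memory bus bytes/second needed to absorb the payload."""
    payload_bytes_per_sec = link_bits_per_sec / 8
    return payload_bytes_per_sec * crossings


link = 1_000_000_000  # assumed: the 1Gbps adapter, fully utilized
conventional = memory_traffic_bytes_per_sec(link, CONVENTIONAL_CROSSINGS)
direct = memory_traffic_bytes_per_sec(link, DDP_CROSSINGS)

print(f"conventional receive path: {conventional / 1e6:.0f} MB/s")
print(f"direct data placement:     {direct / 1e6:.0f} MB/s")
```

     Under these assumptions the conventional path needs three times
     the memory bandwidth of the link's payload rate, which is the
     overhead that DDP is intended to remove.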

     Another source of overhead in networked computing systems is
     context switches.  Just as many conventional peripheral devices use
     Direct Memory Access (DMA) to read and write buffers without
     interrupting ongoing processing, a process can use Remote Direct
     Memory Access (RDMA) to read and write buffers belonging to a
     process in a remote node without interrupting the remote process's
     (or unrelated processes') activity.  This can eliminate yet another
     source of memory bus traffic and cache pollution.

     Even when the network protocol stack is not offloaded to a
     peripheral device, RDMA provides benefits to applications by giving
     them a convenient way to identify both the source and the
     destination of data to be transferred.  Complete control of an RDMA
     transfer resides in one peer, simplifying both the application
     protocol and the application's logic.

     DDP therefore solves the problem of efficiently directing payloads
     to buffers, while RDMA enables simplified and more efficient
     application logic by giving applications a way to identify source
     and destination buffers, controlling the entire transfer from one
     end.

1.2.  Authors' Note

     In order to make a meaningful start on these requirements, the
     authors found it necessary to begin with some fundamental
     assumptions about the nature of the proposed solution.  It has not
     been the intention of the authors to summarize discussion of any
     implementation, nor to capture all possible alternatives.

     As such, this Draft only considers the case of layering the ROI
     solution atop IP Transport.  We leave it to other Drafts to explore
     alternatives.

     Initially, it is expected that ROI will target the Stream Control
     Transmission Protocol.  This does not preclude consideration of
     TCP or another member of the Internet Protocol family, within the
     requirements stated.  The ROI solution must ensure its portability
     among suitable IP transports.

     The authors invite discussions of these requirements, and expect a
     lively debate!

1.3.  Document Conventions

     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY" and "OPTIONAL"
     appearing in this document are to be interpreted as described in
     RFC2119. [RFCTERMS]

     Also in this document, the following naming conventions for certain
     functional components have been adopted.

          When referring to the RDMA over IP architecture, the term
          "ROI" is used.

          When referring to all the ROI component protocols taken
          together, the term "ROI protocols" is used.

          When referring to the RDMA protocol alone, or taken together
          with DDP, or to a property specifically provided by the RDMA
          protocol, the term "RDMA protocol" is used.

          When referring to the DDP protocol alone, exclusive of RDMA,
          the term "DDP protocol" is used.

     These terms are described in greater detail in the following
     section.


2.  Terminology

     This section contains proposed terminology definitions for RDMA
     over IP (ROI) and serves to establish a common language for
     continued discussions and forthcoming documents.

2.1.  General Terms

     Data Sink
          The peer receiving a directly placed data payload. Note that
          the Data Sink can be required to both send and receive
          RDMA/DDP Messages to transfer a data payload.

     Data Source
          The peer sending a directly placed data payload. Note that the
          Data Source can be required to both send and receive RDMA/DDP
          Messages to transfer a data payload.

     Fabric
          The collection of links, switches, and routers that connect a
          set of Nodes with ROI protocol implementations.

     LLP
          Lower Layer Protocol - The protocol layer beneath the protocol
          layer currently referenced. For example, for DDP, the LLP is
          SCTP, TCP, or other transport protocols. For RDMA, the LLP is
          DDP.

     Local Peer
          The ROI protocol implementation on the local end of the
          connection. Used to refer to the local entity when describing
          a protocol exchange or other interaction between two Nodes.

     NIC
          Network Interface Controller. In this context, this would be a
          NIC with ROI functionality.

     NIC Driver
          Software that supports a NIC, and provides an appropriate OS
          interface as required by the OS.

     Node
          A computing device attached to one or more links of a Fabric
          (network). A Node in this context does not refer to a specific
          application or protocol instantiation running on the computer.
          A Node may consist of one or more NICs installed in a host
          computer.

     Remote Peer
          The ROI protocol implementation on the opposite end of the
          connection. Used to refer to the remote entity when describing
          protocol exchanges or other interactions between two Nodes.

     ROI
          RDMA over IP. The set of wire protocols that provide Direct
          Data Placement and RDMA Operations to a ULP.

     ULP
          Upper Layer Protocol - The protocol layer above the protocol
          layer currently referenced. The ULP for RDMA/DDP is expected
          to be an OS, Application, adaptation layer, or proprietary
          device.  The ROI documents do not specify a ULP, but provide a
          set of semantics that allow a ULP to be designed to utilize
          ROI.

     ULP Payload
          The ULP data that is contained within a single protocol
          segment or packet (e.g. an RDMA Operation as viewed within a
          DDP Segment).

2.2.  Direct Data Placement (DDP) Terms

     Data Delivery
          For DDP, delivery is defined as the process of informing the
          ULP or application that a particular DDP Message or Segment is
          available for use.  This is specifically different from
          "Placement", which may generally occur in any order, while the
          order of "delivery" is strictly defined.  See "Data
          Placement".

     Data Placement
          For DDP, this term is specifically used to indicate the
          process of writing to a data buffer by a DDP implementation.
          DDP Segments carry Placement information which may be used by
          the receiving DDP implementation to perform Data Placement of
          the ULP payload. See "Data Delivery".

     Placement Validation
          For DDP, the set of actions performed to validate the
          Placement information for a given DDP Segment.

     Payload Validation
          For DDP, the set of actions performed to validate the
          integrity of the payload in a DDP Segment.

     DDP Header
          The header present in all DDP Segments. The DDP Header
          contains control and Placement fields that are used to define
          the final Placement location for the ULP payload carried in a
          DDP Segment.

     DDP Message
          A ULP-defined unit of data interchange, which is subdivided
          into one or more payloads carried in one or more,
          respectively, DDP Segments.

     DDP Segment
          The smallest unit of data transfer for the DDP protocol. It
          includes a DDP Header and payload (if present). A DDP Segment
          is typically sized to make efficient use of the underlying
          transport.

     DDP Stream
          A sequence of DDP Messages whose ordering is defined by the
          LLP.  A DDP Stream may map to an SCTP stream, or other LLP-
          specific facility.  Note that DDP provides no ordering
          guarantees between DDP Streams.

     Direct Data Placement (DDP)
          A mechanism whereby ULP data contained within DDP Segments may
          be placed directly into its final destination in memory
          without processing of the ULP.

     Steering Tag
          An identifier of a Memory Region on a Node, valid as defined
          within a protocol specification.

     STag
          Steering Tag

     Target Offset
          The offset within a Memory Region.  The offset is a Data Sink-
          supplied base value which is manipulated by the Data Source to
          direct data transfers within the Memory Region using ordinary
          address arithmetic.

     TO
          Target Offset.

     Decorated
          A DDP Segment that is accompanied by DDP Decoration is
          considered to be "decorated".

     DDP Decoration
          The Placement information accompanying a DDP Segment and which
          facilitates Data Placement.

     Undecorated
          A DDP Segment that is lacking DDP Decoration is considered to
          be "undecorated". Undecorated ULP data may still be
          accompanied by a header sufficient to distinguish it from
          Decorated data.
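
     The DDP terms above can be tied together with a small sketch.  It
     models a Decorated DDP Segment as a payload accompanied by an STag
     and a Target Offset, and shows that Data Placement driven by those
     fields gives the same result whatever order the segments arrive
     in.  The class, its field names and the dictionary of Memory
     Regions are hypothetical; no wire format is implied.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DDPSegment:
    """Hypothetical in-memory view of a DDP Segment (not a wire format)."""
    payload: bytes
    stag: Optional[int] = None  # Steering Tag; None models "Undecorated"
    to: int = 0                 # Target Offset within the Memory Region

    @property
    def decorated(self) -> bool:
        return self.stag is not None


def place(regions: Dict[int, bytearray], seg: DDPSegment) -> bool:
    """Place a Decorated segment's payload; return False if not placed."""
    if not seg.decorated:
        return False  # Undecorated data is not steered by STag/TO
    region = regions.get(seg.stag)
    end = seg.to + len(seg.payload)
    if region is None or seg.to < 0 or end > len(region):
        return False  # Placement Validation failed
    region[seg.to:end] = seg.payload
    return True


regions = {0x10: bytearray(12)}      # one registered 12-byte Memory Region
segments = [                         # deliberately out of order
    DDPSegment(b"over", stag=0x10, to=4),
    DDPSegment(b" IP ", stag=0x10, to=8),
    DDPSegment(b"RDMA", stag=0x10, to=0),
]
results = [place(regions, seg) for seg in segments]
print(bytes(regions[0x10]))
```

     Because each segment carries its own Placement information, no
     segment depends on its neighbours.  Informing the ULP (Data
     Delivery) is a separate step that this sketch does not model.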

2.3.  Remote Direct Memory Access (RDMA) Terms

     Remote Direct Memory Access (RDMA)
          A method of accessing memory on a remote system in which the
          local system specifies the remote location of the data to be
          transferred.

     RDMA Protocol
          A wire protocol that supports RDMA Operations to transfer ULP
          data between the Local Peer and the Remote Peer.

     RDMA Stream
          An association between a pair of RDMA implementations,
          possibly on different Nodes, which transfer ULP data using
          RDMA Operations. There may be multiple RDMA Streams on a
          single Node.

     RDMA Operation
          A sequence of RDMA messages, including control messages, to
          transfer data from a Data Source to a Data Sink.

     RDMA Message
          A data transfer mechanism used to fulfill an RDMA Operation.

     RDMA Buffer
          A region of memory used to send or receive data by RDMA.  The
          region may be Tagged or Untagged (see Memory Management
          terms).

     RDMA Read
          An RDMA Operation used by the Data Sink to transfer the
          contents of a source RDMA Buffer from the Remote Peer to the
          Local Peer. An RDMA Read Operation consists of a single RDMA
          Read Request message and at least one RDMA Read Response
          message.

     RDMA Read Request
          An RDMA message used by the Data Sink to request the Data
          Source to transfer the contents of an RDMA Buffer. The RDMA
          Read Request Message describes both the Data Source and Data
          Sink RDMA Buffers.

     RDMA Read Response
          An RDMA message used by the Data Source to transfer the
          contents of an RDMA Buffer to the Data Sink, in response to an
          RDMA Read Request. The RDMA Read Response message only
          describes the Data Sink RDMA Buffer.

     RDMA Write
          An RDMA Operation that transfers the contents of a source RDMA
          Buffer from the Local Peer to a destination RDMA Buffer at the
          Remote Peer using RDMA. The RDMA Write message only describes
          the Data Sink RDMA Buffer.

     Send
          An RDMA Operation that transfers the contents of a ULP Buffer
          from the Local Peer to an RDMA Buffer at the Remote Peer. The
          Send message does not specify the Data Sink RDMA Buffer.

     RDMA Completion
          For RDMA, completion is defined as the process of informing
          the ULP or application that a particular RDMA Operation has
          completed.  The completion semantic of each RDMA Operation is
          distinctly defined.

     Completion
          RDMA Completion.

     Fence
          To block the current RDMA Operation from executing until
          certain other RDMA Operations have completed.

     Solicited Event
          A facility by which an RDMA Operation sender may cause an
          event to be generated at the recipient when a Send message is
          received.
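
     The RDMA Read definitions above can be sketched as a small
     exchange.  The Data Sink issues one RDMA Read Request describing
     both buffers; the Data Source answers with one or more RDMA Read
     Responses, each describing only the Data Sink buffer.  All type
     and function names are hypothetical, and the sketch assumes that
     the transfer length is the smaller of the two buffer lengths, a
     choice the terminology does not dictate.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BufferRef:
    """Hypothetical (STag, Target Offset, length) description of a buffer."""
    stag: int
    to: int
    length: int


@dataclass
class ReadRequest:
    source: BufferRef  # buffer to read from, at the Data Source
    sink: BufferRef    # buffer to fill, at the Data Sink


@dataclass
class ReadResponse:
    sink_stag: int     # only the Data Sink buffer is described
    sink_to: int
    payload: bytes


def serve_read(memory: Dict[int, bytearray], req: ReadRequest,
               max_payload: int) -> List[ReadResponse]:
    """Data Source side: answer one Read Request with >= 1 Read Responses."""
    src = memory[req.source.stag]
    length = min(req.source.length, req.sink.length)  # assumption, see text
    offsets = range(0, length, max_payload) if length else [0]
    responses = []
    for off in offsets:
        chunk_end = min(off + max_payload, length)
        chunk = bytes(src[req.source.to + off:req.source.to + chunk_end])
        responses.append(ReadResponse(req.sink.stag, req.sink.to + off, chunk))
    return responses


def complete_read(memory: Dict[int, bytearray],
                  responses: List[ReadResponse]) -> None:
    """Data Sink side: place each response at the offset it names."""
    for rsp in responses:
        buf = memory[rsp.sink_stag]
        buf[rsp.sink_to:rsp.sink_to + len(rsp.payload)] = rsp.payload


source_memory = {7: bytearray(b"0123456789")}
sink_memory = {9: bytearray(8)}
request = ReadRequest(source=BufferRef(7, 2, 6), sink=BufferRef(9, 1, 6))
responses = serve_read(source_memory, request, max_payload=4)
complete_read(sink_memory, responses)
print(len(responses), bytes(sink_memory[9]))
```

     Bounds and protection checks are omitted here to keep the message
     flow visible; a real implementation would validate every access.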

2.4.  Memory Management Terms

     Advertisement
          The act of informing a Remote Peer of the availability of a
          local buffer.  A Node exposes a registered Buffer for incoming
          read or write access by informing its ROI peer of the buffer
          identifiers (STag, base address, length). This advertisement
          of buffer information is not defined by ROI and is left to the
          ULP. A typical method would be for the Local Peer to embed the
          buffer's Steering Tag, address, and length in a Send message
          destined for the Remote Peer.

     Tagged Buffer
          A buffer that is Advertised for RDMA access by the Remote
          Peer. A Tagged Buffer is manipulated by the Remote Peer by
          means of its associated Steering Tag, Target Offset, and
          length.

     Untagged Buffer
          A ULP receive buffer used to receive incoming Remote Peer Send
          transfers.  The buffer is referred to as untagged because the
          Data Source does not specify the final destination of the Send
          on the Data Sink.

     Memory Registration
          The act of registering a host Memory Region for use by a ULP
          or application. The memory registration operation returns a
          Steering Tag.

     Memory Region
          An area of registered memory, which can be accessed in a
          contiguous fashion by the DDP implementation.  The Memory
          Region is thereby enabled for DDP local access and optional
          remote access. A Memory Region is identified by a Steering Tag
          and has an associated length.  Note that the DDP
          implementation defines the mapping, and therefore the Memory
          Region may or may not be contiguous in any other address
          space.

     ULP Buffer
          A buffer owned above the RDMA layer and exposed to the RDMA
          layer either as a Tagged Buffer or an Untagged Buffer.
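
     A minimal sketch of Memory Registration followed by Advertisement
     is given below.  Registration returns a Steering Tag, and the
     Local Peer then encodes the buffer identifiers (STag, base
     address, length) for embedding in a Send message.  The 16-byte
     layout used here (32-bit STag, 64-bit address, 32-bit length,
     network byte order) is an assumption made for the example only;
     ROI leaves the Advertisement encoding to the ULP.

```python
import struct
from itertools import count

# Hypothetical advertisement layout: 32-bit STag, 64-bit base address,
# 32-bit length, network byte order.  Not defined by ROI.
ADVERTISEMENT = struct.Struct("!IQI")


class Registry:
    """Toy Memory Registration table that hands out Steering Tags."""

    def __init__(self):
        self._next_stag = count(1)
        self.regions = {}

    def register(self, base_address, length):
        """Register a Memory Region and return its Steering Tag."""
        stag = next(self._next_stag)
        self.regions[stag] = (base_address, length)
        return stag

    def advertise(self, stag):
        """Build the bytes a ULP might embed in a Send message."""
        base_address, length = self.regions[stag]
        return ADVERTISEMENT.pack(stag, base_address, length)


def parse_advertisement(data):
    """Remote Peer side: recover (STag, base address, length)."""
    return ADVERTISEMENT.unpack(data)


registry = Registry()
stag = registry.register(base_address=0x7F0000001000, length=4096)
message = registry.advertise(stag)
print(parse_advertisement(message))
```

     After parsing the Advertisement, the Remote Peer holds everything
     it needs to address the Tagged Buffer by Steering Tag, Target
     Offset and length.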

2.5.  Terminology Note

     The following terms are not used in this document, to prevent
     confusion or overlap as noted.

     Chunk
          Reserved for SCTP (use DDP Segment)

     Frame
          Reserved for the Data Link Layer

     Sender/Receiver or Requestor/Responder
          Data Sink and Data Source are clearer and are preferred in DDP
          context.

     Notification
          Use RDMA Completion in RDMA context.


3.  Requirements

     The following sections outline the requirements for ROI components.

3.1.  Implementation Goals

          ROI MUST enable Direct Data Placement and Remote Direct Memory
          Access semantics over existing Internet Protocols.

          ROI MUST enable cost competitive solutions.

          ROI MUST provide high bandwidth and bandwidth aggregation.

          ROI MUST enable low host system overhead.

          ROI SHOULD keep the protocol simple.

          ROI SHOULD enable creation of optimized implementations.
          Targeted optimizations SHOULD include reducing memory bus
          crossings, reducing host-adapter interactions, and enabling
          parallel processing.

          ROI SHOULD be specified as a layered implementation atop IP
          transport.

3.2.  Transport Requirements

     The following are requirements placed by ROI on any IP transport
     Lower Layer Protocol.

     3.2.1.  Layering

          The ROI protocols MUST NOT require changes to any supported IP
          transport, nor require that new semantics be imposed.

          The ROI protocols SHOULD NOT replicate services available at
          lower layers.

          The ROI protocols SHOULD expose fundamental properties of the
          underlying IP transport to the ULP to the maximum extent
          possible, consistent with the explicit requirements of ROI.


          -    ROI over a connection-oriented transport MUST expose
               connection-oriented semantics to the ULP.

          -    ROI over a connectionless transport MUST expose
               connectionless semantics to the ULP.

          -    ROI over an explicitly unreliable protocol SHOULD expose
               unreliable semantics to the ULP.  Certain ROI features
               MAY have the side effect of providing information to the
               ULP about datagram loss and MAY involve retries, but
               assuring reliable delivery of payload MUST NOT be their
               primary purpose.

          ROI MUST use transport connections conservatively.

          ROI MUST be designed to allow future substitution of transport
          protocols with minimal changes to ROI protocol operation,
          message structures and formats.

     3.2.2.  Network Infrastructure

          ROI MUST function over a variety of IP network topologies
          (e.g. dedicated LAN, shared LAN, private WAN, public
          Internet).

          ROI MUST be compatible with both IPv4 and IPv6.

          ROI SHOULD NOT require changes to infrastructure beyond those
          already required by the supported IP transport.

          ROI SHOULD function correctly through middleboxes (e.g. NATs,
          firewalls) to the extent that the supported IP transport
          allows. [MIDTAX]

     3.2.3.  Ordering and Reliability

          The transport MAY support reliable operation.

          The transport MUST detect duplicate transmissions and thereby
          deliver ROI Operations at most once, or signal an error.

          The transport MAY support unordered delivery.

          ROI MUST provide support for the ordering of Completions
          within classes of RDMA Operations.

     3.2.4.  Connection model

          The ROI protocols MUST specify a binding to connection
          oriented transports.

          The ROI protocols MAY specify a binding to datagram oriented
          transports.

          The ROI protocols are NOT REQUIRED to support broadcast or
          multicast operations.

     3.2.5.  Integrity

          The ROI protocols MUST specify validation mechanisms that
          cover at least the DDP Placement information.

          DDP Placement information MUST be validated prior to Data
          Placement.

          The data MUST be validated by the transport before Data
          Delivery.

          The ROI protocols SHOULD NOT guarantee stronger integrity than
          the underlying transport.

          The ROI protocols MAY provide stronger integrity guarantees by
          means of optional facilities.

     3.2.6.  DDP Transport Interaction

          The DDP protocol SHOULD be capable of presenting data to the
          IP transport layer in DDP Segments sized so that each fits
          within the IP transport layer's optimal segment size, without
          requiring fragmentation and reassembly.

          All DDP Headers and payload MUST appear as ordinary payload
          within IP transport segments.

          DDP Header information MAY also be duplicated in extensible IP
          transport headers as allowed by the respective standards.

          DDP MAY also use information reported to it by the underlying
          IP transport.
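
          The segment sizing requirement in this section can be
          sketched as follows.  A DDP Message is split so that each
          DDP Header plus payload fits in one transport segment.  The
          function name, the 1460-byte transport segment size and the
          16-byte DDP Header size are all assumptions chosen for
          illustration; none of them comes from these requirements.

```python
def segment_message(message: bytes, transport_segment_size: int,
                    ddp_header_size: int):
    """Split a DDP Message into (offset, payload) pairs.

    Each pair stands for one DDP Segment whose header plus payload fits
    in a single transport segment, so the transport need not fragment it.
    """
    max_payload = transport_segment_size - ddp_header_size
    if max_payload <= 0:
        raise ValueError("transport segment too small for a DDP Header")
    return [(off, message[off:off + max_payload])
            for off in range(0, len(message), max_payload)]


# Assumed numbers for illustration only.
TRANSPORT_SEGMENT = 1460
DDP_HEADER = 16

segments = segment_message(b"x" * 4000, TRANSPORT_SEGMENT, DDP_HEADER)
offsets = [off for off, _ in segments]
sizes = [len(payload) for _, payload in segments]
print(offsets, sizes)
```

          Each pair carries its own offset, so a receiver could place
          any one of these segments without reference to the others.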

     3.2.7.  Congestion control

          Any IP transport protocol underlying the DDP protocol MUST
          support congestion control as described in RFC2914. [CONG]

          The ROI protocols are NOT REQUIRED to provide congestion
          control.  To provide it would duplicate the lower mechanism.

          The ROI protocols are NOT REQUIRED to implement flow control,
          as they will operate on top of transports with flow control
          and below applications with flow control.

3.3.  Direct Data Placement Requirements

     The following are requirements applicable to the DDP layer.

     3.3.1.  Transport

          DDP MUST be supported over SCTP.

          DDP MAY be defined over any IP transport meeting the
          requirements of section 3.2.

     3.3.2.  Placement

          The DDP Segment MUST contain DDP Headers that facilitate
          Placement of the data into the destination buffers without
          interpretation of the Upper Layer Protocol.  DDP headers MUST
          be self-contained and self-describing.

          DDP MUST enable efficient Direct Data Placement of incoming
          data.

          DDP is NOT REQUIRED to provide ordering guarantees between DDP
          Streams.

     3.3.3.  Memory Model

          The contents of all Untagged Buffers, and of writeable Tagged
          Buffers which are Advertised to the Remote Peer, and passed to
          DDP by any ULP are indeterminate unless a successful Data
          Delivery occurs.

          The Placement of data MUST NOT be dependent upon any previous
          or subsequent DDP Segments.

          Access to Memory Regions MUST be available on a byte-level
          granularity, MUST be strictly bounds checked and MUST NOT
          permit "wrapping" or "overflow".

          Memory Regions MUST support protection attributes specifying
          at least "read" and "write".

          All Memory Region accesses MUST be checked for validity
          according to the protection attributes of the region.
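
          The bounds and protection requirements above can be sketched
          as follows.  This is a minimal model, not an implementation:
          the names Access, MemoryRegion, AccessViolation and denied
          are hypothetical, and a real DDP implementation would
          typically perform these checks in fixed-width arithmetic.

```python
from enum import Flag, auto


class Access(Flag):
    READ = auto()
    WRITE = auto()


class AccessViolation(Exception):
    """Raised when an access fails the bounds or protection check."""


class MemoryRegion:
    def __init__(self, size, access):
        self.data = bytearray(size)
        self.access = access

    def _check(self, offset, length, wanted):
        if (self.access & wanted) != wanted:
            raise AccessViolation("protection attribute not granted")
        size = len(self.data)
        # Subtraction form: the test stays correct even where
        # offset + length could wrap in fixed-width arithmetic.
        if offset < 0 or length < 0 or offset > size or length > size - offset:
            raise AccessViolation("access outside region bounds")

    def read(self, offset, length):
        self._check(offset, length, Access.READ)
        return bytes(self.data[offset:offset + length])

    def write(self, offset, payload):
        self._check(offset, len(payload), Access.WRITE)
        self.data[offset:offset + len(payload)] = payload


def denied(operation, *args):
    """Return True if the operation raises AccessViolation."""
    try:
        operation(*args)
    except AccessViolation:
        return True
    return False


rw = MemoryRegion(8, Access.READ | Access.WRITE)
rw.write(6, b"ab")                  # last two bytes of the region: allowed
ro = MemoryRegion(8, Access.READ)
print(denied(rw.write, 7, b"ab"))   # True: would run past the end
print(denied(ro.write, 0, b"x"))    # True: region is read-only
print(denied(ro.read, 0, 8))        # False: in bounds and readable
```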

     3.3.4.  Data Delivery

          The Data Delivery of Send and RDMA Write Operations on a
          single DDP Stream MUST be delivered to the ULP in the sequence
          in which all such Operations were issued.

          The Data Delivery of RDMA Read Operations on a single DDP
          Stream MUST be delivered to the ULP in the sequence in which
          they were issued.

          The Data Delivery of Send, RDMA Write and RDMA Read Operations
          SHOULD be delivered promptly.
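
          The following sketch illustrates one way to keep Data Delivery
          in issue order when Placement may complete out of order.  It
          assumes each Operation carries a sequence number assigned when
          it was issued; that numbering and the OrderedDelivery class
          are hypothetical and are not defined by this draft.

```python
class OrderedDelivery:
    """Deliver Operations to the ULP in issue order."""

    def __init__(self):
        self.next_seq = 0      # next sequence number owed to the ULP
        self.placed = {}       # seq -> operation name, placed but not delivered
        self.delivered = []    # what the ULP has seen, in order

    def on_placed(self, seq, operation):
        """Record a completed Placement and deliver any ready prefix."""
        self.placed[seq] = operation
        while self.next_seq in self.placed:
            self.delivered.append(self.placed.pop(self.next_seq))
            self.next_seq += 1


stream = OrderedDelivery()
stream.on_placed(1, "write-B")   # placed early, held back
held = list(stream.delivered)
stream.on_placed(0, "send-A")    # releases both, in issue order
stream.on_placed(2, "send-C")
print(held)               # []
print(stream.delivered)   # ['send-A', 'write-B', 'send-C']
```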

     3.3.5.  Header Contents and Validation

          The DDP protocol MUST support a validation method for DDP
          Headers and payload which encompasses any requirements made by
          its Upper Layer Protocols as well as facilities provided by
          its Lower Layer Protocols.

          The DDP layer MUST signal to the ULP any unrecoverable
          transport error, including unrecoverable data corruption.

3.4.  Remote Direct Memory Access Requirements

     The following are requirements applicable to the RDMA layer.

     3.4.1.  Send

          The RDMA protocol MUST support a Send Operation, capable of
          employing DDP to send a data payload to an Untagged Buffer at
          the Remote Peer and supporting a defined RDMA Completion
          ordering at both the Data Source and Data Sink.

     3.4.2.  Remote write

          The RDMA protocol MUST support an RDMA Write Operation,
          capable of employing DDP to send a data payload to a Tagged
          Buffer at the Remote Peer and supporting a defined RDMA
          Completion ordering at the Data Source.

     3.4.3.  Remote read

          The RDMA protocol MUST support an RDMA Read Operation, capable
          of employing DDP to retrieve a data payload from a Tagged
          Buffer at the Remote Peer and supporting a defined RDMA
          Completion ordering at the Data Sink.

     3.4.4.  Ordering and Completion

          The RDMA protocol MUST provide ordering rules and error
          semantics for all its Operations.



          The RDMA protocol MUST provide the ability to select whether
          RDMA Completions are required at the Data Sink.

          The RDMA protocol MUST successfully perform each ULP-requested
          Operation in the prescribed order, or return an error.

          Successful ULP-requested Operations MUST be performed exactly
          once.

          The RDMA protocol MUST support a mechanism for Solicited
          Events.

     3.4.5.  Memory Model

          RDMA MUST provide a way for the ULP to specify that a
          particular remote peer has read, write or read-write access to
          an Advertised RDMA Buffer.  The RDMA protocol is NOT REQUIRED
          to include a means to communicate the granted access
          permissions to the remote peer.

          The RDMA protocol MUST include a way to report an access
          violation to the end-point that requested the forbidden
          access.

          RDMA MUST support byte-granularity specification of the base
          address and size of each Advertised RDMA Buffer.

          An RDMA implementation MUST enforce the bounds and access
          permissions of each Advertised RDMA Buffer.
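
          As a rough illustration of these rules, the sketch below
          keeps per-peer access grants on the local side only and
          returns a status to whichever peer made the request.  The
          AdvertisedBuffer class, the peer identifiers and the
          "access-violation" status string are assumptions made for
          the example.

```python
class AdvertisedBuffer:
    def __init__(self, data):
        self.data = bytearray(data)
        self.grants = {}  # peer id -> set of rights ("read", "write")

    def grant(self, peer, *rights):
        """Local ULP decision; nothing is communicated to the peer."""
        self.grants[peer] = set(rights)

    def _allowed(self, peer, right, offset, length):
        if right not in self.grants.get(peer, set()):
            return False
        size = len(self.data)
        return 0 <= offset <= size and 0 <= length <= size - offset

    def remote_read(self, peer, offset, length):
        if not self._allowed(peer, "read", offset, length):
            return ("access-violation", None)  # reported to the requester
        return ("ok", bytes(self.data[offset:offset + length]))

    def remote_write(self, peer, offset, payload):
        if not self._allowed(peer, "write", offset, len(payload)):
            return ("access-violation", None)  # reported to the requester
        self.data[offset:offset + len(payload)] = payload
        return ("ok", None)


buf = AdvertisedBuffer(b"0123456789")
buf.grant("peer-a", "read")
buf.grant("peer-b", "read", "write")
print(buf.remote_read("peer-a", 2, 3))      # ('ok', b'234')
print(buf.remote_write("peer-a", 0, b"x"))  # ('access-violation', None)
print(buf.remote_write("peer-b", 9, b"Z"))  # ('ok', None)
print(buf.remote_read("peer-b", 9, 2))      # ('access-violation', None)
```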

3.5.  Upper Layer Protocol Requirements

     The following are requirements applicable to the layers above RDMA.

          Upper Layer Protocol implementations SHOULD NOT modify the
          contents of buffers passed to the RDMA and DDP layers until
          their Data Delivery is implied by an appropriate RDMA
          Completion, subject to the ordering rules.  Modifying the
          contents of active buffers will result in undefined behavior.

          Upper Layer Protocol implementations SHOULD choose a transport
          with appropriate semantics to support their needs, such as
          ordering and reliability.  The ROI protocols are NOT REQUIRED
          to support any additional transport semantics on any Stream.

          Upper Layer Protocol implementations SHOULD provide their own
          flow control.
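
          One common way for a ULP to provide its own flow control is a
          credit scheme, in which each credit stands for one receive
          buffer the peer has made available.  This draft does not
          mandate any particular scheme; the CreditSender class below
          is a hypothetical sketch of the sending side only.

```python
class CreditSender:
    """Sender side of a hypothetical ULP credit scheme."""

    def __init__(self, initial_credits):
        self.credits = initial_credits
        self.backlog = []   # messages waiting for a credit
        self.sent = []      # messages handed to the RDMA layer

    def send(self, message):
        if self.credits > 0:
            self.credits -= 1
            self.sent.append(message)
        else:
            self.backlog.append(message)

    def on_credit_grant(self, count):
        """Peer reports that `count` more receive buffers are available."""
        self.credits += count
        while self.credits > 0 and self.backlog:
            self.credits -= 1
            self.sent.append(self.backlog.pop(0))


sender = CreditSender(initial_credits=2)
for name in ["m1", "m2", "m3", "m4"]:
    sender.send(name)
print(sender.sent)      # ['m1', 'm2']
sender.on_credit_grant(1)
print(sender.sent)      # ['m1', 'm2', 'm3']
print(sender.backlog)   # ['m4']
```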




          Upper Layer Protocol implementations MUST be prepared to
          handle both local and remote errors on any request.

          Upper Layer Protocol implementations MUST be prepared for
          certain errors to be returned by operations subsequent to the
          operation that encountered them.  In this case, unsignaled
          operations MAY be left in an indeterminate state.  In
          addition, the ROI implementation MAY have terminated the RDMA
          or DDP Stream.
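
          The toy model below shows the kind of deferred error a ULP
          has to expect.  It assumes that an unsignaled operation
          reports nothing, that its failure surfaces on the next
          signaled operation, and that the stream is then terminated.
          These behaviors, the ToyStream class and the "fails" switch
          used to simulate an error are all assumptions for
          illustration; this draft does not specify them.

```python
class StreamTerminated(Exception):
    """Raised when an operation is posted on a terminated stream."""


class ToyStream:
    def __init__(self):
        self.deferred_error = None
        self.terminated = False

    def post(self, name, signaled, fails=False):
        """Post one operation; `fails` simulates an error for the demo."""
        if self.terminated:
            raise StreamTerminated(name)
        if fails and self.deferred_error is None:
            self.deferred_error = name + " failed"
        if not signaled:
            return None                 # unsignaled: nothing is reported
        if self.deferred_error is not None:
            self.terminated = True      # this sketch tears the stream down
            return ("error", self.deferred_error)
        return ("ok", name)


stream = ToyStream()
first = stream.post("write-1", signaled=False, fails=True)
second = stream.post("send-2", signaled=True)
print(first)    # None
print(second)   # ('error', 'write-1 failed')
```

          A ULP written against this model would treat "write-1" and
          every later operation as indeterminate, and would expect any
          further post on the stream to fail.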

3.6.  Security Requirements

     The following are requirements relevant to security.

          The ROI protocols MUST be compatible with, and able to employ,
          existing Internet security mechanisms.

          The ROI protocols are NOT REQUIRED to establish the security
          association between the Remote Peer and Local Peer.

          The ROI protocols MUST rely upon supported IP transport Lower
          Layer Protocol implementations to support at least the
          following security properties:

               Integrity

               Encryption

               Authentication

               Confidentiality

          The ROI protocols MUST address the security issues inherent in
          the Advertisement of Memory Regions, especially as they will
          allow or prevent access within the scope of a single RDMA
          Stream.

          The ROI protocols MUST NOT permit ULPs or applications to
          access memory which has not explicitly been advertised to them
          by the Remote Peer.

          The ROI protocols MUST require implementations to enforce all
          supported protection attributes for Memory Regions.

          The ROI protocols MUST specify the protected scope of
          Advertised Memory Regions across all Remote Peers and all DDP
          Streams.
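
          The requirements above ask that the protected scope be
          specified but do not choose one.  Purely as an example of one
          possible rule, the sketch below scopes each Advertisement to
          the Stream on which it was made and denies everything else by
          default.  The TagRegistry class and the numeric stream and
          tag identifiers are hypothetical.

```python
class TagRegistry:
    """Advertised regions keyed by (stream, tag); default is deny."""

    def __init__(self):
        self._regions = {}  # (stream_id, tag) -> bytearray

    def advertise(self, stream_id, tag, region):
        self._regions[(stream_id, tag)] = region

    def lookup(self, stream_id, tag):
        """Return the region, or None if it was never advertised here."""
        return self._regions.get((stream_id, tag))


registry = TagRegistry()
registry.advertise(stream_id=1, tag=0x10, region=bytearray(b"payload"))
print(registry.lookup(1, 0x10) is not None)  # True: advertised on stream 1
print(registry.lookup(2, 0x10) is None)      # True: same tag, other stream
print(registry.lookup(1, 0x11) is None)      # True: tag never advertised
```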




4.  Acknowledgements

     The authors gratefully acknowledge the previous work and valuable
     advice of Steph Bailey, David Black, Jeff Chase, Jeff Mogul, Jim
     Pinkerton, Renato Recio, Allyn Romanow and Costa Sapuntzakis, as
     well as the many others participating in the RDMA discussion to
     date.

5.  References

     [ROM]
          Allyn Romanow, Jeff Mogul, Tom Talpey, Steph Bailey, "RDMA
          over IP Problem Statement", Work In Progress,
          http://www.ietf.org/internet-drafts/draft-romanow-rdma-over-
          ip-problem-statment.txt

     [HP97]
          J. L. Hennessy, D. A. Patterson, Computer Organization and
          Design, 2nd Edition, San Francisco: Morgan Kaufmann
          Publishers, 1997

     [STREAM]
          The STREAM Benchmark Reference Information,
          http://www.cs.virginia.edu/stream/

     [SCTP]
          R. Stewart et al., "Stream Control Transmission Protocol",
          Standards Track RFC, http://www.ietf.org/rfc/rfc2960

     [ISCSI1]
          iSCSI Requirements, Informational Work In Progress,
          http://www.ietf.org/internet-drafts/draft-ietf-ips-iscsi-
          reqmts-06.txt

     [ISCSI2]
          iSCSI Specification, Standards Track Work In Progress,
          http://www.ietf.org/internet-drafts/draft-ietf-ips-
          iscsi-12.txt

     [MYR]
          Myrinet, http://www.myricom.com

     [DAFS]
          Direct Access File System, http://www.dafscollaborative.org
          http://www.ietf.org/internet-drafts/draft-wittle-dafs-00.txt

     [FIBRE]
          Fibre Channel Standard,
          http://www.fibrechannel.com/technology/index.master.html

     [IB]
          InfiniBand Architecture Specification, Volumes 1 and 2,
          Release 1.0.a, http://www.infinibandta.org

     [SDP]
          Sockets Direct Protocol, http://www.infinibandta.org

     [SVR]
          Compaq Servernet,
          http://nonstop.compaq.com/view.asp?PAGE=ServerNet

     [VI]
          Virtual Interface Architecture Specification Version 1.0,
          http://www.viarch.org/html/collateral/san_10.pdf

     [CONG]
          S. Floyd, "Congestion Control Principles", Best Current
          Practice, http://www.ietf.org/rfc/rfc2914.txt

     [RFCTERMS]
          S. Bradner, "Key words for use in RFCs to Indicate Requirement
          Levels", Best Current Practice,
          http://www.ietf.org/rfc/rfc2119.txt

     [MIDTAX]
          B. Carpenter, S. Brim, "Middleboxes: Taxonomy and Issues",
          Informational RFC, http://www.ietf.org/rfc/rfc3234.txt

Authors' Addresses


     David Robinson
     Sun Microsystems, Inc.
     901 San Antonio Road
     Palo Alto, CA 94303 USA

     Phone: +1 512 401-1757
     EMail: david.robinson@sun.com


     Tom Talpey
     Network Appliance
     375 Totten Pond Road
     Waltham, MA 02451 USA

     Phone: +1 781 768-5329
     EMail: thomas.talpey@netapp.com




     Robert R. Teisberg
     Hewlett Packard Corporation
     14231 Tandem Blvd.
     Austin, TX 78728 USA

     Phone: +1 512 432-8119
     EMail: robert.teisberg@hp.com


     Jim Wendt
     Hewlett Packard Corporation
     8000 Foothills Boulevard
     Roseville, CA 95747-5668 USA

     Phone: +1 916 785-5198
     EMail: jim_wendt@hp.com


Full Copyright Statement

     Copyright (C) The Internet Society (2002). All Rights Reserved.

     This document and translations of it may be copied and furnished to
     others, and derivative works that comment on or otherwise explain
     it or assist in its implementation may be prepared, copied,
     published and distributed, in whole or in part, without restriction
     of any kind, provided that the above copyright notice and this
     paragraph are included on all such copies and derivative works.
     However, this document itself may not be modified in any way, such
     as by removing the copyright notice or references to the Internet
     Society or other Internet organizations, except as needed for the
     purpose of developing Internet standards in which case the
     procedures for copyrights defined in the Internet Standards process
     must be followed, or as required to translate it into languages
     other than English.

     The limited permissions granted above are perpetual and will not be
     revoked by the Internet Society or its successors or assigns.

     This document and the information contained herein is provided on
     an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
     ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
     IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
     THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.





