INTERNET DRAFT                                     Uri Elzur
draft-elzur-iwarp-mpa-tcp-analysis-00.txt            Broadcom
Expires: July, 2003                                Bob Teisberg
                                                   Dwight Barron
                                                   Paul Culley
                                                     Hewlett-Packard
                                                   Jim Pinkerton
                                                     Microsoft
                                                   John Carrier
                                                     Adaptec
                                                   February 2003

                    Analysis of MPA over TCP Operations

1  Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

2  Abstract

   This document provides further explanation and analysis of the
   architectural recommendations contained in the recent Internet-Draft
   "Marker PDU Aligned Framing for TCP Specification" [MPA].  It
   examines the impact of the following three attributes of MPA over
   TCP:

   *  packing of multiple DDP ULPDUs into one TCP segment;

   *  simplifying the receiver due to transmitter alignment of DDP
      headers with TCP headers; and

   *  mistakenly attempting to interoperate an MPA-enabled endpoint
      with a non-MPA-enabled endpoint.



Elzur, et al.            Expires - August 2003                 [Page 1]


                       Analysis of MPA over TCP           February 2003


   Table of Contents

   1    Status of this Memo
   2    Abstract
   3    Introduction
   4    Definitions
   5    Assumptions
   5.1  MPA is layered beneath DDP [DDP]
   5.2  MPA preserves DDP message framing
   5.3  The size of the ULPDU passed to MPA is less than EMSS under
        normal conditions
   5.4  Out-of-order placement but NO out-of-order delivery
   6    No Packing
   7    The Value of Header Alignment
   7.1  Impact of lack of Header Alignment on the receiver
        computational load and complexity
   7.2  Header Alignment effects on TCP wire protocol
   8    Interoperating between MPA applications and non-MPA
        applications
   8.1  Negotiation of MPA-enabled mode
   8.2  Analysis of existing TCP services
   8.2.1  The "Little" TCP Services
   8.2.2  ULPs Using Only Text Messages
   8.2.3  ULPs with Fixed Initial Message
   8.2.4  Protocols with framed command headers
   9    Security Considerations
   10   IANA Considerations
   11   References
   11.1   Primary References
   11.2   "Little" TCP Services
   11.3   ULPs using only Text Messages
   11.4   ULPs with Fixed Initial Message
   11.5   ULPs with Framed Command Headers
   12   Author's Addresses
   13   Acknowledgments
   14   Full Copyright Statement


   Table of Figures

   Figure 1: Non-aligned FPDU freely placed in TCP octet stream
   Figure 2: Aligned FPDU placed immediately after TCP header
   Figure 3: MPA Enablement Error
   Figure 4: MPA Transition via Well Known Port or Service Location
             Protocol
   Figure 5: MPA Transition via Octet Stream Negotiation
   Figure 6: Effect of Improperly MPA-Enabled DNS Resolver

3  Introduction

   This paper analyzes the impact of MPA (Marker PDU Aligned Framing
   for TCP [MPA]) on the TCP sender, receiver, and wire protocol.

   One of MPA's high level goals is to provide enough information, when
   combined with the Direct Data Placement Protocol [DDP], to enable
   out-of-order placement of DDP payload into the final Upper Layer
   Protocol (ULP) buffer. Note that DDP separates the act of placing
   data into a ULP buffer from that of notifying the ULP that the ULP
   buffer is available for use. In DDP terminology, the former is
   defined as "Placement", and the latter is defined as "Delivery".  MPA
   supports in-order delivery of the data to the ULP, including support
   for direct data placement in the final ULP buffer location when TCP
   segments arrive out-of-order. Effectively, the goal is to use the
   pre-posted ULP buffers as the TCP receive buffer, where the
   reassembly of the ULP Protocol Data Unit (PDU) by TCP (with MPA and
   DDP) is done in place, in the ULP buffer, with no data copies.

   The paper walks through the advantages and disadvantages of the two
   main TCP sender modifications proposed by MPA:

   1) that MPA require the TCP sender to do "Header Alignment", where a
   TCP segment is required to begin with an MPA Framing Protocol Data
   Unit (FPDU) (if there is payload present) and that there be an
   integral number of FPDUs in a TCP segment (under conditions where
   the Path MTU is not changing).

   2) that MPA require "no packing" of FPDUs, i.e., that each TCP
   segment carry at most one FPDU.

   The paper concludes that the worst-case cost of the "no packing"
   rule outweighs its advantages, and that the rule should be removed
   from the MPA specification.

   The paper also concludes that the scaling advantages of Header
   Alignment are strong, based primarily on a drastic reduction in TCP
   receive buffer requirements and on simplified receive handling.  The
   analysis also shows that Header Alignment has little effect on TCP
   wire behavior.

   Finally, the paper examines interoperability issues between an
   unmodified TCP stack and a modified TCP stack, for a wide variety of
   applications and combinations of MPA-enabled and non-MPA-enabled
   endpoints.

4  Definitions

   DDP - Direct Data Placement Protocol [DDP]

   Data Delivery (Delivery, Delivered, Delivers) - Delivery is defined
       as the process of informing the ULP or consumer that a
       particular Message is available for use.  This is specifically
       different from "Placement", which may generally occur in any
       order, while the order of "Delivery" is strictly defined. See
       "Data Placement".

   Data Placement (Placement, Placed, Places) - For DDP, this term is
       specifically used to indicate the process of writing to a data
       buffer by a DDP implementation.  DDP Segments carry Placement
       information, which may be used by the receiving DDP
       implementation to perform Data Placement of the DDP Segment ULP
       Payload. See "Data Delivery".

   EMSS - Effective Maximum Segment Size.  EMSS is the smaller of the
       TCP maximum segment size (MSS) [RFC0793] and the current Path
       Maximum Transmission Unit (PMTU) [RFC1191].

   FPDU - Framing Protocol Data Unit.  The unit of data created by a
       ULP utilizing the MPA framing protocol. A complete MPA FPDU
       includes the MPA length, MPA payload, MPA CRC and potentially
       Markers as appropriate.

   Header Alignment  - the property that a TCP segment begins with an
       FPDU and the TCP segment includes an integer number of FPDUs.

   MPA - the protocol defined by the "Marker PDU Aligned Framing for
       TCP Specification" [MPA].

   MPA-aware TCP - a TCP implementation that is aware of the receiver
       efficiencies of MPA Header Alignment and is capable of sending
       TCP segments that begin with an FPDU.

   MPA-enabled -  MPA is enabled if the MPA protocol is visible on the
       wire.  When the sender is MPA-enabled, it is inserting framing
       and markers.  When the receiver is MPA-enabled, it is
       interpreting framing and markers.

   MULPDU - Maximum ULPDU. The current maximum size of the record that
       is acceptable for DDP to pass to MPA for transmission.

   PDU - protocol data unit

   ULP - Upper Layer Protocol. The protocol layer above the protocol
       layer currently being referenced. The ULP for MPA is DDP [DDP].
       ULPs may be classified as Passive if they are awaiting a
       connection request or Active if they are initiating a connection
       request.

   ULPDU - Upper Layer Protocol Data Unit.  The data record defined by
       the layer above MPA (DDP).  ULPDU corresponds to DDP's "DDP
       Segment".

5  Assumptions

5.1  MPA is layered beneath DDP [DDP]

   MPA is an adaptation layer between DDP and TCP.  DDP requires
   preservation of DDP segment boundaries and a CRC32C digest covering
   the DDP header and data.   MPA adds these features to the TCP stream
   so that DDP over TCP has the same basic properties as DDP over SCTP.
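
   The CRC32C digest that DDP requires can be illustrated with a
   minimal, bit-at-a-time sketch of the Castagnoli CRC (reflected
   polynomial 0x82F63B78).  This is a generic CRC32C routine, not code
   taken from [MPA] or [DDP]; the function name crc32c is hypothetical,
   and which octets of an FPDU the digest covers is defined by [MPA]
   and is not modeled here.  Production implementations typically use
   table-driven or hardware-assisted variants.

```python
# Reflected form of the Castagnoli polynomial 0x1EDC6F41.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c(data: bytes) -> int:
    """Return the CRC32C of data (initial value and final XOR 0xFFFFFFFF)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; XOR in the polynomial if the low bit was set.
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    # Standard CRC32C check value over the ASCII string "123456789".
    print(hex(crc32c(b"123456789")))  # 0xe3069283
```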

5.2  MPA preserves DDP message framing

   MPA was designed as a framing layer specifically for DDP and was not
   intended as a general-purpose framing layer for any other ULP using
   TCP.

   A framing layer allows ULPs using it to receive indications from the
   transport layer only when complete ULPDUs are present.  As a framing
   layer, MPA is not aware of the content of the DDP PDU, only that it
   has received and, if necessary, reassembled a complete PDU for
   delivery to the DDP.

5.3  The size of the ULPDU passed to MPA is less than EMSS under normal
     conditions

   To make reception of a complete DDP PDU on every received segment
   possible, DDP passes to MPA a PDU that is no larger than the EMSS of
   the underlying fabric. Each FPDU that MPA creates contains
   sufficient information for the receiver to directly place the ULP
   payload in the correct location in the correct receive buffer.

   Edge cases in which this condition does not hold are handled, but
   they need not be on the fast path.

5.4  Out-of-order placement but NO out-of-order delivery

   DDP receives complete DDP PDUs from MPA.  Each DDP PDU contains the
   information necessary to place its ULP payload directly in the
   correct location in host memory.

   Because each DDP segment is self-describing, it is possible for DDP
   segments received out of order to have their ULP payload placed
   immediately in the ULP receive buffer.

   Data delivery to the ULP is guaranteed to be in the order the data
   was sent.  DDP only indicates data delivery to the ULP after TCP has
   acknowledged the complete byte stream.
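
   The separation between out-of-order Placement and in-order Delivery
   can be sketched as follows.  This is an illustrative model only,
   assuming a single byte stream whose message boundaries are known in
   advance (in DDP, each segment carries its own Placement
   information); the class InOrderDelivery and its methods are
   hypothetical names, not part of [DDP] or [MPA].

```python
from typing import List


class InOrderDelivery:
    """Hypothetical model: Placement in any order, Delivery in order."""

    def __init__(self, message_ends: List[int]) -> None:
        self.message_ends = list(message_ends)  # cumulative end offsets
        self.buffer = bytearray(self.message_ends[-1])  # pre-posted ULP buffer
        self.have = [False] * len(self.buffer)  # per-octet "placed" flags
        self.next_to_deliver = 0                # next message index to Deliver

    def place(self, offset: int, payload: bytes) -> List[int]:
        """Place one self-describing segment; return newly Delivered messages."""
        end = offset + len(payload)
        self.buffer[offset:end] = payload       # Placement: final location
        for i in range(offset, end):
            self.have[i] = True
        delivered = []
        # Delivery advances strictly in order, one complete message at a time.
        while self.next_to_deliver < len(self.message_ends):
            msg_end = self.message_ends[self.next_to_deliver]
            if not all(self.have[:msg_end]):
                break
            delivered.append(self.next_to_deliver)
            self.next_to_deliver += 1
        return delivered


if __name__ == "__main__":
    rx = InOrderDelivery([4, 8])   # two 4-octet messages
    print(rx.place(4, b"5678"))    # []: placed, but not yet delivered
    print(rx.place(0, b"1234"))    # [0, 1]
    print(bytes(rx.buffer))        # b'12345678'
```

   Placing the second message first writes it into its final location
   but Delivers nothing; once the gap is filled, both messages are
   Delivered in the order they were sent.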

6  No Packing

   MPA as originally proposed requires an MPA-aware TCP sender to
   segment the datastream in such a way that each TCP segment contains
   a single FPDU.  This requirement is referred to in this document as
   the "no packing" rule.  Let us examine the costs and benefits of the
   "no packing" rule.

   The Header Alignment rule means that Placement information is
   guaranteed to immediately follow the TCP header in the typical case
   (see Section 7, "The Value of Header Alignment", for an analysis of
   the value of Header Alignment).  If the "no packing" rule is also in
   effect, it further guarantees (in the typical case) that the
   receiver does not have to look for additional Placement information
   within the TCP payload, because no additional FPDUs will be present.
   This allows the receiver logic to be simplified.  For instance, the
   receiver logic needs to support only one context lookup and one data
   movement operation per frame.  Only when the PMTU changes is it
   necessary to examine the remainder of the TCP payload for DDP
   headers.  Because PMTU changes are presumed to be rare, that case
   can be delegated to a "slow path" processing mode, while the "fast
   path" for the common case can be extremely simple.

   The original argument for "no packing" also examined typical ULP
   behavior for applications expected to see strong advantages from
   Direct Data Placement -- specifically transaction-based applications
   or throughput-oriented applications.  Request/response protocols
   typically send one FPDU per TCP segment. A response may be short or
   quite long, but in any case would fill all TCP segments up to the
   last one, providing TCP segmentation behavior similar to an
   unmodified TCP stack. A similar argument applies to ULPs optimized
   for throughput, which send long, uninterrupted sequences of PMTU-
   sized FPDUs.

   Thus, for many applications the rule has no effect on TCP
   segmentation, and it enables simpler receive logic because the
   receiver does not have to peek into the TCP segment at some
   arbitrary offset to find the MPA/DDP headers.

   On the other hand, a ULP which sends long sequences of small FPDUs
   is strongly affected by the "no packing" rule.

   Several specific consequences of the "no packing" rule deserve
   detailed discussion.

   The "no packing" rule tends to increase the total number of TCP
   segments transmitted.  In the best case, as noted earlier, the
   number of segments is unchanged.  In the worst case, where all
   ULPDUs are 1 octet long, the number of segments increases by a
   factor of several dozen.  We will show the penalty for the "no
   packing" rule both for networks using full-sized Ethernet frames and
   for networks using the smaller default IP packet size.

   For both calculations, the Minimum FPDU Size (MinFPDUSize) for the
   worst-case ULPDU is the sum of the MPA Header Size (MPAHdrSize), the
   DDP Header Size (DDPHdrSize), the worst-case DDP Payload Size
   (DDPPayld), the number of MPA Pad octets (MPAPad) needed to make the
   FPDU size a multiple of 4, and the MPA CRC size (MPACRC):

           MPAHdrSize   = 2 octets

           DDPHdrSize   = 14 octets

           DDPPayld     = 1 octet

           MPAPad       = 3 octets

           MPACRC       = 4 octets

           MinFPDUSize  = MPAHdrSize + DDPHdrSize + DDPPayld + MPAPad
                         + MPACRC

                        = 2 + 14 + 1 + 3 + 4

                        = 24
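
   The MinFPDUSize arithmetic can be checked, and generalized to other
   payload sizes, with the sketch below.  It assumes, as the
   calculation above does, that pad octets bring the MPA header plus
   ULPDU up to a multiple of 4 octets before the 4-octet CRC, and it
   ignores Markers; the function names mpa_pad and fpdu_size are
   hypothetical.

```python
MPA_HDR_SIZE = 2   # octets: MPA length field (MPAHdrSize)
DDP_HDR_SIZE = 14  # octets: DDP header (DDPHdrSize) used in this analysis
MPA_CRC_SIZE = 4   # octets: MPA CRC (MPACRC)


def mpa_pad(unpadded_len: int) -> int:
    """Pad octets (MPAPad) that round unpadded_len up to a multiple of 4."""
    return (-unpadded_len) % 4


def fpdu_size(ddp_payload_len: int) -> int:
    """FPDU size in octets for a given DDP payload length, ignoring Markers."""
    unpadded = MPA_HDR_SIZE + DDP_HDR_SIZE + ddp_payload_len
    return unpadded + mpa_pad(unpadded) + MPA_CRC_SIZE


print(fpdu_size(1))  # 24, the MinFPDUSize computed above
```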

   1.  Ethernet frames

       a.  The expected Number of Markers in an Ethernet frame
           (EthNMarkers) is calculated by dividing the MPA Marker
           Interval (MPAMrkIntvl) into the Ethernet Frame Size
           (EthFrmSize):

           MPAMrkIntvl  = 512 octets

           EthFrmSize   = 1460 octets of TCP payload in a full-sized
                          (1500-octet MTU) Ethernet frame

           EthNMarkers  =~ EthFrmSize / MPAMrkIntvl

                        =~ 2.9

       b.  The expansion of FPDUs into an Ethernet frame (EthExpansion)
           is the number of times MinFPDUSize octets can be put into an
           Ethernet Frame (EthFrmSize) after removing the number of
           octets consumed by markers (MPAMrkSize * EthNMarkers) in the
           frame:

           MPAMrkSize   = 4 octets

           EthExpansion = (EthFrmSize - MPAMrkSize * EthNMarkers) /
                           MinFPDUSize

                        =~ (1460 - 4 * 2.9) / 24

                        =~ 60

   2.  Default IP packets

       a.  The expected Number of Markers in the Default IP packet
           (DefNMarkers) is calculated by dividing the MPA Marker
           Interval (MPAMrkIntvl) into the Default IP Packet Size
           (DefPktSize):

           DefPktSize   = 536 octets (the default TCP MSS, i.e., the TCP
                          payload of a 576-octet IP datagram)

           DefNMarkers  = DefPktSize / MPAMrkIntvl

                        =~ 1.05

       b.  The expansion of FPDUs into the Default IP Packet
           (DefExpansion) is the number of times MinFPDUSize octets can
           be put into an IP Packet (DefPktSize) after removing the
           number of octets consumed by markers (MPAMrkSize *
           DefNMarkers) in the packet:

           DefExpansion = (DefPktSize - MPAMrkSize * DefNMarkers) /
                           MinFPDUSize

                        =~ (536 - 4 * 1.05) / 24

                        =~ 22

   In the worst case, where all ULPDUs are one octet long, the "no
   packing" rule forces the transmission of roughly 60 times as many
   packets on Ethernet as MPA with packing allowed.  Even with smaller
   default-size IP packets, the "no packing" rule would force the use
   of more than 20 times as many packets.
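
   The two expansion factors can be reproduced with a short
   calculation.  The sketch below uses the same inputs as the text
   (MinFPDUSize of 24 octets, 4-octet Markers every 512 octets, and
   segment payloads of 1460 and 536 octets) and, like the text, treats
   the expected number of Markers as a fractional average; the function
   name expansion is hypothetical.

```python
MIN_FPDU_SIZE = 24   # octets (MinFPDUSize)
MPA_MRK_INTVL = 512  # octets between Markers (MPAMrkIntvl)
MPA_MRK_SIZE = 4     # octets per Marker (MPAMrkSize)


def expansion(segment_payload: int) -> float:
    """Minimum-size FPDUs that fit in one segment payload, net of Markers."""
    n_markers = segment_payload / MPA_MRK_INTVL  # expected (average) count
    return (segment_payload - MPA_MRK_SIZE * n_markers) / MIN_FPDU_SIZE


eth = expansion(1460)   # EthExpansion
dflt = expansion(536)   # DefExpansion
print(f"Ethernet: ~{eth:.1f}x, default IP: ~{dflt:.1f}x")
```

   The results (about 60.4 and 22.2) round to the factors of 60 and 22
   used in the text.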

   Clearly the effect of the "no packing" rule on the number of extra
   packets depends on the nature of the ULP and workload.  The worst
   case involves protocols that send long sequences of small ULPDUs.
   The existence of protocols such as telnet [RFC0854] shows that at
   least in some cases real-world applications may tend to approach the
   worst case.

   As a direct consequence of the increased number of data segments
   transmitted, the number of TCP ACKs increases proportionally.  A
   series of minimum-sized ULPDUs which could have been packed into two
   TCP segments on an Ethernet network, prompting a single ACK in
   response, would consume 120 segments with the "no packing" rule in
   effect, resulting in as many as 60 ACKs when the receiver
   acknowledges every second segment.
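
   The ACK arithmetic above can be expressed as follows, assuming
   roughly 60 minimum-size FPDUs per packed Ethernet segment and a
   receiver that sends one ACK for every second segment; both constants
   and the function name segments_and_acks are assumptions for
   illustration.

```python
import math
from typing import Tuple

FPDUS_PER_PACKED_SEGMENT = 60  # assumption: ~EthExpansion from the text
SEGMENTS_PER_ACK = 2           # assumption: one ACK per two segments


def segments_and_acks(num_fpdus: int, packing: bool) -> Tuple[int, int]:
    """Return (data segments, ACKs) needed for num_fpdus minimum-size FPDUs."""
    per_segment = FPDUS_PER_PACKED_SEGMENT if packing else 1
    segments = math.ceil(num_fpdus / per_segment)
    acks = math.ceil(segments / SEGMENTS_PER_ACK)
    return segments, acks


print(segments_and_acks(120, packing=True))   # (2, 1)
print(segments_and_acks(120, packing=False))  # (120, 60)
```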

   Dividing a datastream into a large number of small segments impairs
   the efficiency of the slow start algorithm.  While the number of
   packets necessary to reach an efficient line utilization is the same
   as in a conventional TCP implementation, the total payload
   transmitted during slow start is reduced substantially.  In other
   words, a sender obeying the "no packing" rule could pay a
   substantial performance penalty during the slow start phase if long
   sequences of small ULPDUs are used.
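
   The slow start effect can be illustrated with a simplified model in
   which the congestion window is counted in segments and doubles every
   round trip, starting from one segment.  This is a textbook
   approximation rather than the exact behavior of any TCP
   implementation; the per-segment ULP payloads (about 60 octets when
   60 one-octet ULPDUs are packed, 1 octet under "no packing") follow
   the worst case discussed above, and the function name is
   hypothetical.

```python
def slow_start_payload(rtts: int, ulp_octets_per_segment: int,
                       initial_cwnd: int = 1) -> int:
    """ULP payload octets sent during the first `rtts` round trips,
    assuming a segment-counted window that doubles each round trip."""
    total, cwnd = 0, initial_cwnd
    for _ in range(rtts):
        total += cwnd * ulp_octets_per_segment  # one full window per round trip
        cwnd *= 2
    return total


# Both cases send the same 1 + 2 + 4 + 8 + 16 = 31 segments.
packed = slow_start_payload(5, ulp_octets_per_segment=60)
no_packing = slow_start_payload(5, ulp_octets_per_segment=1)
print(packed, no_packing)  # 1860 31
```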

   It is clear from this analysis that the drawbacks of the "no
   packing" rule are substantial, while its benefits are small.  The
   "no packing" requirement should be changed to state that an
   MPA-aware TCP MAY support packing at the transmitter and MUST
   support packing at the receiver.  However, as noted above, certain
   applications (e.g., transaction-based applications) will prioritize
   minimal latency over maximum wire efficiency.  In such scenarios it
   is anticipated that there will be minimal opportunity for packing at
   the transmitter, and receivers may choose to optimize their
   performance for this anticipated behavior.

7  The Value of Header Alignment

   Significant receiver optimizations can be achieved when Header
   Alignment and complete FPDUs are the common case.  The optimizations
   allow the receiver to use significantly fewer buffers and less
   computation per FPDU.  The net effect is the ability to build a
   "Flow-Through" receiver that enables TCP-based solutions to scale to
   10 Gbps and beyond in an economical way.  The optimizations are
   especially relevant to hardware implementations of receivers that
   process multiple protocol layers: the Data Link Layer (e.g.,
   Ethernet), the Network and Transport Layers (e.g., TCP/IP), and even
   some ULPs on top of TCP (e.g., MPA/DDP).  As network speed
   increases, there is an increasing desire to use a hardware-based
   receiver in order to achieve an efficient, high-performance
   solution.

   A TCP receiver, under worst-case conditions, has to allocate buffers
   (BufferSizeTCP) whose capacities are a function of the
   bandwidth-delay product.  Thus:

        BufferSizeTCP = K * Bandwidth [octets/s] * Delay [s]

   where Bandwidth is the end-to-end bandwidth of the connection, Delay
   is the round-trip delay of the connection, and K is an
   implementation-dependent constant.
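
   As a worked example of the formula, the sketch below evaluates
   BufferSizeTCP for a given line rate and round-trip delay.  The
   choice K = 1 and the 1 ms round-trip delay are illustrative
   assumptions, as is the function name buffer_size_tcp.

```python
def buffer_size_tcp(bandwidth_bits_per_s: float, delay_s: float,
                    k: float = 1.0) -> float:
    """BufferSizeTCP = K * Bandwidth [octets/s] * Delay [s].

    Bandwidth is supplied in bits/s and converted to octets/s here."""
    return k * (bandwidth_bits_per_s / 8.0) * delay_s


for gbps in (1, 10):
    octets = buffer_size_tcp(gbps * 1e9, delay_s=1e-3)
    print(f"{gbps} Gbps, 1 ms round trip: ~{octets / 1e3:.0f} KB")
```

   The tenfold increase in line rate produces a tenfold increase in the
   required buffering (about 125 KB versus 1250 KB), which is the
   linear scaling described next.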

   Thus BufferSizeTCP scales with the end-to-end bandwidth (10x more
   buffers for a 10x increase in end-to-end bandwidth). As this
   buffering approach may scale poorly for hardware or software
   implementations alike, several approaches allow reduction in the
   amount of buffering required for high-speed TCP communication.

   The MPA/DDP approach is to enable the ULP's buffer to be used as the
   TCP receive buffer. If the application pre-posts a sufficient amount
   of buffering, and each TCP segment has sufficient information to
   place the payload into the right application buffer, when an out-of-
   order TCP segment arrives it could potentially be placed directly in
   the ULP buffer. However, placement can only be done when a complete
   FPDU with the placement information is available to the receiver,
   and the FPDU contents contain enough information to place the data
   into the correct ULP buffer (e.g., there is a DDP header available).

   For the case when the FPDU is not aligned with the TCP segment, it
   may take, on average, 2 TCP segments to assemble one FPDU.
   Therefore, the receiver has to allocate BufferSizeNAF (Buffer Size,
   Non-Aligned FPDU) octets:

       BufferSizeNAF = K1 * EMSS * number_of_connections + K2 * EMSS

   where K1 and K2 are implementation-dependent constants and EMSS is
   the effective maximum segment size.

   For example, taking K1 = 1 and neglecting the K2 term, a 1 Gbps link
   with 10,000 connections and an EMSS of 1500 octets would require
   15 MB of memory (10,000 * 1500 octets).  Often the number of
   connections used scales with the network speed, aggravating the
   situation at higher speeds.

   A Header Aligned FPDU would allow the receiver to allocate
   BufferSizeAF (Buffer Size, Aligned FPDU) octets:

       BufferSizeAF = K2 * EMSS

   for the same conditions.  A Header Aligned receiver may require
   memory on the order of hundreds of KB, which is feasible for on-chip
   memory and enables a "Flow-Through" design, in which the data flows
   through the NIC and is placed directly in the destination buffer.
   Assuming most of the connections support Header Alignment, the
   receiver buffers no longer scale with the number of connections.
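
   The contrast between the two buffering formulas can be made concrete
   with the sketch below.  With K1 = 1, the per-connection term
   reproduces the 15 MB figure given for 10,000 connections; the value
   K2 = 100 is a purely hypothetical choice that lands in the "hundreds
   of KB" range, and the function names are illustrative.

```python
EMSS = 1500   # octets, as in the example in the text
K1 = 1.0      # assumption: one EMSS of buffering per connection
K2 = 100.0    # hypothetical constant for the shared buffering term


def buffer_size_naf(connections: int) -> float:
    """BufferSizeNAF = K1 * EMSS * number_of_connections + K2 * EMSS."""
    return K1 * EMSS * connections + K2 * EMSS


def buffer_size_af() -> float:
    """BufferSizeAF = K2 * EMSS, independent of the connection count."""
    return K2 * EMSS


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} connections: non-aligned {buffer_size_naf(n) / 1e6:6.2f} MB,"
          f" aligned {buffer_size_af() / 1e3:.0f} KB")
```

   The non-aligned figure grows linearly with the connection count,
   while the aligned figure stays constant.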

   Additional optimizations can be achieved in a balanced I/O
   subsystem, where the system interface of the network controller
   provides ample bandwidth compared with the network bandwidth.  For
   almost twenty years this has been the case, and the trend is
   expected to continue: while Ethernet speeds have scaled by a factor
   of 1000 (from 10 Mbps to 10 Gbps), the I/O bus bandwidth of volume
   CPU architectures has scaled from ~2 MB/s to ~2 GB/s (from the PC-XT
   bus to PCI-X DDR).  Under these conditions, the Header Aligned FPDU
   approach allows BufferSizeAF to be independent of network speed; it
   is primarily a function of the local processing time for a given
   frame.  Thus, when the Header Aligned FPDU approach is used, receive
   buffering is expected to scale gracefully (i.e., sublinearly) as
   network speed increases.



7.1  Impact of lack of Header Alignment on the receiver computational
     load and complexity

   The receiver must perform IP and TCP processing, and then perform
   FPDU CRC checks, before it can trust the FPDU header placement
   information.  For simplicity, this description assumes that an FPDU
   is carried in no more than 2 TCP segments.  In reality, without
   Header Alignment, an FPDU can be carried by more than 2 TCP segments
   (e.g., if the PMTU was reduced).

   ----++-----------------------------++-----------------------++-----
   +---||---------------+    +--------||--------+   +----------||----+
   |   TCP Seg X-1      |    |     TCP Seg X    |   |  TCP Seg X+1   |
   +---||---------------+    +--------||--------+   +----------||----+
   ----++-----------------------------++-----------------------++-----
                   FPDU #N-1                  FPDU #N

       Figure 1: Non-aligned FPDU freely placed in TCP octet stream

   The receiver algorithm for processing TCP segments (e.g., TCP
   segment X in Figure 1) carrying non-aligned FPDUs (in-order or
   out-of-order) includes:



   1.  Data Link Layer processing (whole frame) - typically including a
       CRC calculation.

   2.  Network Layer processing (assuming the datagram is not an IP
       fragment, the whole Data Link Layer frame contains one IP
       datagram; IP fragments should be reassembled in a local buffer,
       which is not a performance optimization goal).

   3.  Transport Layer processing -- TCP protocol processing, header
       and checksum checks.

       a.  Classify the incoming TCP segment using the 5-tuple (IP SRC,
           IP DST, TCP SRC Port, TCP DST Port, protocol).

   4.  Find FPDU message boundaries.

       a.  Get MPA state information for the connection

           i.  If the TCP segment is in-order, use the receiver managed
               MPA state information to calculate where the previous
               FPDU message (#N-1) ends in the current TCP segment X.
               (previously, when the MPA receiver processed the first
               part of FPDU #N-1, it calculated the number of bytes
               remaining to complete FPDU #N-1 by using the MPA Length
               field).

               .1. Get the stored partial CRC for FPDU #N-1

               .2. Complete CRC calculation for FPDU #N-1 data (first
                   portion of TCP segment #X)

               .3. Check CRC calculation for FPDU #N-1

               .4. If no FPDU CRC errors, placement is allowed

               .5. Locate the local buffer for the first portion of
                   FPDU #N-1; CopyData(local buffer of first portion of
                   FPDU #N-1, host buffer address, length)

               .6. Compute host buffer address for second portion of
                   FPDU #N-1

               .7. CopyData(local buffer of second portion of FPDU
                   #N-1, host buffer address for second portion, length)

               .8. Calculate the octet offset into the TCP segment for
                   the next FPDU #N.

               .9. Start Calculation of CRC for available data for FPDU
                   #N

               .10. Store partial CRC results for FPDU #N

               .11. Store local buffer address of first portion of FPDU
                   #N

               .12. No further action is possible on FPDU #N, before it
                   is completely received

           ii. If the TCP segment is out-of-order, the receiver must
               buffer the data until at least one complete FPDU is
               received.  Typically, buffering for more than one TCP
               segment per connection is required.  The MPA Markers are
               used to calculate where the FPDU boundaries are.

               .1. When a complete FPDU is available, a similar
                   procedure to the in-order algorithm above is used.
                   There is additional complexity, though, because when
                   the missing segment arrives, this TCP segment must
                   be run through the CRC engine after the CRC is
                   calculated for the missing segment.
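
   The in-order branch of this algorithm (steps i.1 through i.12) can
   be sketched as follows.  The sketch assumes a simplified FPDU with
   no Markers: a 2-octet big-endian ULPDU length, the ULPDU, pad to a
   multiple of 4 octets, and a 4-octet CRC32C over everything before
   it.  The class NonAlignedReceiver and the helpers make_fpdu and
   crc32c_update are hypothetical; only in-order segments are handled,
   and an FPDU that fails the CRC check is simply dropped rather than
   handled as an error.

```python
import struct

CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial


def crc32c_update(crc: int, data: bytes) -> int:
    """Advance a raw (not yet finalized) CRC32C register over data."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY if crc & 1 else crc >> 1
    return crc


def make_fpdu(ulpdu: bytes) -> bytes:
    """Build a simplified FPDU: length, ULPDU, pad, CRC32C (no Markers)."""
    body = struct.pack("!H", len(ulpdu)) + ulpdu
    body += b"\x00" * ((-len(body)) % 4)
    crc = crc32c_update(0xFFFFFFFF, body) ^ 0xFFFFFFFF
    return body + struct.pack("!I", crc)


class NonAlignedReceiver:
    """In-order receiver for FPDUs that may straddle TCP segments."""

    def __init__(self) -> None:
        self.partial = bytearray()  # local buffer holding a partial FPDU
        self.crc = 0xFFFFFFFF       # stored partial CRC register
        self.covered = 0            # octets of self.partial folded into crc
        self.placed = []            # ULPDUs whose CRC check passed

    def on_segment(self, payload: bytes) -> None:
        self.partial += payload
        while len(self.partial) >= 2:
            ulpdu_len = struct.unpack_from("!H", self.partial)[0]
            body_len = 2 + ulpdu_len + (-(2 + ulpdu_len)) % 4
            # Continue the partial CRC over newly arrived covered octets.
            upto = min(len(self.partial), body_len)
            self.crc = crc32c_update(self.crc, self.partial[self.covered:upto])
            self.covered = upto
            if len(self.partial) < body_len + 4:
                return  # FPDU incomplete: wait for the next segment
            expected = struct.unpack_from("!I", self.partial, body_len)[0]
            if (self.crc ^ 0xFFFFFFFF) == expected:  # placement allowed
                self.placed.append(bytes(self.partial[2:2 + ulpdu_len]))
            # Consume this FPDU and reset per-FPDU state for the next one.
            del self.partial[:body_len + 4]
            self.crc, self.covered = 0xFFFFFFFF, 0


if __name__ == "__main__":
    stream = make_fpdu(b"hello") + make_fpdu(b"DDP segment #2")
    rx = NonAlignedReceiver()
    for i in range(0, len(stream), 7):  # arbitrary 7-octet "segments"
        rx.on_segment(stream[i:i + 7])
    print(rx.placed)
```

   Even this simplified version must keep a local buffer, a partial CRC
   and a count of octets already checked for every connection, which is
   the per-connection state that Header Alignment avoids.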

   If we assume Header Alignment, the following diagram and the
   algorithm below apply. Note that when using MPA, the receiver is
   assumed to actively detect presence or loss of Header Alignment for
   every TCP segment received.

      +--------------------------+      +--------------------------+
   +--|--------------------------+   +--|--------------------------+
   |  |       TCP Seg X          |   |  |         TCP Seg X+1      |
   +--|--------------------------+   +--|--------------------------+
      +--------------------------+      +--------------------------+
                FPDU #N                          FPDU #N+1

        Figure 2: Aligned FPDU placed immediately after TCP header

   The receiver algorithm for Header Aligned frames (in-order or out-
   of-order) includes:



   1.  Data Link Layer processing (whole frame) - typically including a
       CRC calculation.

   2.  Network Layer processing (assuming the datagram is not an IP
       fragment, the whole Data Link Layer frame contains one IP
       datagram; IP fragments should be reassembled in a local buffer,
       which is not a performance optimization goal).

   3.  Transport Layer processing -- TCP protocol processing, header
       and checksum checks.

       a.  Classify the incoming TCP segment using the 5-tuple (IP SRC,
           IP DST, TCP SRC Port, TCP DST Port, protocol).

   4.  Check for Header Alignment (described in detail in [MPA],
       Section 7.4).  The rest of the algorithm below assumes Header
       Alignment.

       a.  If the header is not aligned, see the algorithm defined in
           the prior section.

   5.  Whether the TCP segment is in-order or out-of-order, the MPA
       header is at the beginning of the current TCP payload (or, for
       subsequent packed FPDUs, immediately after the previous FPDU).
       Get the FPDU length from the FPDU header.

   6.  Calculate CRC over FPDU

   7.  Check CRC calculation for FPDU #N

   8.  If no FPDU CRC errors, placement is allowed

   9.  CopyData(TCP segment #X, host buffer address, length)

   10. Loop to step 5 until all the FPDUs in the TCP segment are
       consumed; this handles FPDU packing (see section 6).
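
   Steps 5 through 10 above can be sketched as a loop over the packed
   FPDUs of one aligned TCP segment. This is a minimal sketch under
   assumptions that are not taken from [MPA]: markers are ignored, an
   FPDU is modeled as a 2-octet big-endian ULPDU length, the ULPDU,
   zero padding to a 4-octet boundary and a 4-octet CRC32c in
   big-endian order, and the names crc32c, make_fpdu and
   receive_aligned_segment are hypothetical. Errors are raised as
   exceptions; a real receiver handles them as specified in [MPA].

```python
import struct

def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def make_fpdu(ulpdu: bytes) -> bytes:
    """Build one FPDU in the simplified layout assumed here (no markers)."""
    body = struct.pack(">H", len(ulpdu)) + ulpdu
    body += b"\x00" * (-len(body) % 4)            # pad to a 4-octet boundary
    return body + struct.pack(">I", crc32c(body))

def receive_aligned_segment(payload: bytes) -> list:
    """Steps 5-10: walk packed FPDUs starting at offset 0 of the TCP payload."""
    placed = []
    offset = 0
    while offset < len(payload):
        if offset + 2 > len(payload):
            raise ValueError("truncated FPDU header: Header Alignment lost")
        (ulpdu_len,) = struct.unpack_from(">H", payload, offset)      # step 5
        body_len = 2 + ulpdu_len
        body_len += -body_len % 4
        crc_end = offset + body_len + 4
        if crc_end > len(payload):
            raise ValueError("FPDU crosses segment end: Header Alignment lost")
        body = payload[offset:offset + body_len]
        (received_crc,) = struct.unpack_from(">I", payload, offset + body_len)
        if crc32c(body) != received_crc:                              # steps 6-7
            raise ValueError("FPDU CRC error: placement not allowed")  # step 8
        placed.append(body[2:2 + ulpdu_len])     # step 9: stand-in for CopyData
        offset = crc_end                         # step 10: next packed FPDU
    return placed

segment = make_fpdu(b"hello") + make_fpdu(b"world!!")
print(receive_aligned_segment(segment))   # [b'hello', b'world!!']
```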

   Implementation note: In both cases the receiver has to classify the
   incoming TCP segment and associate it with one of the flows it
   maintains. In the case of no Header Alignment, the receiver is
   forced to classify incoming traffic before it can calculate the FPDU
   CRC. In the case of Header Alignment, the order of operations is left
   to the implementer.

   The Header Aligned receiver algorithm is significantly simpler.
   There is no need to locally buffer portions of FPDUs. Accessing
   state information is also substantially simplified: in the normal
   case, the receiver does not need to retrieve information about where
   an FPDU starts and ends, or retrieve a partial CRC, before the CRC
   calculation can commence. This avoids adding internal latencies,
   making multiple data passes through the CRC machine, or scheduling
   multiple commands for moving the data to the host buffer.

   The aligned FPDU approach is useful for both in-order and
   out-of-order reception. The receiver can use the same data storage
   mechanisms in both cases; it only needs to track when all the TCP
   segments have arrived in order to enable delivery. Header Alignment,
   along with the high probability that at least one complete FPDU is
   found in every TCP segment, allows the receiver to perform data
   placement for out-of-order TCP segments with no need for
   intermediate buffering. Essentially, the TCP receive buffer has been
   eliminated and TCP reassembly is done in place within the ULP
   buffer.
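
   The in-place reassembly described above can be illustrated with a
   small model. In this hypothetical sketch, each aligned segment is
   written directly at its offset in the ULP buffer as soon as it
   arrives, in any order, and delivery is enabled only once the
   received ranges cover the whole buffer. The placement offset is
   simply passed to place(); in practice it would come from the DDP
   header of each FPDU, which is not modeled here, and the class name
   is illustrative only.

```python
class InPlaceReassembler:
    """Places aligned segments straight into a ULP buffer, in any order."""

    def __init__(self, size: int):
        self.buffer = bytearray(size)   # the ULP buffer; no separate TCP buffer
        self.ranges = []                # sorted, merged [start, end) ranges received

    def place(self, offset: int, data: bytes) -> None:
        end = offset + len(data)
        if offset < 0 or end > len(self.buffer):
            raise ValueError("segment falls outside the ULP buffer")
        self.buffer[offset:end] = data  # direct placement, no staging copy
        merged = []
        for start, stop in sorted(self.ranges + [(offset, end)]):
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
            else:
                merged.append((start, stop))
        self.ranges = merged

    def complete(self) -> bool:
        """Delivery to the ULP is allowed only once every octet has arrived."""
        return self.ranges == [(0, len(self.buffer))]

r = InPlaceReassembler(12)
r.place(8, b"IJKL")      # out-of-order arrival is placed immediately
r.place(0, b"ABCD")
print(r.complete())      # False: octets 4..7 are still missing
r.place(4, b"EFGH")
print(r.complete(), bytes(r.buffer))   # True b'ABCDEFGHIJKL'
```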

   If Header Alignment is not found, the receiver should follow the
   algorithm for non-aligned FPDU reception, which may be slower and
   less efficient.

7.2  Header Alignment effects on TCP wire protocol

   An MPA-aware TCP exposes its EMSS to MPA.  MPA uses the EMSS to
   calculate its MULPDU, which it then exposes to DDP, its ULP.  DDP
   uses the MULPDU to segment its payload so that each FPDU sent by MPA
   fits completely into one TCP segment. This has no impact on the wire
   protocol, and exposing this information is already supported by many
   TCP implementations, including all modern flavors of BSD networking,
   through the TCP_MAXSEG socket option.
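
   As an illustration of the EMSS to MULPDU to DDP chain, the sketch
   below derives a MULPDU from an EMSS and segments a ULP message so
   that each piece fits in one FPDU. The overhead model is an
   assumption made for this example: 2 octets of length, 4 octets of
   CRC, a worst-case 4-octet marker for every 512 octets of the
   segment, and trimming to a multiple of 4 octets. The normative
   calculation is defined by [MPA], and the function names are
   hypothetical.

```python
import math

MARKER_INTERVAL = 512   # assumption: one 4-octet marker per 512 octets of stream

def mulpdu_from_emss(emss: int) -> int:
    """Conservative MULPDU for a given EMSS under the assumed overhead model."""
    header_and_crc = 2 + 4                             # ULPDU length field + CRC
    markers = 4 * math.ceil(emss / MARKER_INTERVAL)    # worst-case markers per segment
    alignment = emss % 4                               # keep FPDUs a multiple of 4
    return emss - (header_and_crc + markers + alignment)

def segment_message(message: bytes, mulpdu: int) -> list:
    """DDP-style segmentation: each piece fits one FPDU, hence one TCP segment."""
    return [message[i:i + mulpdu] for i in range(0, len(message), mulpdu)]

mulpdu = mulpdu_from_emss(1460)                        # 1442 with these assumptions
print([len(p) for p in segment_message(b"x" * 3000, mulpdu)])   # [1442, 1442, 116]
```

   On platforms that expose TCP_MAXSEG, an approximation of the EMSS
   of a connected Python socket can be read with
   sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG) and passed to
   mulpdu_from_emss().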

   In the common case, the ULP (i.e., DDP over MPA) messages provided
   to the TCP layer are segmented to MULPDU size. It is assumed that the
   ULP message size is bounded by MULPDU, such that a single ULP
   message can be encapsulated in a single TCP segment. Therefore, in
   the common case, there is no increase in the number of TCP segments
   emitted. For smaller ULP messages, the sender can also apply
   packing, i.e., the sender packs as many complete FPDUs as possible
   into one TCP segment (see Section 6, No Packing, on page 7). The
   requirement to always have a complete FPDU may increase the number
   of TCP segments emitted. Typically, ULP message sizes range from a
   few bytes to multiple EMSS (e.g., 64 Kbytes). In some cases the ULP
   may post more than one message at a time for transmission, giving
   the sender an opportunity for packing. When more than one FPDU is
   available for transmission, the sender encapsulates FPDUs into a TCP
   segment until there is no room for the next complete FPDU, and then
   sends another TCP segment. In this corner case, some of the TCP
   segments are not full size. In the worst case, the ULP may choose an
   FPDU size of EMSS/2 + 1 while having multiple messages available for
   transmission. For this poor choice of FPDU size, the average TCP
   segment size is about 1/2 of the EMSS, and the number of TCP
   segments emitted approaches twice the number possible without the
   requirement to encapsulate an integer number of complete FPDUs in
   every TCP segment. This is a dynamic situation that lasts only while
   the sender ULP has multiple non-optimal messages for transmission,
   and it has only a minor impact on wire utilization.
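
   The packing behavior and the EMSS/2 + 1 worst case described above
   can be checked numerically. The sketch below is a hypothetical
   greedy sender that packs complete FPDUs into segments of at most
   EMSS octets, compared with a sender that may split FPDUs freely
   across segments. FPDU sizes here include all MPA overhead, and the
   EMSS of 1460 octets is only an example value.

```python
import math

def pack_fpdus(fpdu_sizes, emss):
    """Greedy sender: pack complete FPDUs into segments of at most EMSS octets."""
    segments, current = [], 0
    for size in fpdu_sizes:
        if size > emss:
            raise ValueError("FPDU larger than EMSS: MULPDU chosen incorrectly")
        if current + size > emss:      # next complete FPDU does not fit
            segments.append(current)
            current = 0
        current += size
    if current:
        segments.append(current)
    return segments                    # payload size of each TCP segment

def unconstrained_segments(fpdu_sizes, emss):
    """Segments needed if FPDUs could be split freely across segments."""
    return math.ceil(sum(fpdu_sizes) / emss)

emss = 1460
worst = [emss // 2 + 1] * 100          # 100 FPDUs of 731 octets each
print(len(pack_fpdus(worst, emss)), unconstrained_segments(worst, emss))  # 100 51
```

   Small FPDUs, in contrast, pack well: fourteen 100-octet FPDUs fit in
   a single 1400-octet segment.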

   However, requiring Header Alignment is not expected to have a
   measurable impact on the wire behavior of most applications.
   Throughput applications with large I/Os are expected to take full
   advantage of the EMSS. Another class of applications, with many
   small outstanding buffers (compared to the EMSS), is expected to use
   packing when applicable. Transaction-oriented applications are also
   expected to see optimal segmentation.

   TCP retransmission is another area that can affect sender behavior.
   TCP supports retransmission of the exact, originally transmitted
   segment (see [RFC0793] section 2.6, [RFC0793] section 3.7 "managing
   the window", and [RFC1122] section 4.2.2.15). In the unlikely event
   that part of the original segment has been received and acknowledged
   by the remote peer (e.g., because of a resegmenting middlebox, as
   documented in [MPA]), better bandwidth utilization may be possible
   by retransmitting only the missing octets. If an MPA-aware TCP
   retransmits complete FPDUs, there may be some marginal bandwidth
   loss.
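
   The marginal loss from retransmitting complete FPDUs can be
   quantified with simple sequence-number arithmetic. In this
   hypothetical sketch, the sender records the sequence number at which
   each FPDU starts; when retransmitting from snd_una, it backs up to
   the start of the FPDU containing snd_una, and the difference is the
   number of already-acknowledged octets that are sent again. Sequence
   number wraparound is ignored for brevity.

```python
import bisect

def fpdu_aligned_retransmit(fpdu_starts, snd_una, snd_nxt):
    """Return (start, end, extra_octets) for a retransmission of whole FPDUs.

    fpdu_starts: sorted sequence numbers at which FPDUs begin (hypothetical
                 bookkeeping kept by an MPA-aware TCP sender)
    snd_una:     oldest unacknowledged sequence number
    snd_nxt:     next sequence number to be sent
    """
    i = bisect.bisect_right(fpdu_starts, snd_una) - 1   # FPDU containing snd_una
    if i < 0:
        raise ValueError("snd_una precedes the first recorded FPDU")
    start = fpdu_starts[i]
    extra = snd_una - start          # octets the peer has already acknowledged
    return start, snd_nxt, extra

# A resegmenting middlebox let the peer acknowledge 500 octets into the
# FPDU that starts at 2000, so those 500 octets are sent again.
print(fpdu_aligned_retransmit([1000, 2000, 3000], 2500, 4000))  # (2000, 4000, 500)
```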

   Another area where a change in the number of TCP segments may have
   an impact is Slow Start and Congestion Avoidance. Slow Start
   exponential increase is measured in segments per second, as the
   algorithm focuses on the per-segment overhead at the source for
   congestion that eventually results in dropped segments. Slow Start
   exponential bandwidth growth for an MPA-aware TCP is similar to that
   of any TCP implementation. Congestion Avoidance allows for linear
   growth in available bandwidth when recovering after a packet drop.
   As with Slow Start, MPA-aware TCP does not change the behavior of
   the algorithm. Therefore the average segment size relative to the
   EMSS is not a major factor in the assessment of the bandwidth
   growth for a sender. Both Slow Start and Congestion Avoidance for an
   MPA-aware TCP will behave similarly to any TCP sender and allow an
   MPA-aware TCP to enjoy the theoretical performance limits of the
   algorithms.

   In summary, the ULP messages generated at the sender (e.g., the
   number of messages grouped for every transmission request) and the
   message size distribution have the most significant impact on the
   number of TCP segments emitted. The worst-case effect for certain
   ULPs (with an average message size of EMSS/2+1 to EMSS) is bounded
   by an increase of up to 2x in the number of TCP segments and
   acknowledgments. In practice the effect is expected to be marginal.

   See the MPA specification for additional documentation on corner
   cases which are expected to lose Header Alignment and cause the
   previously documented algorithm to be executed.

8  Interoperating between MPA applications and non-MPA applications

   ULPs that use MPA are required to enable MPA at an agreed-upon point
   in the TCP datastream.  If they fail to do so, the condition
   illustrated in Figure 3: MPA Enablement Error arises.  This
   condition is referred to as an MPA enablement error.  With the
   understanding that MPA enablement errors should not occur, some
   concerns have been raised about their effects if they do.  The
   remainder of this section addresses those concerns for a variety of
   ULPs.

   MPA is enabled if the MPA protocol is visible on the wire.  When the
   sender is MPA-enabled, it is inserting framing and markers.  When
   the receiver is MPA-enabled, it is interpreting framing and markers.
   MPA enablement is orthogonal to MPA awareness.  MPA can be enabled
   on a strictly layered MPA implementation running over a non-MPA-
   aware TCP.  It can be disabled on an MPA-aware TCP implementation.

   When first enabled, MPA always sends a marker preceding the first
   FPDU.  Because the marker is located on the boundary between FPDUs,
   its initial value is always 0.  Consequently, any ULP which never
   starts its datastream with four zero octets is easily proved safe
   with respect to MPA enablement errors.
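
   The safety criterion above reduces to a check on a ULP's first four
   octets. The helper below is a hypothetical illustration, applied to
   initial octets discussed in section 8.2. When fewer than four octets
   are known and all of them are zero, it conservatively reports that a
   collision with the initial marker cannot be ruled out.

```python
def may_match_initial_marker(initial_octets: bytes) -> bool:
    """True if these opening octets could equal MPA's all-zero initial marker."""
    prefix = initial_octets[:4]
    return prefix == bytes(len(prefix))     # bytes(n) is n zero octets

examples = {
    "HTTP (section 8.2.3)": b"HTTP",
    "CIFS (section 8.2.3)": b"\xffSMB",
    "iSNS version 1": b"\x00\x01",
    "all-zero prefix": b"\x00\x00\x00\x00",
}
for name, octets in examples.items():
    print(name, may_match_initial_marker(octets))
```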



                    Node A                     Node B
                 (MPA active)             (MPA not active)
              +---------------+          +---------------+
              |      ULP      |          |      ULP      |
              +---------------+          |               |
              |      MPA      |          |               |
              +---------------+          +---------------+
              |      TCP      |          |      TCP      |
              +---------------+          +---------------+
              |      IP       |          |      IP       |
              +---------------+          +---------------+
                      |                          |
                      +--------------------------+

                      Figure 3: MPA Enablement Error

8.1  Negotiation of MPA-enabled mode

   Transition to MPA can be accomplished by three possible methods,
   examined here. The first two methods involve the use of specific
   TCP ports for ULPs using MPA, either by use of a well-known IANA
   port or by use of a service locator protocol. In these usage models
   it is anticipated that both the Active and Passive ULPs will enable
   MPA mode operation for their respective transmitters and receivers
   prior to any data exchange. The third method is negotiation of the
   MPA transition by the ULP, using octet stream messages to accomplish
   a ULP-specific <MPA Hello>, <MPA Hello ACK>, <MPA ACK> three-way
   exchange. Transition to MPA framing causes the transmitter to
   always insert a 4-octet marker modulo the marker interval, and the
   receiver to check the MPA CRC. The first MPA marker is defined to
   have a value of 0, and it always follows the last expected octet to
   be transferred in octet stream mode. This leads to two distinct
   error detection scenarios. Receivers that are expecting MPA framing
   will quickly detect a CRC error, in addition to any ULP header
   errors (e.g., DDP), if given octet stream data from a
   non-MPA-enabled transmitter; this causes MPA to drop the connection.
   Receivers that are not expecting MPA framing will see four octets of
   zeros in the octet stream at the point where the transition to MPA
   was expected to occur.
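
   The third method can be modeled as a small state machine on the
   Active side. The sketch below is hypothetical: because the Hello
   messages are ULP specific, the message encodings, class name and
   state names are placeholders, and only the transitions described
   above are shown, including the timeout outcome listed in Figure 5.

```python
from enum import Enum, auto

class State(Enum):
    OCTET_STREAM = auto()   # before negotiation
    HELLO_SENT = auto()     # waiting for <MPA Hello ACK>
    MPA = auto()            # framing, markers and CRC active
    FAILED = auto()         # the ULP should close the connection

class ActiveNegotiator:
    """Active side of a hypothetical ULP-specific <MPA Hello> exchange."""

    def __init__(self, send):
        self.send = send                  # callable that writes octet-stream data
        self.state = State.OCTET_STREAM

    def start(self):
        self.send(b"MPA Hello")           # placeholder encoding, not a real format
        self.state = State.HELLO_SENT

    def on_message(self, msg: bytes):
        if self.state is not State.HELLO_SENT:
            return
        if msg == b"MPA Hello ACK":
            self.send(b"MPA ACK")         # last octet sent in octet stream mode
            self.state = State.MPA        # next octets begin with a zero marker
        else:
            self.state = State.FAILED     # ULP protocol error

    def on_timeout(self):
        if self.state is State.HELLO_SENT:
            self.state = State.FAILED     # peer is probably not MPA-enabled

sent = []
active = ActiveNegotiator(sent.append)
active.start()
active.on_message(b"MPA Hello ACK")
print(active.state, sent)   # State.MPA [b'MPA Hello', b'MPA ACK']
```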

   Anticipated error cases are examined in the following figures.
   Figure 4: MPA Transition via Well Known Port or Service Location
   Protocol deals with the cases where the transition to MPA is
   expected prior to data exchange and examines the four possible
   combinations of mismatching MPA-enabled endpoints with non-MPA-
   enabled endpoints. These error scenarios imply a configuration error
   between active and passive nodes and are likely to simultaneously
   represent a configuration error of the ULPs as well. Figure 5: MPA
   Transition via Octet Stream Negotiation examines cases where
   transition to MPA mode is by exchange of ULP specific MPA Hello/ACK
   messages.

+-------------+---------------------------+---------------------------+
|             | MPA-Enabled               | Non-MPA-Enabled           |
|             | Passive                   | Passive                   |
|             |                           |                           |
+-------------+---------------------------+---------------------------+
| MPA-Enabled | First octets transmitted  | Passive receiver does not |
| Active      | are MPA mode.             | understand MPA header,    |
|             |                           | which starts with 0.      |
|             | Receivers check for MPA   | See section 8.2 for       |
|             | framing.                  | anticipated behavior from |
|             |                           | existing protocols that   |
|             | Successful Transition to  | run over TCP.             |
|             | MPA mode.                 |                           |
|             |                           | Active receiver expects   |
|             |                           | MPA framing and CRC,      |
|             |                           | which will not be         |
|             |                           | present.  Expect quick    |
|             |                           | detection and closing     |
|             |                           | the connection in error.  |
|             |                           |                           |
+-------------+---------------------------+---------------------------+
| Non-        | First octets transmitted  | Normal octet stream mode  |
| MPA-Enabled | in octet streaming mode.  | operation could occur,    |
| Active      |                           | but this indicates a con- |
|             | Passive receiver expects  | figuration error that     |
|             | MPA framing and CRC,      | should be detected by the |
|             | which will not be present.| ULP at either node.       |
|             | Expect quick detection    |                           |
|             | and closing the connec-   |                           |
|             | tion in error.            |                           |
|             |                           |                           |
+-------------+---------------------------+---------------------------+

     Figure 4: MPA Transition via Well Known Port or Service Location
                                 Protocol

+-------------+----------------------------+--------------------------+
|             | MPA-Enabled                | Non-MPA-Enabled          |
|             | Passive                    | Passive                  |
|             |                            |                          |
+-------------+----------------------------+--------------------------+
| MPA-Enabled | Completes Hello, Hello     | Passive does not under-  |
| Active      | Ack exchange.              | stand MPA Hello sequence |
|             |                            | in octet-stream mode,    |
|             | Successful Transition      | either                   |
|             | to MPA mode.               |                          |
|             |                            | * Passive side ULP dis-  |
|             |                            |   connects due to ULP    |
|             |                            |   protocol error         |
|             |                            |                          |
|             |                            | * Active side ULP times  |
|             |                            |   out waiting for MPA    |
|             |                            |   Hello ACK.             |
|             |                            |                          |
+-------------+----------------------------+--------------------------+
| Non-        | Active side sends an octet | Normal octet-stream mode |
| MPA-Enabled | stream sequence that looks | operation                |
| Active      | like MPA Hello without     |                          |
|             | being aware of MPA.        |                          |
|             | Passive side ULP sees this |                          |
|             | message and mistakenly     |                          |
|             | tries to transition to MPA |                          |
|             | mode. This situation would |                          |
|             | most likely be the result  |                          |
|             | of a ULP protocol error.   |                          |
|             | Passive side will trans-   |                          |
|             | ition to MPA mode on the   |                          |
|             | receiver and transmit MPA  |                          |
|             | Hello Ack, resulting in    |                          |
|             | one of two error cases:    |                          |
|             |                            |                          |
|             | * MPA Hello Ack should     |                          |
|             |   cause a ULP protocol     |                          |
|             |   error at the Active      |                          |
|             |   side receiver            |                          |
|             |                            |                          |
|             | * Passive side receiver    |                          |
|             |   will expect MPA framing  |                          |
|             |   and CRC, which will      |                          |
|             |   fail very quickly        |                          |
|             |                            |                          |
+-------------+----------------------------+--------------------------+

           Figure 5: MPA Transition via Octet Stream Negotiation

8.2  Analysis of existing TCP services

   The following sections examine the initial octet stream exchanges
   for many of the ULPs used over a TCP transport to determine behavior
   if they were inadvertently subjected to MPA framed messages.  It
   should be kept in mind that if the procedures described in the
   preceding section are followed, none of the conditions analyzed here
   are possible. If a ULP is determined to be unsafe with respect to
   MPA enablement errors, it means only that the ULP does not protect
   itself against connection attempts by clients using some other ULP.
   Any such protection is the responsibility of the ULP, not MPA.

8.2.1  The "Little" TCP Services

   There is a group of TCP services, collectively referred to as the
   "little" TCP services, which are occasionally useful for debugging
   networks.  Normally the "client" for these services is telnet, which
   simply sends all typed characters and displays all received
   characters verbatim.  Most of the "little" services are easily
   proved safe.

   The echo [RFC0862] server copies all received data back to the
   client.  Its behavior is identical even if the echo client or server
   is accidentally MPA-enabled.

   The discard [RFC0863] server never sends anything and discards
   everything it receives.  Neither the client nor the server can
   detect the presence or absence of MPA on either side.

   The chargen [RFC0864] server ignores all data sent to it and sends a
   continuous stream of data.  The format of the data is unspecified,
   although printable, ASCII text is recommended.  In any case, a non-
   MPA-enabled chargen client receiving MPA frames will simply treat
   them as data.  An MPA-enabled chargen client receiving non-MPA
   frames will detect a bad initial marker or a CRC error in the first
   "frame".

   The quote [RFC0865] server ignores all data sent to it and sends a
   short octet string before closing the connection.  The format of the
   data is unspecified, although printable, ASCII text is recommended.
   In any case, a non-MPA-enabled quote client receiving MPA frames
   will simply treat them as data.  An MPA-enabled quote client
   receiving non-MPA frames will almost certainly detect a CRC error in
   the first "frame".  In the unlikely event that the quote of the day
   is a properly formed MPA frame, an MPA-enabled client will treat the
   "payload" of the bogus frame as data.

   The daytime [RFC0867] server ignores all data sent to it and sends a
   text timestamp.  A non-MPA-enabled daytime client receiving MPA
   frames will simply treat them as data.  An MPA-enabled daytime
   client receiving non-MPA frames will detect a CRC error in the first
   "frame".

   The time [RFC0868] server ignores all data sent to it and sends a
   32-bit binary timestamp representing seconds since 00:00 GMT 01
   January, 1900.  The timestamp will wrap sometime in 2036.  An MPA-
   enabled time client receiving non-MPA frames will attempt to
   interpret the time as an initial marker.  If the timestamp is 0, it
   will be considered valid, but the server will close the connection
   before sending an MPA header.  If the timestamp is not 0, it will be
   rejected.  A non-MPA-enabled time client receiving MPA frames will
   interpret the initial marker as a 0 timestamp and display the time
   as 00:00 GMT 01 January, 1900.
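
   The time service case can be reproduced by decoding four octets as
   an RFC 868 timestamp. This minimal sketch (the function name is
   hypothetical) shows how a non-MPA-enabled client would render MPA's
   all-zero initial marker, and the last instant the 32-bit counter can
   represent before it wraps.

```python
import struct
from datetime import datetime, timedelta, timezone

RFC868_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)

def decode_rfc868(four_octets: bytes) -> datetime:
    """Interpret 4 octets as RFC 868 time: seconds since 00:00 1 Jan 1900 GMT."""
    (seconds,) = struct.unpack(">I", four_octets)   # 32-bit unsigned, network order
    return RFC868_EPOCH + timedelta(seconds=seconds)

# Non-MPA-enabled client receiving MPA's initial marker (four zero octets):
print(decode_rfc868(b"\x00\x00\x00\x00"))   # 1900-01-01 00:00:00+00:00
# Largest value before the 32-bit counter wraps:
print(decode_rfc868(b"\xff\xff\xff\xff"))   # 2036-02-07 06:28:15+00:00
```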

8.2.2  ULPs Using Only Text Messages

   Many ULPs format all TCP payload as lines of printable, ASCII text.
   Such ULPs may all be analyzed together.

   If a non-MPA-enabled endpoint A somehow becomes connected to an MPA-
   enabled endpoint B, the precise sequence of events depends on
   whether A or B is the first to send.  If A sends first, B expects
   the first four octets to be an initial marker, which must be all
   zeros.  Since A sends printable ASCII text, and 0x00 is not printable
   ASCII, B detects a protocol violation and should terminate the
   connection.  If B sends first, A expects printable, ASCII text, but
   receives four octets of zeros, so A detects a protocol violation and
   should terminate the connection.

   ULPs of this type are:

      NNTP       [RFC0977]

      finger     [RFC1288]

      gopher     [RFC1436] the first two octets are always CR LF

      POP3       [RFC1939]

      IMAP4      [RFC2060]

      IRC client [RFC2812]

      IRC server [RFC2813]

      BEEP       [RFC3081]

      SIP        [RFC3261]

      whois      [RFC0954]

      TACACS     [RFC1492]

      ident      [RFC1413]

      rwhois     [RFC2167]

      ACAP       [RFC2244]

      TIP        [RFC2371]
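
   Both directions of the text-ULP analysis can be expressed as two
   receive-side checks. In this illustrative sketch, which is not
   taken from any of the listed specifications, the text ULP is assumed
   to accept printable ASCII plus HT, CR and LF, and the MPA-enabled
   receiver checks only for the all-zero initial marker; the function
   names are hypothetical.

```python
PRINTABLE_TEXT = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}   # assumption: HT, LF, CR allowed

def text_ulp_accepts(octets: bytes) -> bool:
    """Non-MPA text ULP (endpoint A): every octet must be printable text."""
    return all(b in PRINTABLE_TEXT for b in octets)

def mpa_receiver_accepts(octets: bytes) -> bool:
    """MPA-enabled endpoint B: the first four octets must be the zero marker."""
    return octets[:4] == b"\x00\x00\x00\x00"

# A sends first: B sees text where it expects the initial marker.
print(mpa_receiver_accepts(b"USER alice\r\n"))          # False
# B sends first: A sees the marker where it expects text.
print(text_ulp_accepts(b"\x00\x00\x00\x00\x00\x10"))    # False
```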

8.2.3  ULPs with Fixed Initial Message

   Several ULPs begin every connection with a fixed octet-string.
   These ULPs are safe with respect to MPA enablement errors if the
   first four octets of the initial string cannot be mistaken for a
   valid initial marker (four zero octets).  These ULPs are:

      HTTP/1.0 [RFC1945] Initial string begins with "HTTP"

      RTSP     [RFC2326] Initial string begins with "RTSP"

      HTTP/1.1 [RFC2616] Initial string begins with "HTTP"

      SMTP     [RFC2821] Initial string begins with "HELO" or "EHLO"

      CIFS     [CIFS]    Initial string begins with "\xffSMB"

      gnutella [GNUT]    Initial string begins with "GNUTELLA CONNECT"

8.2.4  Protocols with framed command headers

   Several protocols always begin by exchanging command headers.  This
   analysis looks at the impact of the first four octets of an MPA
   datastream being interpreted as a command header and vice versa.

8.2.4.1  iSCSI

   The first messages on any iSCSI [iSCSI] connection are a login
   exchange.  Since the first octet of a login request or response
   carries a nonzero login opcode (0x03 for a request, 0x23 for a
   response), iSCSI is safe with respect to MPA enablement errors.

8.2.4.2  iSNS

   The first two octets of every iSNS [iSNS] transmission are the
   protocol version, which is currently 1.  Consequently iSNS is safe
   with respect to MPA enablement errors.

8.2.4.3  DNS

   In many cases, DNS [RFC1035] uses UDP, rendering MPA irrelevant.
   This section analyzes the implications of MPA enablement errors in
   the case where DNS runs over TCP.

   Every DNS message sent over TCP is preceded by a two-octet length
   field.  Consequently, the initial marker of an MPA datastream would
   be interpreted by a non-MPA-enabled receiver as a length field of 0.
   The precise consequences of the error depend on whether the receiver
   is the server or the resolver (i.e. client).

   If a non-MPA-enabled resolver connects to an (improperly) MPA-
   enabled DNS server, the server will interpret the length field of
   the first request as an invalid initial marker and terminate the
   connection.  This configuration is safe with respect to MPA
   enablement errors.

   If an (improperly) MPA-enabled client connects to a non-MPA-enabled
   DNS server, the server will interpret the upper 16 bits of the
   initial marker as a length field of 0, indicating a null message.
   The server might be expected to respond to a null message with a
   format error, which is a safe outcome.  However, [RFC1035] does not
   mandate this behavior.  The receiver could instead simply discard
   the "null message" and continue.  This latter case requires more
   analysis.

          intended by resolver              interpreted by server

    0                   1              0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-------------------------------+  +-------------------------------+
   |marker[0..15] = 0x0000         |  |length = 0x0000 (null message) |
   +-------------------------------+  +-------------------------------+
   |marker[16..31] = 0x0000        |  |length = 0x0000 (null message) |
   +-------------------------------+  +-------------------------------+
   |MULPDU length                  |  |message length                 |
   +-------------------------------+  +-------------------------------+
   |ID                             |  |ID                             |
   +-+-------+-+-+-+-+-----+-------+  +-+-------+-+-+-+-+-----+-------+
   |Q| Opcode|A|T|R|R| 0   | RCODE |  |Q| Opcode|A|T|R|R| 0   | RCODE |
   |R|       |A|C|D|A|     |       |  |R|       |A|C|D|A|     |       |
   +-+-------+-+-+-+-+-----+-------+  +-+-------+-+-+-+-+-----+-------+
   |QDCOUNT                        |  |QDCOUNT                        |
   +-------------------------------+  +-------------------------------+
   |ANCOUNT                        |  |ANCOUNT                        |
   +-------------------------------+  +-------------------------------+
   |NSCOUNT                        |  |NSCOUNT                        |
   +-------------------------------+  +-------------------------------+
   |ARCOUNT                        |  |ARCOUNT                        |
   +-------------------------------+  +-------------------------------+
   |data                           |  |data                           |
   |    ...                        |  |    ...                        |
   +-------------------------------+  +-------------------------------+
   |pad //                         |  |second message                 |
   +-------------------------------+  |                               |
   |CRC[0-15]                      |  |                               |
   +-------------------------------+  |                               |
   |CRC[16-31]                     |  |                               |
   +-------------------------------+  +-------------------------------+

          Figure 6: Effect of Improperly MPA-Enabled DNS Resolver



   Figure 6: Effect of Improperly MPA-Enabled DNS Resolver illustrates
   the correspondence between protocol fields as intended by the sender
   and interpreted by the receiver.  The initial marker would be
   interpreted as a pair of null messages.  The server would then
   interpret the MULPDU length field as a DNS request length.  As it
   happens, the length field is correct, and the subsequent payload
   fields line up properly, so the server will correctly interpret the
   query and respond to it.  The response will have a non-zero length
   field, which MPA at the resolver will interpret as the upper half of
   an erroneous initial marker, terminating the connection.

   Meanwhile the server will continue processing the incoming
   datastream, interpreting the first 16 bits of the concatenated pad
   and CRC as the length field of a second query.  The second "query"
   will be null, incomplete or too short to be valid.  If the "query"
   is null, the server will interpret the next 16 bits as a length, and
   so forth.  Eventually it will either safely run out of data or
   detect a "query" that is incomplete or too short.  If the "query" is
   incomplete, the server will wait for additional payload, doing
   nothing until the resolver disconnects; if it is too short, the
   server will respond with a format error.  In either case, no harm
   results.

   The interaction between MPA and DNS is complex, but as the analysis
   shows, even in the worst case DNS is safe with respect to MPA
   enablement errors.
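
   The interaction illustrated in Figure 6 can be reproduced by running
   an MPA-framed stream through a DNS-over-TCP length parser. This is a
   sketch under stated assumptions: the FPDU layout follows Figure 6
   with no markers beyond the initial one, zlib.crc32 stands in for the
   CRC32c that MPA actually uses (the CRC value does not change the
   outcome shown), the query is placeholder bytes rather than a real
   DNS message, and the function names are hypothetical.

```python
import struct
import zlib

def mpa_initial_stream(ulpdu: bytes) -> bytes:
    """Initial zero marker plus one FPDU, laid out as in Figure 6."""
    body = struct.pack(">H", len(ulpdu)) + ulpdu          # MULPDU length + payload
    body += b"\x00" * (-len(body) % 4)                    # pad to 4 octets
    return b"\x00" * 4 + body + struct.pack(">I", zlib.crc32(body))  # stand-in CRC

def dns_tcp_messages(stream: bytes) -> list:
    """How a non-MPA DNS server splits a TCP stream: 2-octet length, then message."""
    messages, offset = [], 0
    while offset + 2 <= len(stream):
        (length,) = struct.unpack_from(">H", stream, offset)
        offset += 2
        if offset + length > len(stream):
            break                    # incomplete "query": the server waits for more
        messages.append(stream[offset:offset + length])
        offset += length
    return messages

query = bytes(range(1, 30))          # 29 placeholder octets standing in for a query
messages = dns_tcp_messages(mpa_initial_stream(query))
print(messages[:3] == [b"", b"", query])   # True: two null messages, then the query
```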

8.2.4.4  LPR

   The first octet of every request made to the LPR [RFC1179] daemon is
   a printable ASCII character or a command code from the set
   {1,2,3,4,5}.  Consequently LPR is safe with respect to MPA
   enablement errors.

8.2.4.5  Kerberos

   The first octet of every Kerberos [RFC1510] request is the version
   number (currently 5).  Consequently Kerberos is safe with respect to
   MPA enablement errors.

8.2.4.6  BGP-4

   The first PDU in the BGP-4 protocol [RFC1771] is an OPEN message,
   whose header begins with a 16-octet marker of all one bits, so the
   first four octets are never zero.  Consequently BGP-4 is safe with
   respect to MPA enablement errors.

8.2.4.7  LDAP v2 and LDAP v3

   All LDAP [RFC1777, RFC2251] messages are encapsulated in an
   LDAPMessage, which is defined as a SEQUENCE under ASN.1 Basic
   Encoding Rules.  Since the leading octet of a SEQUENCE is an ASN.1
   BER type code of 0x30, no LDAP datastream can begin with 4 zero
   octets.  Consequently LDAP is safe with respect to MPA enablement
   errors.

8.2.4.8  RTP

   The most significant two bits of the first octet of an RTP [RFC1889]
   payload contain the protocol version number (currently 2).
   Consequently RTP is safe with respect to MPA enablement errors.

8.2.4.9  Socks

   The first octet of a SOCKS [RFC1928] datastream contains the protocol
   version number (currently 5).  Consequently SOCKS is safe with
   respect to MPA enablement errors.

8.2.4.10 TLS

   The first octet of a TLS [RFC2246] datastream is a command code from
   the set {20,21,22,23,255}.  Consequently TLS is safe with respect to
   MPA enablement errors.  This further implies that any ULP using TLS
   is safe with respect to MPA enablement errors.

8.2.4.11 SLP V2

   The first octet of an SLPv2 [RFC2608] datastream is the protocol
   version number (currently 2).  Consequently SLPv2 is safe with
   respect to MPA enablement errors.

9  Security Considerations

   This document does not define protocols; hence it does not create
   any new security considerations.

10 IANA Considerations

   This Internet Draft does not define any new protocols, thus there
   are no IANA considerations.

11 References

11.1 Primary References

   [RFC0793] J. Postel, "Transmission Control Protocol", RFC 793,
       September 1981.

   [RFC0854] J. Postel & J.K. Reynolds, "Telnet Protocol
       Specification", RFC 854, May 1983.

   [RFC1122] R. Braden, Ed., "Requirements for Internet Hosts -
       Communication Layers", RFC 1122, October 1989.

   [RFC1191] J.C. Mogul & S.E. Deering, "Path MTU discovery", RFC 1191,
       November 1990.

   [RFC2026] S. Bradner, "The Internet Standards Process -- Revision
       3", BCP 9, RFC 2026, October 1996.

   [RFC2119] S. Bradner, "Key words for use in RFCs to Indicate
       Requirement Levels", BCP 14, RFC 2119, March 1997.

   [MPA] P. Culley et al., "Marker PDU Aligned Framing for TCP
       Specification", draft-cully-iwarp-mpa-01.txt (work in progress),
       October 2002.

   [RDMAP] R. Recio et al., "RDMA Protocol Specification",
       draft-recio-iwarp-01.txt (work in progress), October 2002.

   [DDP] H. Shah et al., "Direct Data Placement over Reliable
       Transports", draft-shah-iwarp-ddp-01.txt (work in progress),
       October 2002.

11.2 "Little" TCP Services

   [RFC0862] J. Postel, "Echo Protocol", RFC 862, May 1983.

   [RFC0863] J. Postel, "Discard Protocol", RFC 863, May 1983.

   [RFC0864] J. Postel, "Character Generator Protocol", RFC 864, May
       1983.

   [RFC0865] J. Postel, "Quote of the Day Protocol", RFC 865, May 1983.

   [RFC0867] J. Postel, "Daytime Protocol", RFC 867, May 1983.

   [RFC0868] J. Postel & K. Harrenstien, "Time Protocol", RFC 868, May
       1983.

11.3 ULPs using only Text Messages

   [RFC0977] B. Kantor & P. Lapsley, "Network News Transfer Protocol",
       RFC 977, February 1986.

   [RFC1288] D. Zimmerman, "The Finger User Information Protocol", RFC
       1288, December 1991.

   [RFC1436] F. Anklesaria, et al., "The Internet Gopher Protocol (a
       distributed document search and retrieval protocol)", RFC 1436,
       March 1993.

   [RFC1939] J. Myers & M. Rose, "Post Office Protocol - Version 3",
       RFC 1939, May 1996.

   [RFC2060] M. Crispin, "Internet Message Access Protocol - Version
       4rev1", RFC 2060, December 1996.

   [RFC2812] C. Kalt, "Internet Relay Chat: Client Protocol", RFC 2812,
       April 2000.

   [RFC2813] C. Kalt, "Internet Relay Chat: Server Protocol", RFC 2813,
       April 2000.

   [RFC3081] M. Rose, "Mapping the BEEP Core onto TCP", RFC 3081, March
       2001.

   [RFC3261] J. Rosenberg, et al., "SIP: Session Initiation Protocol",
       RFC 3261, June 2002.

   [RFC0954] K. Harrenstien, et al., "NICNAME/WHOIS", RFC 954, October
       1985.

   [RFC1492] C. Finseth, "An Access Control Protocol, Sometimes Called
       TACACS", RFC 1492, July 1993.

   [RFC1413] M. St. Johns, "Identification Protocol", RFC 1413,
       February 1993.

   [RFC2167] S. Williamson, et al., "Referral Whois (RWhois) Protocol
       V1.5", RFC 2167, June 1997.

   [RFC2244] C. Newman & J. G. Myers, "ACAP -- Application
       Configuration Access Protocol", RFC 2244, November 1997.

   [RFC2371] J. Lyon, et al., "Transaction Internet Protocol Version
       3.0", RFC 2371, July 1998.

11.4 ULPs with Fixed Initial Message

   [RFC1945] T. Berners-Lee, et al., "Hypertext Transfer Protocol --
       HTTP/1.0", RFC 1945, May 1996.

   [RFC2326] H. Schulzrinne, et al., "Real Time Streaming Protocol
       (RTSP)", RFC 2326, April 1998.

   [RFC2616] R. Fielding, et al., "Hypertext Transfer Protocol --
       HTTP/1.1", RFC 2616, June 1999.

   [RFC2821] J. Klensin, ed., "Simple Mail Transfer Protocol", RFC
       2821, April 2001.

   [CIFS] Storage Networking Industry Association, "Common Internet
       File System (CIFS) Technical Reference",
       http://www.snia.org/tech_activities/CIFS/CIFS-TR-1p00_FINAL.pdf,
       March 2002.

   [GNUT] Anonymous, "The Gnutella Protocol Specification v0.4",
       http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf.

11.5 ULPs with Framed Command Headers

   [iSCSI] J. Satran et al., "iSCSI", draft-ietf-ips-iscsi-19.txt (work
       in progress), January 2003.

   [iSNS] J. Tseng et al., "Internet Storage Name Service (iSNS)",
       draft-ietf-ips-isns-16.txt (work in progress), January 2003.

   [RFC1035] P.V. Mockapetris, "Domain names - implementation and
       specification", RFC 1035, November 1987.

   [RFC1179] L. McLaughlin, "Line printer daemon protocol", RFC 1179,
       August 1990.

   [RFC1510] J. Kohl & C. Neuman, "The Kerberos Network Authentication
       Service (V5)", RFC 1510, September 1993.

   [RFC1771] Y. Rekhter & T. Li, "A Border Gateway Protocol 4 (BGP-4)",
       RFC 1771, March 1995.

   [RFC1777] W. Yeong, et al., "Lightweight Directory Access Protocol",
       RFC 1777, March 1995.

   [RFC2251] M. Wahl, et al., "Lightweight Directory Access Protocol
       (v3)", RFC 2251, December 1997.

   [RFC1889] H. Schulzrinne, et al., "RTP: A Transport Protocol for
       Real-Time Applications", RFC 1889, January 1996.

   [RFC1928] M. Leech, et al., "SOCKS Protocol Version 5", RFC 1928,
       March 1996.

   [RFC2246] T. Dierks & C. Allen, "The TLS Protocol Version 1.0", RFC
       2246, January 1999.

   [RFC2608] E. Guttman, et al., "Service Location Protocol, Version
       2", RFC 2608, June 1999.

12 Authors' Addresses

   Uri Elzur
   Broadcom Corporation
   16215 Alton Parkway
   Irvine, CA 92619-7013 USA
   Phone: +1 (949) 585-6432
   Email: Uri@Broadcom.com

   James Pinkerton
   Microsoft Corporation
   One Microsoft Way
   Redmond, WA 98052 USA
   Phone: +1 (425) 705-5442
   Email: jpink@microsoft.com

   Robert Teisberg
   Hewlett-Packard Company
   14231 Tandem Blvd.
   Austin, TX 78728
   Phone: +1 (512) 432-8119
   Email: Robert.Teisberg@hp.com

   Dwight Barron
   Hewlett-Packard Company
   20555 SH 249
   Houston, TX 77070-2698  USA
   Phone: +1 (281) 514-2769
   Email: Dwight.Barron@Hp.com

   John Carrier
   Adaptec, Inc.
   691 S. Milpitas Blvd.
   Milpitas, CA 95035 USA
   Phone: +1 (360) 378-8526
   Email: john_carrier@adaptec.com

   Paul R. Culley
   Hewlett-Packard Company
   20555 SH 249
   Houston, TX 77070-2698  USA
   Phone: +1 (281) 514-5543
   Email: paul.culley@hp.com

13 Acknowledgments

   Vadim Makhervaks
   IBM Corp., Haifa Development Lab
   Haifa, Israel
   Phone: +972-4-829-6537
   Email: VADIK@il.ibm.com

   Renato Recio
   IBM Corp.
   11501 Burnett Road
   Austin, Tx. USA 78758
   Phone: 512-838-3685
   Email: recio@us.ibm.com

   Tom Talpey
   Network Appliance
   375 Totten Pond Road
   Waltham, MA 02451 USA
   Phone: +1 (781) 768-5329
   Email: thomas.talpey@netapp.com

   Patricia Thaler
   Agilent Technologies, Inc.
   1101 Creekside Ridge Drive, #100
   M/S-RG10
   Roseville, CA 95678
   Phone: +1-916-788-5662
   Email: pat_thaler@agilent.com

14 Full Copyright Statement

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

   Funding for the RFC Editor function is currently provided by the
   Internet Society.