Internet Engineering Task Force                                A. Biswas
Internet-Draft                                              NetApp, Inc.
Intended status: Standards Track                            May 25, 2010
Expires: November 26, 2010


   Support for Stronger Error Detection Codes in TCP for Jumbo Frames
            draft-ietf-tcpm-anumita-tcp-stronger-checksum-00

Abstract

   There is a class of data serving protocols and applications that
   cannot tolerate undetected data corruption on the wire.  Data
   corruption could occur at the source in software, in the network
   interface card, out on the link, on intermediate routers or at the
   destination network interface card or node.  The Ethernet CRC and the
   16-bit checksum in the TCP/UDP headers are used to detect data
   errors.  Most applications rely on these checksums to detect data
   corruptions and do not use any checksums or CRC checks at their
   level.  Research has shown that the TCP/UDP checksums are catching a
   significant number of errors, however, the research suggests that one
   packet in 10 billion will have an error that goes undetected for
   Ethernet MTU frames (MTU of 1500).  Under certain situations, "bad"
   hosts can introduce undetected errors at a much higher frequency and
   order.  With the use of Jumbo frames on the rise, and therefore more
   data bits on the wire that could be corrupted, the current 16-bit
   TCP/UDP checksum, or the Ethernet 32-bit CRC are simply not
   sufficient for detecting errors.  This document specifies a proposal
   to use stronger checksum algorithms for TCP Jumbo Frames for IPv4 and
   IPv6 networks.  The Castagnoli CRC 32C algorithm used in iSCSI and
   SCTP is proposed as the error detection code of choice.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 26, 2010.



Biswas                  Expires November 26, 2010               [Page 1]


Internet-Draft        Stronger TCP Error Detection              May 2010


Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1.  Conventions  . . . . . . . . . . . . . . . . . . . . . . .  4
   2.  Calculating the CRC-32C value  . . . . . . . . . . . . . . . .  4
   3.  Negotiating the use of CRC 32C . . . . . . . . . . . . . . . .  6
   4.  IPv6 Considerations  . . . . . . . . . . . . . . . . . . . . .  8
   5.  Conclusions and Acknowledgements . . . . . . . . . . . . . . .  8
   6.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . .  9
   7.  Security Considerations  . . . . . . . . . . . . . . . . . . .  9
   8.  References . . . . . . . . . . . . . . . . . . . . . . . . . .  9
     8.1.  Normative References . . . . . . . . . . . . . . . . . . .  9
     8.2.  Informative References . . . . . . . . . . . . . . . . . .  9
   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 10





















Biswas                  Expires November 26, 2010               [Page 2]


Internet-Draft        Stronger TCP Error Detection              May 2010


1.  Introduction

   There is a class of data serving applications that host business and
   financial data.  Detecting and recovering from data corruption is
   paramount to the success of this class of applications.  Data
   corruption can occur while data is transiting from the source to a
   desired destination.  Data can get corrupted right at the source due
   to software errors, within the network interface card, out on the
   wire or link, in intermediate routers and at the destination network
   interface or node.  Link errors are detected using the Ethernet 32-
   bit CRC.  Node or router errors are detected using the 16-bit
   checksum in the transport headers of TCP and UDP.  Most applications
   do not have built-in error detection capability and typically rely on
   the checksums in the underlying networking layers.  Stone et al.
   [Stone] have recommended applications employ their own checksums to
   detect errors that go undetected by lower levels.  They have made
   this recommendation for the standard Ethernet MTU.  They have done so
   considering situations where a "bad" host can introduce undetected
   errors at a much higher frequency and order.  It must also be said
   that the physical layer already does encodings with bit error
   rates(BER) of 10^-12 ti 10^-14 and therefore the current checksum
   algorithms may be sufficient.  However, stronger checksumming
   accounts for the cases where noisy hardware, bad cables can introduce
   noise at a much higher frequency and order.  It is also to be noted
   that increasing speed of the physical medium (to 40G and 100G) can
   also lead to higher BER.

   Another dynamic, very much in the rise is the use and deployment of
   Jumbo Frames.  Jumbo Frames reduce per packet overheads significantly
   and are a cheap way of improving the performance of bulk data
   applications.  Combining the use of Jumbo frames with noisy physical
   medium increases the risk of undetected bit errors as there simply
   are more bits that can get corrupted.  This is rather concerning as
   business and financial data typically are transported over the
   network using file access based protocols like NFS, CIFS, HTTP over
   TCP.

   The strength of the Ethernet CRC checksum and the 16-bit Transport
   checksum has been found to reduce for data segments that are larger
   than the standard Ethernet MTU.  Koopman et. al.  [Koopman] have
   explored a number of CRC polynomials as well as the polynomial used
   in the Ethernet CRC calculation.  They have measured the
   effectiveness of these CRC polynomials for different data word
   lengths, where a data word is a bit stream from 64 bits to 128 Kbits.
   These data word lengths cover lengths equivalent to Ethernet MTUs and
   Jumbo frames and also frame lengths larger than Jumbo frames.  They
   found that the Castagnoli polynomial x^32 + x^28 + x^27 + x^26 + x^25
   + x^23 + x^22 + x^20 + x^19 + x^18 + x^14 + x^13 + x^11 + x^10 + x^9



Biswas                  Expires November 26, 2010               [Page 3]


Internet-Draft        Stronger TCP Error Detection              May 2010


   + x^8 + x^6 + x^0 represented as the 32-bit code 0x8F6E37A0 bests
   other CRC polynomials for Jumbo frames and larger segments.  This
   polynomial has been adopted by the iSCSI and SCTP standards.  It is
   to be noted that this polynomial is represented as the 32-bit code
   0x11EDC6F41 in SCTP in accordance to the convention adopted for bit-
   ordering at the transport-level, i.e., bit-ordering for mapping SCTP
   messages to polynomials is that bytes are taken most significant
   first, but within each bytes, bits are taken least-significant first.

   Given the ubiquity of TCP, it is the layer where we can introduce
   stronger error detection capability without duplicating the effort in
   higher layers.  TCP options provide an easy path to introduce
   stronger checksum without hindering interoperability.  TCP options
   allow a TCP stack supporting a TCP option to interoperate seamlessly
   with a TCP stack that does not support the new TCP option (RFC 1122
   [RFC1122] requires the interoperability in Section 4.2.2.5).

   This document proposes that the use of the Castagnoli polynomial,
   also known as the CRC 32C as the "checksum" of choice for TCP
   protocol.  Other summation based checksum algorithms like Fletcher
   and Adler's algorithm were evaluated in RFC 3385 [RFC3385] and found
   to behave substanially worse than CRCs and hence are not considered
   in this proposal.

   By standardizing a stronger checksum at the TCP level, we can quickly
   drive the offloading of this checksum to NIC hardware, just as the
   16-bit TCP checksum is offloaded by most NIC vendors today.
   Offloading computation to hardware allows us to get rid of the in-
   software computation overheads of stronger checksum algorithms.

   Another positive effect of implementing strong TCP checksumming is
   that this will drive the rapid adoption of 9K Jumbo frames and make
   it considerably easier to consider even larger Jumbo Frames.

1.1.  Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].


2.  Calculating the CRC-32C value

   The 16-bit TCP checksum does a checksum of the TCP header and
   payload.  It also includes the pseudo header values of Source
   Address, Destination Address, Protocol and TCP Length.  The addition
   of the bytes of a pseudo header into a summation based checksum
   algorithm is simpler than the inclusion of the bytes of a pseudo



Biswas                  Expires November 26, 2010               [Page 4]


Internet-Draft        Stronger TCP Error Detection              May 2010


   header into a CRC computation.  This is because a CRC computation
   assumes a contiguous bit stream when translating the bit stream to a
   polynomial for doing the polynomial division.  The psuedo header was
   added to the TCP checksum computation in order to detect errors
   introduced in one of the IP header fields that could possibly cause
   the packet to be sent to an incorrect destination.  These fields also
   get included in the IP header checksum.  The intent was to include
   them in two separate checksums for better data integrity.  One can
   question the need for including the pseudo-header fields twice.  The
   pseudo-header currently get included thrice if one considers the fact
   that the Ethernet CRC is computed over the entire Ethernet frame and
   Ethernet is ubiquitous today.  So for the purposes of this draft, all
   the fields used in the current TCP checksum except the pseudo-header
   must be included in the CRC-32c calculation.  If this draft's
   proposal is accepted for standardization, IETF may elect to add back
   the pseudo-header into the CRC-32C calculation or add only a smaller
   subset of the fields.  But it is to be noted that in this proposal we
   do have room to consider changes like this without disrupting current
   installations.

   It may also be questionable whether one needs to compute the 16-bit
   TCP checksum if the new TCP checksum option is present.  To avoid a
   chicken and egg problem, this document proposes that the 16-bit
   checksum field be zeroed out and included in the CRC 32C checksum as
   part of the TCP header bit stream.  The standardization process may
   choose a different approach and decide to do both the 16-bit TCP
   checksum and the CRC 32C checksum, in which case, a method will need
   to be defined as to the order of checksumming and the fields used in
   each of the checksum computations.

   This document also recommends the use of the CRC-32C when the
   negotiated Maximum Segment Size (MSS) value is equal or greater than
   8948 bytes (excluding frame and TCPIP header bytes), the most common
   Jumbo Frame size, but does not explicitly recommend the use of CRC-
   32C for standard Ethernet MTU frames.

   The CRC-32C MAY be used even for regular Ethernet MTU frames also if
   the application so desires for stricter data integrity checking,
   since CRC-32C can detect more independent bit errors than Ethernet
   CRC for Ethernet MTU sized packets.  The use of CRC-32C can be made
   settable by the application, by providing a socket option to the
   application.  The provision for an application to enable/disable the
   use of the new checksum option is left as an API detail of the
   particular TCP/IP socket layer implementation.

   The following section describes two possible approaches to
   negotiating the proposed 32-bit TCP checksum.  The common thread in
   the two approaches is the use of TCP options to negotiate the use of



Biswas                  Expires November 26, 2010               [Page 5]


Internet-Draft        Stronger TCP Error Detection              May 2010


   this checksum during the connection setup phase.  Once the connection
   is setup, all subsequent packets sent during the connection transfer
   phase MUST carry the stronger checksum except as described below.

   It is also possible that Path MTU discovery causes a connection to
   reduce the negotiated MSS value post connection establishment.  So,
   during connection establishment, an MSS equal or greater than 9K
   might have been negotiated along with stronger TCP checksumming, and
   then later the MSS reduced to be equal to the discovered path MTU.
   If the reduced MSS value is equal or less than an Ethernet MSS
   (typically 1460 without other TCP options), then the TCP end point
   that reduced its MTU may choose to NOT send the TCP checksum option
   in subsequent data packets.  The peer must then rely on the 16-bit
   TCP checksum for end to end data integrity which is okay since the
   Ethernet CRC has comparable data integrity checking capability for
   Ethernet sized packets.

   Now, let us discuss the method for computing the CRC 32c value:

   The CRC computation uses polynomial division.  The TCP header and
   payload is mapped to a polynomial and the CRC is calculated by
   dividing the bit stream with the CRC 32C polynomial.  Stone et. al.
   in Appendix B of RFC 4960 [RFC4960] describe a convention for mapping
   the bytes of the bit stream into the polynomial.  The same MUST be
   adopted for TCP transport too.


3.  Negotiating the use of CRC 32C

   There are two possible approaches to negotiating the proposed CRC 32C
   checksum during the TCP connection setup phase.

   o  A new TCP option

   o  Using the TCP Alternate Checksum Data Option

   The first approach introduces a new TCP option to be negotiated by
   TCP endpoints during the connection setup phase.  It will be of the
   same format as other defined TCP options and will have Type, Length
   and Value fields.  A new type will be requested from IANA.  The
   length field will be the sum total length of the new TCP checksum
   option which is 6 bytes.  The value field will hold the 32-bit CRC
   32C checksum.

   If either one of the peers does not add this option to its TCP
   options list in its SYN segment, the CRC-32C checksum must not be
   used by the other peer.  Most TCP implementations are written to
   process the TCP options they recognize and ignore unknown options on



Biswas                  Expires November 26, 2010               [Page 6]


Internet-Draft        Stronger TCP Error Detection              May 2010


   SYN segments so an endpoint that supports the new TCP option can
   interoperate with an endpoint that does not support the proposed TCP
   option.

   Since we have seen that the 16-bit TCP checksum is insufficient for
   detecting multiple independent errors for Jumbo frames, this proposal
   says that a peer supporting this option MUST send the new TCP
   checksum option if its link MTU is equal or greather than 9K.
   However, if the remote peer does not recognize the new option, the
   initiating peer MUST NOT use this TCP extension for the connection
   transfer phase.  If the remote peer recognizes the option and also
   has a Maximum Segment Size equal to the peer's advertised MSS or a
   minimum MSS of 9K, it MUST respond with the TCP checksum option.
   Every subsequent packet from both peers must include this option in
   the TCP header.  The extra overhead for adding this option is minimal
   for Jumbo frame sized segments and the higher data integrity pays for
   itself.

   Note that all TCP control packets sent after succesfully negotiating
   this TCP option may carry this TCP option also, although this draft
   does not mandate it.

   TCP CRC Checksum Option.

   +----------+------------+----------------------------+
   | Kind = X | Length = 6 | Value = 4 bytes of CRC 32C |
   |----------+------------+----------------------------+

   .

                                 Figure 1

   The second approach utilizes a pair of existing TCP options called
   the "TCP Alternate Checksum Options" specified in RFC 1146 [RFC1146].
   The current checksum types specified by that option are TCP checksum,
   8-bit Fletcher's algorithm and 16-bit Fletcher's algorithm.  A new
   checksum type can be added to this list for CRC-32C checksums.  The
   negotiation rules for selecting the checksum type would follow the
   rules described in RFC1146.  That is, if both SYN segments carry the
   Alternate Checksum Request option, and both specify the same
   algorithm, that algorithm must be used for the remainder of the
   connection.  Otherwise, the standard TCP checksum must be used for
   the entire connection.

   Once the CRC 32C checksum algorithm is negotiated, the TCP Alternate
   Checksum Data Option is sent whose data will equal 4 bytes for the
   CRC-32C checksum.




Biswas                  Expires November 26, 2010               [Page 7]


Internet-Draft        Stronger TCP Error Detection              May 2010


   TCP Alternate Checksum Request Option
   +-----------+------------+-----------------+
   | Kind = 14 | Length = 3 | Value = CRC-32C |
   |-----------+------------+-----------------+

   Here the value for CRC32C would need to be defined, and may possibly
   be the next undefined value '3', following the definitions for 8-bit
   and 16-bit fletcher's algorithms.

   TCP Alternate Checksum Data Option
   +-----------+------------+--------------------------------+
   | Kind = 15 | Length = 6 | Value = CRC-32C computed value |
   |-----------+------------+--------------------------------+

   The TCP Alternate Checksum Data Option must be sent only during the
   connection transfer and tear down phase.  Again, the 16-bit TCP
   checksum field must be zeroed out before computing the 32-bit CRC 32C
   code.

   One or more padding bytes may be used when sending any of the above
   options to align to a 4 or 8 byte boundary for faster parsing on both
   32-bit and 64-bit machines.

   At this stage of draft development, the author is evaluating and
   seeking inputs for both approaches.


4.  IPv6 Considerations

   The TCP extension for CRC 32C can be applied equally to IPv4 and
   IPv6.  The pseudo header for IPv6 includes 128 bit source and
   destination addresses.  This pseudo header, the TCP header and
   payload MUST be included in the CRC 32C checksum of a TCP/IPv6
   segment as there is no IPv6 header checksum.


5.  Conclusions and Acknowledgements

   This document proposes the use of stronger error detection codes for
   TCP connections sending Jumbo Frames.  It does not provide a solution
   for UDP based applications.  I would also like to thank Tom Kessler
   (kessler@netapp.com) for his review comments.  He specifically
   pointed out his concerns about the safety of TCP checksum + Ethernet
   CRC at 40G and 100G speeds with even 9K jumbo frames.  He also
   provided information on the Intel instruction set that can be used to
   speed up CRC-32c computation.  Special thanks to Janet Takami
   (jtakami@netapp.com) for her comments as well as for pointing out
   that there is no IPv6 header checksum and so the pseudo header must



Biswas                  Expires November 26, 2010               [Page 8]


Internet-Draft        Stronger TCP Error Detection              May 2010


   be included in the CRC 32c checksum.


6.  IANA Considerations

   This memo includes a request to IANA for a new Type Number for the
   new TCP Checksum Option if we do not go with the TCP Alternate
   Checksum Option.  If we go with the TCP Alternate Checksum option,
   then a new checksum type will need to be defined for CRC 32C,
   probably after the defined values for Fletcher's 8-bit and 16-bit
   algorithm types.


7.  Security Considerations

   The CRC 32C codes can detect unintentional changes to data such as
   those caused by noise.  If an attacker changes the data, it can also
   change the error-detection code to match the changed data.  Hence,
   these codes are not intended for security purposes.


8.  References

8.1.  Normative References

   [RFC1122]  IETF, "Requirements for Internet Hosts -- Communication
              Layers", October 1989.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2.  Informative References

   [Koopman]  Koopman, P., "32-Bit Cyclic Redundancy Codes for Internet
              Applications", 2002.

   [Stone]    Stone, J., Partridge, C., "When the CRC and TCP Checksum
              Disagree"

   [RFC1146]  Zweig, J., Partridge, C., "TCP Alternate Checksum Options"
              March 1990.

   [RFC3385]  Sheinwald, D., et. al. "Internet Protocol Small Computer
              System Interface (iSCSI) Cyclic Redundance Check (CRC)/
              Checksum Considerations", September 2002.

   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol",
              September 2007.



Biswas                  Expires November 26, 2010               [Page 9]


Internet-Draft        Stronger TCP Error Detection              May 2010


Author's Address

   Anumita Biswas
   NetApp, Inc.
   495, E. Java Dr
   Sunnyvale, CA  95054
   USA

   Phone: +14088223204
   Email: anumita.biswas@netapp.com









































Biswas                  Expires November 26, 2010              [Page 10]