Network Working Group                                         J. Heffner
Internet-Draft                                                 M. Mathis
Expires: July 29, 2007                                       B. Chandler
                                                                     PSC
                                                        January 25, 2007


               IPv4 Reassembly Errors at High Data Rates
                     draft-heffner-frag-harmful-04

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 29, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   IPv4 fragmentation is not sufficiently robust for use under some
   conditions in today's Internet.  At high data rates, the 16-bit IP
   identification field is not large enough to prevent frequent
   incorrect assembly of IP fragments, and the TCP and UDP checksums are
   insufficient to prevent the resulting corrupted datagrams from being
   delivered to higher protocol layers.  This note describes some easily
   reproduced experiments demonstrating the problem, and discusses some



Heffner, et al.           Expires July 29, 2007                 [Page 1]


Internet-Draft              Reassembly Errors               January 2007


   of the operational implications of these observations.


1.  Introduction

   The IPv4 header was designed at a time when data rates were several
   orders of magnitude lower than those achievable today.  This document
   describes a consequent scale-related failure in the IP identification
   (ID) field, where fragments may be incorrectly assembled at a rate
   high enough that it is likely to invalidate assumptions about data
   integrity failure rates.

   That IP fragmentation results in inefficient use of the network has
   been well documented [Kent87].  This note presents a different kind
   of problem, which can result not only in significant performance
   degradation, but also in frequent data corruption.  This is
   especially pertinent due to the recent proliferation of UDP bulk
   transport tools that sometimes fragment every datagram.

   Additionally, there is some network equipment that ignores the Don't
   Fragment (DF) bit in the IP header to work around MTU discovery
   problems [RFC2923].  This equipment indirectly exposes properly
   implemented protocols and applications to corrupt data.


2.  Wrapping the IP ID Field

   The Internet Protocol standard specifies:

      "The choice of the Identifier for a datagram is based on the need
      to provide a way to uniquely identify the fragments of a
      particular datagram.  The protocol module assembling fragments
      judges fragments to belong to the same datagram if they have the
      same source, destination, protocol, and Identifier.  Thus, the
      sender must choose the Identifier to be unique for this source,
      destination pair and protocol for the time the datagram (or any
      fragment of it) could be alive in the Internet."  [RFC0791]

   Strict conformance to this standard limits transmissions in one
   direction between any address pair to no more than 65536 packets per
   protocol (e.g., TCP, UDP, or ICMP) per maximum packet lifetime.

   Clearly not all hosts follow this standard, because it implies an
   unreasonably low maximum data rate.  For example, a host sending
   1500-byte packets with a 30-second maximum packet lifetime could send
   at only about 26 Mbit/s before exceeding 65536 packets per packet
   lifetime.  Or, filling a 1 Gbit/s interface with 1500-byte packets
   requires sending 65536 packets in less than 1 second, an unreasonably
   short maximum packet lifetime, being less than the round-trip time on
   some paths.  This requirement is widely ignored.
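
   The arithmetic behind these two figures can be checked with a short
   sketch.  The function names below are illustrative and not part of
   any standard; the only inputs are the packet size, the maximum packet
   lifetime, and the link rate used in the text above.

```python
ID_SPACE = 2 ** 16  # distinct values of the 16-bit IPv4 ID field


def max_rate_bps(packet_bytes, lifetime_s, id_space=ID_SPACE):
    """Highest rate (bits/s) that sends at most id_space packets per lifetime."""
    return id_space * packet_bytes * 8 / lifetime_s


def wrap_time_s(link_bps, packet_bytes, id_space=ID_SPACE):
    """Seconds needed to send id_space packets back to back at link_bps."""
    return id_space * packet_bytes * 8 / link_bps


# 1500-byte packets with a 30-second maximum packet lifetime
print(f"{max_rate_bps(1500, 30) / 1e6:.1f} Mbit/s")   # 26.2 Mbit/s
# 1500-byte packets filling a 1 Gbit/s interface
print(f"{wrap_time_s(1e9, 1500):.3f} s")              # 0.786 s
```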

   Additionally, it is worth noting that re-using values in the IP ID
   field once per 65536 datagrams is the best case.  Some
   implementations randomize the IP ID to prevent leaking information
   out of the kernel [Bellovin02], which causes re-use of the IP ID
   field to occur probabilistically at all sending rates.
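
   How quickly probabilistic re-use sets in can be estimated with a
   minimal sketch.  It assumes, as a simplification not taken from any
   particular implementation, that every ID is drawn independently and
   uniformly from all 65536 values; the function name is illustrative.

```python
def reuse_probability(n_outstanding, id_space=2 ** 16):
    """Probability that a new, uniformly random ID matches at least one of
    n_outstanding earlier IDs (all assumed independent and uniform)."""
    return 1.0 - (1.0 - 1.0 / id_space) ** n_outstanding


# Collision chance against 10, 1000, and 65536 datagrams still "alive"
for n in (10, 1000, 65536):
    print(n, round(reuse_probability(n), 4))
```

   Under this assumption the probability is nonzero for any number of
   outstanding datagrams, which is why re-use is not tied to a
   particular sending rate.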

   IP receivers store fragments in a reassembly buffer until all
   fragments in a datagram arrive, or until the reassembly timeout
   expires (15 seconds is suggested in [RFC0791]).  Fragments in a
   datagram are associated with each other by their protocol number, the
   value in their ID field, and by the source, destination address pair.
   If a sender wraps the ID field in less than the reassembly timeout,
   it becomes possible for fragments from different datagrams to be
   incorrectly spliced together ("mis-associated"), and delivered to the
   upper layer protocol.

   A case of particular concern is when mis-association is self-
   propagating.  This occurs, for example, when there is reliable
   ordering of packets and the first fragment of a datagram is lost in
   the network.  The rest of the fragments are stored in the fragment
   reassembly buffer, and when the sender wraps the ID field, the first
   fragment of the new datagram will be mis-associated with the rest of
   the old datagram.  The new datagram will now be incomplete (since
   it is missing its first fragment), so the rest of it will be saved in
   the fragment reassembly buffer, forming a cycle that repeats every
   65536 datagrams.  It is possible to have a number of simultaneous
   cycles, bounded by the size of the fragment reassembly buffer.
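
   The cycle can be reproduced with a toy model.  The sketch below
   assumes that every datagram is split into exactly two fragments,
   that packets arrive in order, that the sender assigns IDs
   sequentially, and that the reassembly timeout never fires (the ID
   field wraps faster than the timeout).  The function name and the
   reduced ID space in the usage line are illustrative only.

```python
def simulate(num_datagrams, id_space=2 ** 16, lost_first_fragments=(0,)):
    """Return the sequence numbers of datagrams whose first fragment was
    spliced onto a second fragment left over from an older datagram."""
    buffers = {}            # IP ID -> {fragment index: datagram sequence number}
    mis_associated = []
    lost = set(lost_first_fragments)
    for seq in range(num_datagrams):
        ip_id = seq % id_space              # sender wraps the ID field
        for frag in (0, 1):                 # two fragments, delivered in order
            if frag == 0 and seq in lost:
                continue                    # first fragment lost in the network
            slot = buffers.setdefault(ip_id, {})
            slot[frag] = seq
            if len(slot) == 2:              # both pieces present: reassemble
                if slot[0] != slot[1]:
                    mis_associated.append(seq)
                del buffers[ip_id]
    return mis_associated


# One lost first fragment, ID space shrunk to 8 to keep the output short
print(simulate(33, id_space=8))   # [8, 16, 24, 32]
```

   A single loss at datagram 0 corrupts one datagram per wrap from then
   on; passing several entries in lost_first_fragments models several
   simultaneous cycles.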


3.  Harmful Effects of Mis-Associated Fragments

   When the mis-associated fragments are delivered, transport-layer
   checksumming should detect these datagrams as incorrect and discard
   them.  When the datagrams are discarded, it could pose a problem for
   loss-feedback congestion control algorithms since there will be a
   high number of non-congestion-related losses.

   However, transport checksums may not be designed to handle such high
   error rates, either.  The TCP/UDP checksum is only 16 bits in length.
   If these checksums follow a uniform random distribution, we expect
   mis-associated datagrams to be accepted by the checksum at a rate of
   one per 65536.  With only one mis-association cycle, we expect
   corrupt data delivered to the application layer once per 2^32
   datagrams.  This rate can be significantly higher with multiple
   cycles.
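
   The once-per-2^32 figure is the product of two one-in-65536 events.
   A minimal sketch of that arithmetic follows, assuming uniformly
   distributed checksums and one mis-association per cycle per wrap of
   the ID field; the function names are illustrative.

```python
from fractions import Fraction

ID_SPACE = 2 ** 16         # 16-bit IP ID field
CHECKSUM_SPACE = 2 ** 16   # 16-bit TCP/UDP checksum


def undetected_rate(cycles=1):
    """Expected fraction of datagrams delivered corrupt: each cycle
    mis-associates one datagram per ID wrap, and a mis-associated
    datagram passes a uniform checksum with probability 1/65536."""
    return Fraction(cycles, ID_SPACE) * Fraction(1, CHECKSUM_SPACE)


def datagrams_per_corruption(cycles=1):
    """Expected number of datagrams sent per undetected corruption."""
    return 1 / undetected_rate(cycles)


print(datagrams_per_corruption(1) == 2 ** 32)    # True
print(datagrams_per_corruption(16) == 2 ** 28)   # True
```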



   With non-random data, the TCP/UDP checksum may be weaker still.
   It is possible to construct datasets where mis-associated fragments
   will always have the same checksum.  Such a case may be considered
   unlikely, but is worth considering.  "Real" data may be more likely
   than random data to cause checksum hot spots and increase the
   probability of false checksum match [Stone98].  Also, some
   applications or higher-level protocols may turn off checksumming to
   increase speed, though this practice has been found to be dangerous
   for other reasons when data reliability is important [Stone00].
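
   A small sketch shows why such datasets are easy to construct.  The
   Internet checksum is a one's-complement sum of 16-bit words, so it
   does not depend on word order.  The helper below is an illustrative
   implementation of that sum; for brevity it ignores the UDP header
   and pseudo-header, and the byte strings are made-up placeholders.

```python
def internet_checksum(data: bytes) -> int:
    """One's complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                          # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += int.from_bytes(data[i:i + 2], "big")
    while total >> 16:                           # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


head = b"HEADER-A"                  # stands in for a new first fragment
tail_new = b"\x00\x01\x02\x03"      # the second fragment that belongs to it
tail_old = b"\x02\x03\x00\x01"      # stale second fragment, words swapped

# Different data, identical checksum: the splice would be accepted.
assert tail_new != tail_old
assert internet_checksum(head + tail_new) == internet_checksum(head + tail_old)
```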


4.  Experimental Observations

   To test the practical impact of fragmentation on UDP, we ran a series
   of experiments using a UDP bulk data transport protocol that was
   designed to be used as an alternative to TCP for transporting large
   data sets over specialized networks.  The tool, Reliable Blast UDP
   (RBUDP), part of the QUANTA networking toolkit [QUANTA], was selected
   because it has a clean interface which facilitated automated
   experiments.  The decision to use RBUDP had little to do with the
   details of the transport protocol itself.  Any UDP transport protocol
   that does not have additional means to detect corruption, and that
   could be configured to use IP fragmentation, would have the same
   results.

   In order to diagnose corruption on files transferred with the UDP
   bulk transfer tool, we used a file format that included embedded
   sequence numbers and MD5 checksums in each fragment of each datagram.
   Thus it was possible to distinguish random corruption from that
   caused by mis-associated fragments.  We used two different types of
   files.  One was constructed so that all the UDP checksums were
   constant -- we will call this the "constant" dataset.  The other was
   constructed so that UDP checksums were uniformly random -- the
   "random" dataset.  All tests were done using 400 MB files, sent in
   1524-byte datagrams so that they were fragmented on standard Fast
   Ethernet with a 1500-byte MTU.
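
   For illustration, the resulting fragment layout can be sketched as
   follows.  This assumes that 1524 bytes is the total IP datagram
   length and that the IP header is 20 bytes with no options; neither
   detail is stated above, and the function name is hypothetical.

```python
def fragment_sizes(total_len, mtu, header_len=20):
    """Return (payload offset, payload bytes) for each IPv4 fragment.
    Every fragment except the last carries a multiple of 8 bytes."""
    payload = total_len - header_len
    max_payload = (mtu - header_len) // 8 * 8
    return [(offset, min(max_payload, payload - offset))
            for offset in range(0, payload, max_payload)]


print(fragment_sizes(1524, 1500))   # [(0, 1480), (1480, 24)]
```

   Under these assumptions every datagram yields one full-sized
   fragment and one small trailing fragment.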

   The UDP bulk file transport tool was used to send the datasets
   between a pair of hosts at slightly less than the available data rate
   (100 Mbps).  Near the beginning of each flow, a brief secondary flow
   was started to induce packet loss in the primary flow.  Throughout
   the life of the primary flow, we typically observed mis-association
   rates on the order of a few hundredths of a percent.

   Tests run with the "constant" dataset resulted in corruption on all
   mis-associated fragments, that is, corruption on the order of a few
   hundredths of a percent.  In sending approximately 10 TB of "random"
   datasets, we observed 8847668 UDP checksum errors and 121 corruptions
   of the data due to mis-associated fragments.
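
   These counts can be compared with the one-in-65536 estimate from
   Section 3.  The sketch below assumes that all of the reported
   checksum errors were caused by mis-association and treats the number
   of undetected corruptions as approximately Poisson distributed; both
   are simplifying assumptions made here for illustration.

```python
import math

checksum_errors = 8847668   # mis-associations caught by the UDP checksum
undetected = 121            # mis-associations delivered as corrupt data

total = checksum_errors + undetected
expected = total / 2 ** 16                          # uniform 16-bit checksum
z = (undetected - expected) / math.sqrt(expected)   # Poisson approximation

print(f"observed 1 in {total / undetected:.0f}, predicted 1 in 65536")
print(f"expected {expected:.1f} undetected corruptions, z = {z:.2f}")
```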


5.  Implications

   Most TCP implementations today participate in MTU discovery
   [RFC1191], which will avoid the problems described in this note by
   avoiding IP fragmentation altogether.  However, as a work-around for
   MTU discovery problems [RFC2923], some TCP implementations and
   communications gear provide mechanisms to disable path MTU discovery
   by clearing or ignoring the DF bit.  Doing so will expose all
   protocols using IPv4, even those that participate in MTU discovery,
   to mis-association errors.

   A case particularly worth noting is that of tunnels encapsulating
   payload in IPv4.  To deal with difficulties in MTU Discovery
   [RFC4459], tunnels may rely on fragmentation between the two
   endpoints, even if the payload is marked with a DF bit [RFC4301].  In
   such a mode, the two tunnel endpoints behave as IP end hosts, with
   all tunneled traffic having the same protocol type.  Thus, the
   aggregate rate of tunneled packets must not exceed 65536 per maximum
   packet lifetime, or tunneled data becomes exposed to possible mis-
   association.  Even protocols doing MTU discovery such as TCP will be
   affected.

   IPv6 is less vulnerable to this type of problem, since its fragment
   header contains a 32-bit identification field [RFC2460].  Mis-
   association will only be a problem at packet rates 65536 times higher
   than for IPv4.

   Since mis-association of fragments will only occur when the IP ID
   field is wrapped within the fragment reassembly timeout, it may be
   possible to reduce the timeout sufficiently so that mis-association
   will not occur.  However, there are a number of difficulties with
   such an approach.  Since the sender controls the rate of packets sent
   and selection of IP ID, while the receiver controls the reassembly
   timeout, there would need to be some mutual assurance between each
   party as to participation in the scheme.  Further, it is not
   generally possible to set the timeout low enough so that a fast
   sender's fragments will not be mis-associated, yet high enough so
   that a slow sender's fragments will not be unconditionally discarded
   before it is possible to reassemble them.  So the timeout and IP ID
   selection would need to be done on a per-peer basis.  Also, it is
   likely that NAT will break any per-peer tables keyed by IP address.
   It is not within the scope of this document to recommend solutions to
   these problems.

   Another means of solving the corruption issue is to add stronger
   integrity checking, which can be done at any layer above IP.  This is
   a natural side effect of using cryptographic authentication.  If
   IPsec AH [RFC2402] is in use, the mis-associated fragments will be
   discarded at the network layer with extremely high probability.  Some
   higher layers may use longer checksums (for example, SCTP's is 32
   bits in length [RFC2960]) or cryptographic authentication (SSH
   message authentication codes [RFC4251]).  While stronger integrity
   checking may prevent data corruption, it will not solve the problem
   of a high effective loss rate.  In the case of SSH, any stream
   corruption results in immediate termination of the connection.

   It is difficult to concisely describe all possible situations under
   which fragments might be mis-associated.  Even if an end host
   carefully follows the specification, ensuring unique IP IDs, the
   presence of NATs or tunnels may expose applications to IP ID space
   conflicts.  A fragmenting application that sends at a low rate might
   be exposed when running simultaneously with a non-fragmenting
   application that sends at a high rate.  There are some
   possible work-arounds that receivers might implement to reduce the
   possibility of conflict, but there is no mechanism in place for a
   sender to know what the receiver is doing in this respect.  As a
   consequence, there is no general mechanism for an application that is
   using IPv4 fragmentation to know if it is deterministically or
   statistically protected from mis-associated fragments.

   In general, applications that rely on IPv4 fragmentation should be
   written with these issues in mind, as well as those issues documented
   in [Kent87].  Applications that rely on IPv4 fragmentation while
   sending at high speeds, and devices that deliberately introduce
   fragmentation to otherwise unfragmented traffic (e.g., tunnels)
   should be particularly cautious, and introduce strong mechanisms to
   ensure data integrity.


6.  Security Considerations

   If a malicious entity knows that a pair of hosts is communicating
   using a fragmented stream, the entity may have an opportunity to
   corrupt the flow.  By sending "high" fragments (those with an
   offset greater than zero) with a forged source address, the attacker
   can deliberately cause corruption as described above.  Exploiting
   this vulnerability requires only knowledge of the source and
   destination addresses of the flow, its protocol number, and fragment
   boundaries.  It does not require knowledge of port or sequence
   numbers.

   If the attacker has visibility of packets on the path, the attack
   profile is similar to injecting full segments.  Using this attack
   makes blind disruptions easier, and might possibly be used to cause
   degradation of service.  We believe only streams using IPv4
   fragmentation are likely vulnerable.  Because of the nature of the
   problems outlined in this draft, the use of IPv4 fragmentation for
   critical applications may not be advisable regardless of security
   concerns.


7.  IANA Considerations

   None.


8.  Informative References

   [Kent87]   Kent, C. and J. Mogul, "Fragmentation considered harmful",
              Proc. SIGCOMM '87 vol. 17, No. 5, October 1987.

   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
              RFC 2923, September 2000.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              November 1990.

   [Stone98]  Stone, J., Greenwald, M., Partridge, C., and J. Hughes,
              "Performance of Checksums and CRC's over Real Data", IEEE/
              ACM Transactions on Networking vol. 6, No. 5,
              October 1998.

   [Stone00]  Stone, J. and C. Partridge, "When The CRC and TCP Checksum
              Disagree", Proc. SIGCOMM 2000 vol. 30, No. 4,
              October 2000.

   [QUANTA]   He, E., Alimohideen, J., Eliason, J., Krishnaprasad, N.,
              Leigh, J., Yu, O., and T. DeFanti, "Quanta: a toolkit for
              high performance data delivery over photonic networks",
              Future Generation Computer Systems Vol. 19, No. 6,
              August 2003.

   [Bellovin02]
              Bellovin, S., "A Technique for Counting NATted Hosts",
              November 2002.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

   [RFC2960]  Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
              Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M.,
              Zhang, L., and V. Paxson, "Stream Control Transmission
              Protocol", RFC 2960, October 2000.

   [RFC2402]  Kent, S. and R. Atkinson, "IP Authentication Header",
              RFC 2402, November 1998.

   [RFC4251]  Ylonen, T. and C. Lonvick, "The Secure Shell (SSH)
              Protocol Architecture", RFC 4251, January 2006.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4459]  Savola, P., "MTU and Fragmentation Issues with In-the-
              Network Tunneling", RFC 4459, April 2006.


Appendix A.  Acknowledgements

   This work was supported by the National Science Foundation under
   Grant No. 0083285.


Authors' Addresses

   John W. Heffner
   Pittsburgh Supercomputing Center
   4400 Fifth Avenue
   Pittsburgh, PA  15213
   US

   Phone: 412-268-2329
   Email: jheffner@psc.edu


   Matt Mathis
   Pittsburgh Supercomputing Center
   4400 Fifth Avenue
   Pittsburgh, PA  15213
   US

   Phone: 412-268-3319
   Email: mathis@psc.edu


   Ben Chandler
   Pittsburgh Supercomputing Center
   4400 Fifth Avenue
   Pittsburgh, PA  15213
   US

   Phone: 412-268-9783
   Email: bchandle@psc.edu


Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).