Network Working Group                                        J. Freniche
                                                                    CASA
Category: Informational                                        July 1998


                       TCP Window Probe Deadlock
                  <draft-rfced-info-freniche-00.txt>


Status of This Memo

This document is an Internet-Draft.  Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups.  Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time.  It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."

To view the entire list of current Internet-Drafts, please check
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
(Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
(Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
(US West Coast).


Distribution of this document is unlimited.


Copyright Notice.

   Copyright (C) The Internet Society (1998). All Rights Reserved.


Introduction.

   In the course of developing/testing a TCP/IP stack for embedded
   computers, a situation that can be called 'TCP window probe deadlock'
   and subsequent connection abort has been observed.

   The deadlock has been observed when a client host sends, using TCP
   (Ref. 1), a huge amount of data to a server host, which in turn
   processes that input and also returns a huge amount of data. If the
   sender does not appropriately interleave send and receive requests to
   its underlying TCP, both applications can end up blocked while the
   respective TCP layers exchange window probes forever (unless aborted
   by some sort of alarm).

   Initially it was thought that the deadlock was a fault in the
   implementation of the TCP/IP stack. However, it was immediately
   reproduced in several other configurations (FreeBSD <-> FreeBSD,
   FreeBSD <-> HP-UX, HP-UX <-> HP-UX, FreeBSD <-> Solaris and
   FreeBSD <-> AIX, all tested over Ethernet interfaces and also over
   the local interface). Given its nature, it is believed to occur in
   many, if not all, TCP implementations.

   The next section indicates how to reproduce the deadlock,



Freniche                      Informational                     [Page 1]


   followed by a more detailed description and analysis. Conditions and
   factors that influence the deadlock are examined, and a solution (at
   TCP and application level) is proposed and its impact on current
   applications analyzed.  Appendices with traces are also included.

   For the curious, the board was a "bare machine" Motorola MC68040
   single board computer with an AMD79C90 LANCE interface; the
   programming language was Ada and the operating system was the nuke
   Ada Run Time System.


Reproducing the Deadlock.

   C and S are hosts communicating by TCP. C (client) runs a client
   program that sends lots of data to S (server) between receive
   requests. The server processes the data and also returns lots of data
   to the client. The interface between the applications and the
   underlying TCPs is blocking (which is the default behavior for
   Sockets).

   Note that C and S do not need to be directly connected, i.e., routers
   can exist between C and S.

   A good example of such a server application is the echo service
   (Ref. 2).  Specifically, the deadlock was detected by enabling the
   echo service in the server S, and then running the "tcpecho" client
   program (included in Appendix 1) in the client C. In the explanations
   that follow, "echo" is used as the server; however, any client/server
   pair with the same characteristics can be expected to exhibit the
   deadlock.

   In the server, enable the stream echo service (uncomment the echo
   stream service in /etc/inetd.conf and then restart "inetd"). In the
   client, compile the "tcpecho.c" program and execute it (as a normal
   user):

   client> tcpecho -n 1 -a 120 -m 60000 server A

   where:
      -n 1      send just 1 buffer
      -a 120    set an alarm (for socket operations) for 120 seconds
      -m 60000  buffer size is 60000 bytes
       server   name or ip address of the server
       A        just one character placed in all bytes of the payload

   The payload (-m 60000) may need to be adjusted to provoke the
   deadlock, as it depends on the receive buffer sizes and other
   connection parameters on both sides.
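
   As an aid for choosing the payload, the following sketch reads the
   send and receive buffer sizes of a socket through the standard
   SO_SNDBUF and SO_RCVBUF socket options. The helper name
   socket_buffer_sizes is hypothetical and is not part of tcpecho; the
   values reported are system dependent.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of tcpecho): report the send and
 * receive buffer sizes of socket fd. Returns 0 on success, -1 on
 * error. */
int socket_buffer_sizes(int fd, int *snd, int *rcv)
{
    socklen_t len = sizeof(int);

    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, snd, &len) == -1)
        return -1;
    len = sizeof(int);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, rcv, &len) == -1)
        return -1;
    return 0;
}
```

   Called on the connected socket, this gives a first idea of how much
   data the local side can hold before a blocking send stops making
   progress.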



   The communication between S and C is monitored using a "sniffer" (HP
   Advisor) and "tcpdump" (Ref. 3) on any machine attached to the same
   subnet as either of the two hosts (assuming the medium is Ethernet);
   the monitoring machine can even be the client or the server host
   itself.

   In fact, "tcpdump" alone is sufficient to see the deadlock. A clearer
   trace is obtained by giving "tcpdump" a strict filter for the socket
   pair (C, port number) and (S, port number). A sample trace showing
   the deadlock is included in Appendix 2.


Description.

   Once the TCP connection is established, segments are exchanged
   between C and S. After some amount of data has been sent and
   received, the client continues sending segments (with data) but
   announces that its receive window is 0.

   S then stops sending data to C, but continues receiving data segments
   (with window 0) from C. After a while, S also announces to C that its
   receive window is now 0.

   Both hosts now exchange window probes, one after the other, spaced
   increasingly in time. No more data is effectively exchanged or
   processed, and the client and server applications do not progress.

   Both hosts are now in window probe deadlock.

   The deadlock is finally broken either by exhausting the number of
   retransmissions of window probes (in hosts that implement such a
   limit; between 10 and 15 per host which, given the back-off, means
   between 10 minutes and 1 hour) or by the alarm on the client side.

   The connection is aborted if such an alarm was implemented; otherwise
   the deadlock continues forever.

   Appendix 2 contains a trace of a connection in "window probe
   deadlock" and subsequent abort by an alarm.
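
   The increasing spacing can be checked against the trace in Appendix
   2. The sketch below takes the timestamps of the client window probes
   from that trace (4:36.451, 4:42.451, 4:54.451, 5:18.451 and 6:06.451)
   and verifies that each interval is twice the previous one. The
   function names are illustrative only, and the doubling is what this
   particular trace shows, not a statement about every implementation.

```c
#include <stddef.h>

/* Convert a trace timestamp "m:ss.mmm" to milliseconds. */
long stamp_ms(long min, long sec, long ms)
{
    return (min * 60 + sec) * 1000 + ms;
}

/* Return 1 if every interval between consecutive timestamps is exactly
 * twice the previous interval, 0 otherwise (or if n < 3). */
int intervals_double(const long *t, size_t n)
{
    size_t i;

    if (n < 3)
        return 0;
    for (i = 2; i < n; i++) {
        long prev = t[i - 1] - t[i - 2];
        long cur  = t[i] - t[i - 1];

        if (prev <= 0 || cur != 2 * prev)
            return 0;
    }
    return 1;
}
```

   For the client probes the intervals are 6, 12, 24 and 48 seconds; for
   the server probes in the same trace they are 7, 14, 28 and 56
   seconds.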


Explanation of the Deadlock.

   To simplify the explanation, numeric parameters (representative of
   actual figures) are assumed for the connection on both hosts, as in
   this example:

   Client C:
      application send buffer = 60000 bytes
      application recv buffer = 60000 bytes
      MSS is 1460 bytes
      TCP send buffer = 16384
      TCP recv buffer = same as send buffer

   Server S:
      application send buffer = 8192 bytes
      application recv buffer = 8192 bytes
      MSS is 1460 bytes
      TCP send buffer = 16384
      TCP recv buffer = same as send buffer

   The sequence in the communication is:

   1  Client C opens the connection with the server S.

   2  Client C issues a socket send with a buffer of 60000 bytes.

   3  TCP in client C copies 16384 bytes from the application send
      buffer to the TCP send buffer and starts to send several TCP
      segments of 1460 bytes. As there are still bytes in the
      application send buffer waiting to be accepted by TCP, the client
      application is blocked.

   4  TCP in server S receives the segments, sends back the
      acknowledgments and delivers data to the server application, in
      chunks with a maximum size of 8192 bytes. The server program
      processes the data (in the case of echo, the processing is just a
      copy) and issues a socket send with 8192 bytes. TCP in the server
      accepts the data and sends it to the client TCP.

   5  The client TCP receives the data sent by the server and keeps it
      in its receive buffer. The client application is still blocked.

   6  Steps 3, 4 and 5 continue. Evidently, as the client TCP continues
      sending/receiving data, the TCP receive buffer in C will be filled
      up.

   7  Therefore, C starts to announce a receive window of 0. As the TCP
      send buffer in C still contains data, the client TCP continues
      sending data segments to S (with receive window 0). But the client
      application is still blocked, as not all bytes in the send request
      have been accepted by the client TCP.

   8  S continues receiving data segments (with window 0) from C. This
      data is passed to the server application, which processes it and
      sends it back to the client. But now that transmission is blocked
      in the server TCP (as the client TCP announced window 0).
      Eventually the TCP send buffer in S fills up, and the next send
      call issued by the server application blocks.

   9  Finally, the TCP receive buffer in S is filled up completely by
      the data segments received from C.

   10 At this moment, both applications are blocked in their socket send
      calls and both TCPs have their receive buffers completely filled
      up.

   Therefore both TCPs start to send window probes, increasingly spaced
   in time, until the retransmission attempts are exhausted (if such a
   limit is implemented) or the alarm expires. In either of these cases
   the connection is aborted; otherwise it remains in "window probe
   deadlock" forever.
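
   The sequence above can be followed with a toy model that only tracks
   how many bytes sit in each buffer. The model is an assumption made
   for illustration: it ignores segmentation, acknowledgments and
   timing, gives every TCP buffer the same size, and never lets the
   client application read while its send is pending. All names are
   hypothetical.

```c
/* Toy model of the byte pipeline between the two applications. */
struct pipeline {
    long c_app;  /* bytes still held by the client's blocked send  */
    long c_snd;  /* client TCP send buffer                         */
    long s_rcv;  /* server TCP receive buffer                      */
    long s_app;  /* chunk read by the server, not yet fully sent   */
    long s_snd;  /* server TCP send buffer                         */
    long c_rcv;  /* client TCP receive buffer (never read here)    */
};

/* Move as many bytes as possible from *src to *dst, limited by the
 * free space "room" at the destination. Returns the bytes moved. */
static long move_bytes(long *src, long *dst, long room)
{
    long n = *src < room ? *src : room;

    if (n < 0)
        n = 0;
    *src -= n;
    *dst += n;
    return n;
}

/* Returns 1 if the client's send can never complete (deadlock), 0 if
 * all "payload" bytes are eventually accepted by the client TCP. */
int send_deadlocks(long payload, long tcp_buf, long app_chunk)
{
    struct pipeline p = { payload, 0, 0, 0, 0, 0 };
    long moved;

    do {
        moved = 0;
        moved += move_bytes(&p.c_app, &p.c_snd, tcp_buf - p.c_snd);
        moved += move_bytes(&p.c_snd, &p.s_rcv, tcp_buf - p.s_rcv);
        if (p.s_app == 0)  /* server reads only when not stuck in send */
            moved += move_bytes(&p.s_rcv, &p.s_app, app_chunk);
        moved += move_bytes(&p.s_app, &p.s_snd, tcp_buf - p.s_snd);
        moved += move_bytes(&p.s_snd, &p.c_rcv, tcp_buf - p.c_rcv);
    } while (p.c_app > 0 && moved > 0);

    return p.c_app > 0;
}
```

   In this idealized model the pipeline absorbs four TCP buffers plus
   one server chunk before everything stalls, so the deadlock appears
   only for payloads above that total. Real implementations account for
   buffer space differently, which is one reason why the payload may
   need to be adjusted in practice.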


Conditions for the Deadlock.

   Clearly the first condition is that the client tries to send a huge
   amount of data in one or several consecutive socket send calls.
   "Huge" is understood here in comparison with the local TCP send
   buffer size.

   The local TCP send buffer must in turn be large enough to produce
   enough segments to start filling up the peer TCP receive buffer.

   To obtain the deadlock, server applications must follow the "echo"
   pattern: they must read data from their local TCP in chunks that are
   not too large (in comparison with the client's application send
   buffer size), process the data and finally respond to the client with
   a large amount of data.

   The server's application send buffer must be of medium size, so that
   the server's TCP send buffer can fill up completely and the socket
   send call blocks. The server's TCP send buffer must also be large
   enough to produce sufficient segments to fill the client TCP receive
   buffer.

   As noted, such conditions on TCP send and receive buffer sizes are
   usually found in current TCP implementations. The same holds for the
   send and receive buffer sizes of server applications. It is also not
   unusual for client and server to exchange large amounts of data.

   The only unusual condition is the client sending a huge amount of
   data, in one or several consecutive blocking send calls, before
   reading responses.


Other Factors.

   Given that the root of the phenomenon is a "mechanical" blocking
   among the send/receive buffers of the applications and TCPs, it is
   evident that transmission media characteristics such as MTU and speed
   do not contribute, positively or negatively.

   However, TCP can be used on top of some transmission protocols (SMDS,
   ATM) that have a larger MTU, and for which TCP send and receive
   windows are usually larger. This may mitigate or even avoid the
   deadlock. On the other hand, such protocols are used to exchange
   large amounts of data. Again, the driving condition is sending a
   large amount of data, which will be processed and returned, without
   interleaving receive requests.

   For the same reason, host processing power is not a relevant factor.

   The deadlock is also independent of slow start/congestion avoidance
   as well as of the sender/receiver silly window syndrome avoidance and
   Nagle algorithms. Note that the latest FreeBSD versions initially use
   a large send congestion window in local networks, while this feature
   is not implemented in HP-UX. However, the deadlock was obtained on
   both types of hosts.

   The phenomenon is clearly dependent upon the send and receive buffer
   sizes of the client and server applications, as well as on the send
   and receive buffer sizes of both TCPs. Modifying such parameters can
   solve a particular instance of the problem, but the possibility of
   deadlock and subsequent timeouts will still be there.

   This has been checked for several combinations of sizes: just by
   suitably adjusting the -m payload, the deadlock is obtained again.


Proposed Solution.

   Only a blocking interface between the application and the TCP level
   is considered in this section (i.e., blocking sockets).

   If the conditions of the client/server are as described in the
   previous sections, and blocking sockets are used, there is a
   potential for deadlock. Such a situation can be avoided by using
   other models for the client/server communication, such as
   non-blocking sockets or even two connections, one for sending and the
   other for receiving data (see Ref. 4 for a detailed discussion and
   implementation).

   However, there is a solution (at TCP code level in the client) that
   can prevent the deadlock, even with blocking sockets.

   The application is blocked because there is no space in the TCP send
   buffer. But that buffer will remain completely full because the peer
   closed its receive window, and the peer application is also blocked
   in a send call for the same reason.

   One way to break the deadlock is to wake up the local application
   when:

   A  The application send buffer is completely copied to TCP send
      buffer (this is the behavior already available)


   or else when ALL the following conditions hold:


   B-1  TCP send buffer is full (this TCP cannot accept more data from
        the send call),

   B-2  TCP send window is 0 (this TCP cannot send data to the peer, so
        the TCP send buffer will not be emptied),

   B-3  TCP receive buffer is full (this TCP cannot accept more received
        data),

   B-4  TCP has a pending blocking send request.

   If the four conditions B hold on one side, that side of the
   connection is blocked. The peer status is then as follows: the peer
   TCP receive buffer is full (by condition B-2) and the peer TCP cannot
   send any more data (by condition B-3).

   There is the potential that the peer application now issues a send
   call with more data than its local TCP send buffer can accommodate,
   and therefore also blocks.

   Obviously, if the application is awakened on the side where
   conditions B hold, the deadlock is avoided.
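
   Conditions B-1 to B-4 amount to a simple conjunction over the state
   of one side of the connection. The following sketch expresses the
   check in C; the structure and its field names are hypothetical and do
   not correspond to any particular TCP implementation.

```c
#include <stddef.h>

/* Hypothetical view of one TCP endpoint (illustrative names only). */
struct tcp_side {
    size_t snd_buf_used, snd_buf_size; /* TCP send buffer            */
    size_t rcv_buf_used, rcv_buf_size; /* TCP receive buffer         */
    size_t snd_wnd;                    /* window announced by peer   */
    int    send_blocked;               /* a blocking send is pending */
};

/* Return 1 when conditions B-1 to B-4 all hold, 0 otherwise. */
int must_wake_sender(const struct tcp_side *t)
{
    return t->snd_buf_used >= t->snd_buf_size   /* B-1 */
        && t->snd_wnd == 0                      /* B-2 */
        && t->rcv_buf_used >= t->rcv_buf_size   /* B-3 */
        && t->send_blocked;                     /* B-4 */
}
```

   If any single condition fails, the connection can still make
   progress on its own and the blocked send is left untouched.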

   The solution implies at most three modifications to the TCP code that
   outputs a segment, to wake up the application when the conditions
   hold. The check must be done in at most three places:

   Place 1  When the retransmission timer expires and the TCP must
            transmit a window probe.

   Place 2  When a peer window probe is received and an acknowledgment
            must be transmitted.

   Place 3  When setting up the retransmission timer for window probe
            transmission.

   Places 1 and 2 are reactive solutions, applied once the deadlock is
   present. Place 3, instead, is proactive, acting immediately before
   the deadlock can occur. If the check succeeds, the application must
   be notified with the number of bytes accepted by the TCP, or with a
   specific error if no bytes were accepted.
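
   The notification can be pictured as the value that the awakened send
   call hands back to the application. The sketch below is only an
   illustration: the helper name is hypothetical, and the use of
   EWOULDBLOCK as the specific error is an assumption borrowed from the
   send timeout behavior of the Sockets interface.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: the value an awakened blocking send reports,
 * given the number of bytes the TCP had accepted when it was woken. */
ssize_t awakened_send_result(size_t accepted)
{
    if (accepted == 0) {
        errno = EWOULDBLOCK;    /* nothing accepted: specific error */
        return -1;
    }
    return (ssize_t) accepted;  /* partial count of accepted bytes  */
}
```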

   If the check is implemented only in Places 1 and 2 and only in the
   client TCP, the connection now runs to completion but with
   unnecessary delays: if the connection falls into the "window probe
   deadlock", the client application is awakened after the first window
   probe timeout, which is about 3 seconds.

   If the check is implemented only in Places 1 and 2 and only in the
   server TCP, then in addition to the previous unnecessary delay, the
   connection runs slowly after the first window probe deadlock is
   avoided. The reason is that the server application usually has a
   short to medium sized receive buffer (for example, 8192 bytes for
   "inetd" built-in servers such as echo). Once forced to wake up, the
   server TCP announces a receive window of 8192, which is immediately
   filled by the client, and the deadlock occurs again until the server
   is awakened again by this modification (after 3 seconds), and so on.

   Modifying the TCP code just in Place 3 avoids such delays and is
   clearly sufficient for all cases.

   Testing a TCP modified according to this section (just in Place 3)
   showed that connections now run correctly and smoothly to completion
   when the modified TCP is the client, regardless of whether the server
   TCP has also implemented the modification.

   On the other hand, if the client TCP does not implement the solution,
   deadlock may occur even if the server TCP has implemented it. This
   asymmetry is caused by the characteristics of the client and server:
   even if the server is notified, it has no other choice than to read
   (until its memory resources are exhausted) and respond, which now
   causes the deadlock. The client, in contrast, may now switch between
   sending and reading, avoiding the deadlock.

   A sample trace for a modified client and unmodified server is
   included in Appendix 3. The connection now runs to completion with no
   delays.


Impact on Current Applications.

   The new behavior is compatible with current applications, as the
   socket send specification (Ref. 5)

   number_of_accepted_bytes = send (socket,
                                    application_send_buffer_address,
                                    number_of_bytes_to_send,
                                    flags)

   already returns the number of bytes accepted.

   If the socket is blocking, the caller is blocked and only notified
   when the complete buffer has been accepted by the local TCP. In this
   case, send returns that number of bytes.

   The caller can also be notified when a send timeout expires (if such
   a socket option was set; not all TCP implementations provide it). In
   this case send returns the actual number of bytes accepted (if none
   were accepted, the error EWOULDBLOCK is set).

   Therefore, the implications of the new behavior were already present
   in the interface provided by Sockets. Well-coded applications are
   already aware of such behavior, so the proposed TCP modification will
   have no impact on them.
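
   Where the send timeout option is available, it is set with the
   standard SO_SNDTIMEO socket option. The following is a minimal
   sketch; the helper name set_send_timeout is hypothetical, and, as
   noted above, the option is not provided by every implementation.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/socket.h>

/* Hypothetical helper: bound the time a blocking send may wait.
 * Returns 0 on success, -1 on error (for example, if the option is
 * not supported by the local implementation). */
int set_send_timeout(int fd, long seconds)
{
    struct timeval tv;

    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}
```

   After a successful call, a blocking send on that socket returns when
   the timeout expires, reporting the bytes accepted so far.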

   There follows an example of code aware of this possibility:

     if ((n = send (s, buf, strlen (buf), flags)) == strlen (buf)) {
        /* the complete buffer was sent */
        ...

     } else if (n == -1) {
        /* some error detected, check errno */
        if (errno == EWOULDBLOCK) {

           /* nothing could be sent, proceed accordingly */
           ... code for try again ...

        } else {
           /* serious error, proceed accordingly */
           ...
        }

     } else {
        /* the buffer was sent partially */
        ... code for sending the remaining part ...
     }

   Note that it is the attempt to send more data when the buffer was not
   completely sent that can lead to the window probe deadlock described.

   If conditions are as described in previous sections, the deadlock may
   occur. If the TCP code is not modified, the application will remain
   blocked in a send call until retransmissions are exhausted (if a
   limit is implemented) or alarms expire (if used). No modification to
   the application will avoid the deadlock.

   If the TCP code is modified as proposed, a client application
   notified that a send call did not completely send the buffer must not
   try to send again.  Instead, it must replace the code for sending the
   remaining part by code for reading the first responses to what was
   already sent. Pseudo-code for this follows (see also Appendix 1):

       adjust pointers to cover the whole buffer as a chunk to be sent
       loop until the whole buffer is sent
           send a chunk /* first time is all the buffer */
           check status:
               if -1 then error and exit,
               except when errno is EWOULDBLOCK, continue.
           read response from server
           adjust the pointers to the next chunk
       end loop
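
   The pseudo-code above could be written in C roughly as follows, for a
   server that returns exactly as many bytes as it receives (as echo
   does). This is a sketch under that assumption; the function names
   echo_exchange and read_fully are hypothetical and are not taken from
   tcpecho.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Read exactly "want" bytes into dst. Returns 0, or -1 on error/EOF. */
static int read_fully(int fd, char *dst, size_t want)
{
    while (want > 0) {
        ssize_t r = recv(fd, dst, want, 0);

        if (r <= 0)
            return -1;
        dst += r;
        want -= (size_t) r;
    }
    return 0;
}

/* Send len bytes from buf and collect the len-byte response in reply.
 * After every send, complete or partial, the response to the bytes
 * accepted so far is read before sending continues. Returns 0 on
 * success, -1 on error. */
int echo_exchange(int fd, const char *buf, size_t len, char *reply)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = send(fd, buf + done, len - done, 0);

        if (n == -1) {
            if (errno != EWOULDBLOCK && errno != EAGAIN)
                return -1;      /* serious error */
            n = 0;              /* nothing accepted this time */
        }
        /* Read the response to what was accepted, then go on with
         * the next chunk. */
        if (read_fully(fd, reply + done, (size_t) n) == -1)
            return -1;
        done += (size_t) n;
    }
    return 0;
}
```

   The point of the design is that the client never issues a second
   send while responses to the first one are still unread, which is the
   condition that leads to the deadlock.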


Security Considerations.

   The only security implication detected in this study is a "denial of
   service" attack on hosts that do not implement a limit on the
   retransmission of window probes but that provide servers that send
   large amounts of data to clients in response to large amounts of data
   sent by those clients. A representative server is the one
   implementing the echo protocol (Ref. 2).

   An attacker could establish connections as described in this memo. As
   the server host will not abort the retransmission of window probes,
   the attacker will be able to waste resources in the server for as
   long as he maintains such connections.

   To avoid this attack, implement a limit on the retransmission of
   window probes. The modifications proposed to the client TCP code
   avoid the deadlock on the client side, but not on the server side.
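
   A limit on window probes can be as simple as a counter checked each
   time the probe timer expires. The sketch below is illustrative only:
   the structure, the names and the doubling back-off with an upper
   bound are assumptions, not the code of any particular TCP.

```c
/* Hypothetical persist-timer state (illustrative names and limits). */
struct persist_state {
    int  probes_sent;     /* probes sent while the window stayed 0 */
    int  max_probes;      /* configured limit                      */
    long interval_s;      /* current spacing between probes, secs */
    long max_interval_s;  /* upper bound for the back-off          */
};

enum persist_action { SEND_PROBE, ABORT_CONNECTION };

/* Called when the probe timer expires while the peer window is 0. */
enum persist_action on_persist_timeout(struct persist_state *p)
{
    if (p->probes_sent >= p->max_probes)
        return ABORT_CONNECTION;    /* limit reached: free resources */

    p->probes_sent++;
    p->interval_s *= 2;             /* exponential back-off */
    if (p->interval_s > p->max_interval_s)
        p->interval_s = p->max_interval_s;
    return SEND_PROBE;
}
```

   With such a limit, a connection held in deadlock by an attacker is
   eventually aborted by the server host instead of consuming resources
   indefinitely.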





References.

   [1] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
       September 1981.

   [2] Postel, J., "Echo Protocol", STD 20, RFC 862, May 1983.

   [3] Lawrence Berkeley National Laboratory, Network Research Group,
       "tcpdump", ftp://ftp.ee.lbl.gov/tcpdump.tar.Z
       (tcpdump@ee.lbl.gov).

   [4] Stevens, W. R., "UNIX Network Programming, Volume 1", 2nd
       Edition, Prentice Hall, 1998.

   [5] IEEE, "Protocol Independent Interfaces", IEEE Std 1003.1g.


Author's Address.

   Juan L. Freniche
   Engineering Division
   Construcciones Aeronauticas (CASA)
   Getafe (SPAIN)

   Phone: + 34.91.624-2950
   Fax:   + 34.91.624-2705

   EMail: jlfreniche@acm.org


Appendix 1: Listing of tcpecho.c

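   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>
   #include <signal.h>
   #include <unistd.h>
   #include <sys/types.h>
   #include <sys/time.h>
   #include <sys/socket.h>
   #include <netinet/in.h>
   #include <netdb.h>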

   #define SA struct sockaddr

   int s;    /* socket descriptor */

   void set_alarm (int duration) {
     struct itimerval inttimer;
     struct itimerval ointtimer;
     inttimer.it_interval.tv_sec = duration;
     inttimer.it_interval.tv_usec = 0;
     inttimer.it_value.tv_sec = duration;
     inttimer.it_value.tv_usec = 0;
     setitimer (ITIMER_REAL, &inttimer, &ointtimer);
   }

   void close_comms () {
     struct linger linger;
     linger.l_onoff = 1;
     linger.l_linger = 0;
     setsockopt (s, SOL_SOCKET, SO_LINGER, &linger, sizeof (linger));
     close (s);
   }

   /* SIGALRM handler: the alarm for socket operations has expired */
   void timeout (int sig) {
     (void) sig;
     fprintf (stderr, "Connection timeout\n");
     close_comms ();
     exit (1);
   }

   int process_by_tcp (char *remote_host, char *msg,
                 int multiple, int times, int alarm_time)
   {
     struct hostent *hp;
     struct servent *sp;
     struct sockaddr_in peeraddr_in;

     int nbytes, i, pending_bytes;
     char echo_msg [multiple * strlen (msg)];
     char echo_constant [multiple * strlen (msg)];
     char *aux;

     memset ((char *) &peeraddr_in, 0, sizeof (struct sockaddr_in));
     peeraddr_in.sin_family = AF_INET;

     hp = gethostbyname (remote_host);
     if (hp == NULL) {
       fprintf (stderr, "tcpecho: %s not found\n", remote_host);
       return -1;
     }
     peeraddr_in.sin_addr.s_addr =
        ((struct in_addr *) (hp->h_addr))->s_addr;

     sp = getservbyname ("echo", "tcp");
     if (sp == NULL) {
       fprintf (stderr, "tcpecho: echo not found in /etc/services\n");
       return -1;
     }
     peeraddr_in.sin_port = sp->s_port;

     s = socket (AF_INET, SOCK_STREAM, 0);
     if (s == -1) {
       fprintf (stderr, "tcpecho: Unable to create socket\n");
       return -1;
     }

     set_alarm (alarm_time);
     if (connect (s, (SA *) &peeraddr_in,
                  sizeof (struct sockaddr_in)) == -1) {
       set_alarm (0);
       fprintf (stderr, "tcpecho: Unable to connect to remote host %s\n",
             remote_host);
       return -1;
     }
     set_alarm (0);
     set_alarm (0);

     /* Build the payload: "multiple" copies of msg, no terminator */
     aux = echo_constant;
     for (i = 1; i <= multiple; i++) {
       memcpy (aux, msg, strlen (msg));
       aux = aux + strlen (msg);
     }

     for (i = 1; i <= times; i++) {
       nbytes = multiple * strlen (msg);
       set_alarm (alarm_time);

       /* Send the whole payload before reading any echo */
       if (send (s, echo_constant, nbytes, 0) != nbytes) {
         fprintf (stderr, "tcpecho: Unable to send all bytes\n");
         close_comms ();
         exit (1);
       }

       /* Read back the echo until all bytes sent have returned */
       pending_bytes = nbytes;
       while (pending_bytes > 0) {
         if ((nbytes = recv (s, echo_msg, pending_bytes, 0)) <= 0) {
           fprintf (stderr,
                    "tcpecho: Error reading echo from server\n");
           close_comms ();
           exit (1);
         } else {
           pending_bytes = pending_bytes - nbytes;
         }
       }
     }
     set_alarm (0);
     shutdown (s, 1);
     return 0;
   }

   void print_usage () {
     fprintf
       (stderr,
        "tcpecho: [-n times -a alarm -m multiple] remote_host string\n");
     exit (1);
   }


   int main (int argc, char *argv[])
   {
     int c;
     int times      = 1;
     int alarm_time = 20;
     int status     = 0;
     int multiple   = 1;
     char *remote_host;

     if (argc <= 1) {
       print_usage ();
     }

     while ((c = getopt (argc, argv, "n:m:a:")) != -1) {
       switch (c) {
       case 'a':
         alarm_time = atoi(optarg);
         break;
       case 'n':
         times = atoi (optarg);
         if (times < 1) times = 1;
         break;
       case 'm':
         multiple = atoi (optarg);
         if (multiple < 1) multiple = 1;
         break;
       }
     }

     if ((argc - optind) < 2) {
       print_usage ();
     }

     remote_host = argv [optind];
     optind ++;

     signal (SIGINT, close_comms);
     signal (SIGALRM, timeout);

     status = process_by_tcp (remote_host, argv [optind],
                     multiple, times, alarm_time);
     exit (status);
   }


Appendix 2: Trace of a Connection in Deadlock.

   Trace of a local connection where the phenomenon occurs. To reproduce
   it, enable the inetd echo service, compile tcpecho.c, launch a second
   xterm, and execute in it:

   localhost> tcpdump -N -p -i lo0 -s 128 -S

   Now, in the first xterm, execute (adjust the payload as needed, and
   observe the MSS on the local interface):

   localhost> tcpecho -n 1 -a 120 -m 300000 localhost A

   The trace has been edited to remove some unnecessary fields and to
   align the remaining ones.

   server> tcpdump -N -p -i lo0 -s 128
   tcpdump: listening on lo0
   4:26.637 1026 > echo: S 0:0(0) win 16384 <mss 16344>
   4:26.637 echo > 1026: S 0:0(0) ack 1 win 57344 <mss 16344>
   4:26.637 1026 > echo: . ack 1 win 57344
   4:26.781 1026 > echo: P 1:2049(2048) ack 1 win 57344
   4:26.782 1026 > echo: P 2049:16385(14336) ack 1 win 57344
   4:26.782 1026 > echo: P 16385:30721(14336) ack 1 win 57344
   4:26.783 1026 > echo: P 30721:45057(14336) ack 1 win 57344
   4:26.784 echo > 1026: P 1:2049(2048) ack 45057 win 20480
   4:26.784 1026 > echo: P 45057:57345(12288) ack 2049 win 55296
   4:26.784 echo > 1026: P 2049:4097(2048) ack 57345 win 8192
   4:26.785 1026 > echo: P 57345:59393(2048) ack 4097 win 53248
   4:26.785 echo > 1026: P 4097:8193(4096) ack 59393 win 6144
   4:26.785 1026 > echo: P 59393:61441(2048) ack 8193 win 49152
   4:26.786 echo > 1026: P 8193:10241(2048) ack 61441 win 4096
   4:26.787 echo > 1026: P 10241:24577(14336) ack 61441 win 20480
   4:26.787 1026 > echo: . 61441:75777(14336) ack 24577 win 32768
   4:26.788 echo > 1026: P 24577:26625(2048) ack 75777 win 14336
   4:26.788 1026 > echo: . 75777:90113(14336) ack 26625 win 30720
   4:26.788 echo > 1026: P 26625:28673(2048) ack 90113 win 0
   4:26.790 echo > 1026: P 28673:43009(14336) ack 90113 win 16384
   4:26.790 1026 > echo: . 90113:104449(14336) ack 43009 win 14336
   4:26.790 echo > 1026: P 43009:45057(2048) ack 104449 win 2048
   4:26.792 echo > 1026: . 45057:57345(12288) ack 104449 win 34816
   4:26.792 1026 > echo: . 104449:118785(14336) ack 57345 win 0
   4:26.792 1026 > echo: . 118785:133121(14336) ack 57345 win 0
   4:26.794 echo > 1026: . ack 133121 win 38912
   4:26.794 1026 > echo: . 133121:147457(14336) ack 57345 win 0
   4:26.794 1026 > echo: P 147457:161793(14336) ack 57345 win 0
   4:26.951 echo > 1026: . ack 161793 win 18432
   4:26.951 1026 > echo: . 161793:176129(14336) ack 57345 win 0
   4:27.151 echo > 1026: . ack 176129 win 4096
   4:31.451 echo > 1026: . 57345:57346(1) ack 176129 win 4096
   4:31.451 1026 > echo: . 176129:180225(4096) ack 57345 win 0
   4:31.551 echo > 1026: . ack 180225 win 0
   4:36.451 1026 > echo: . 180225:180226(1) ack 57345 win 0
   4:36.451 echo > 1026: . ack 180225 win 0
   4:38.451 echo > 1026: . 57345:57346(1) ack 180225 win 0
   4:38.451 1026 > echo: . ack 57345 win 0
   4:42.451 1026 > echo: . 180225:180226(1) ack 57345 win 0
   4:42.451 echo > 1026: . ack 180225 win 0
   4:52.451 echo > 1026: . 57345:57346(1) ack 180225 win 0
   4:52.451 1026 > echo: . ack 57345 win 0
   4:54.451 1026 > echo: . 180225:180226(1) ack 57345 win 0
   4:54.451 echo > 1026: . ack 180225 win 0
   5:18.451 1026 > echo: . 180225:180226(1) ack 57345 win 0
   5:18.451 echo > 1026: . ack 180225 win 0
   5:20.451 echo > 1026: . 57345:57346(1) ack 180225 win 0
   5:20.451 1026 > echo: . ack 57345 win 0
   6:06.451 1026 > echo: . 180225:180226(1) ack 57345 win 0
   6:06.451 echo > 1026: . ack 180225 win 0
   6:16.451 echo > 1026: . 57345:57346(1) ack 180225 win 0
   6:16.451 1026 > echo: . ack 57345 win 0
   6:26.791 1026 > echo: R 180225:180225(0) ack 57345 win 0

   The connection is aborted by the alarm used for socket operations. If
   the alarm had been set sufficiently high, all retransmissions of
   window probes would have been seen, followed again by the reset.


Appendix 3: Trace of a Connection Solving the Deadlock.

   In a client that has implemented the modification, execute

   client> tcpecho -n 1 -a 120 -m 60000 server A

   server> tcpdump -N -p -i tun0 -s 128
   tcpdump: listening on tun0
   9:36.118 49152 > echo: S 0:0(0) win 11680 <mss 1460>
   9:36.119 echo > 49152: S 0:0(0) ack 1 win 17520 <mss 1460>
   9:36.121 49152 > echo: . ack 1 win 11680
   9:36.130 49152 > echo: . 1:1461(1460) ack 1 win 11680
   9:36.130 echo > 49152: P 1:1461(1460) ack 1461 win 17520
   9:36.131 49152 > echo: . 1461:2921(1460) ack 1 win 11680
   9:36.131 echo > 49152: P 1461:2921(1460) ack 2921 win 17520
   9:36.132 49152 > echo: . 2921:4381(1460) ack 1 win 11680
   9:36.132 echo > 49152: P 2921:4381(1460) ack 4381 win 17520
   9:36.133 49152 > echo: . 4381:5841(1460) ack 1 win 11680
   9:36.133 echo > 49152: P 4381:5841(1460) ack 5841 win 17520
   9:36.134 49152 > echo: . 5841:7301(1460) ack 1 win 11680
   9:36.134 echo > 49152: P 5841:7301(1460) ack 7301 win 17520
   9:36.135 49152 > echo: . 7301:8761(1460) ack 1 win 11680
   9:36.135 echo > 49152: P 7301:8761(1460) ack 8761 win 17520
   9:36.136 49152 > echo: . 8761:10221(1460) ack 1 win 11680
   9:36.136 echo > 49152: P 8761:10221(1460) ack 10221 win 17520
   9:36.136 49152 > echo: P 10221:11681(1460) ack 1 win 11680
   9:36.137 echo > 49152: P 10221:11681(1460) ack 11681 win 17520
   9:36.139 49152 > echo: P 11681:13141(1460) ack 1461 win 10220
   9:36.151 echo > 49152: . ack 13141 win 17520
   9:36.168 49152 > echo: P 13141:14601(1460) ack 2921 win 8760
   9:36.170 49152 > echo: P 14601:16061(1460) ack 4381 win 7300
   9:36.171 echo > 49152: . ack 16061 win 17520
   9:36.173 49152 > echo: P 16061:17521(1460) ack 5841 win 5840
   9:36.175 49152 > echo: P 17521:18981(1460) ack 7301 win 4380
   9:36.176 echo > 49152: . ack 18981 win 17520
   9:36.178 49152 > echo: P 18981:20441(1460) ack 8761 win 2920
   9:36.180 49152 > echo: P 20441:21901(1460) ack 10221 win 1460
   9:36.180 echo > 49152: . ack 21901 win 17520
   9:36.183 49152 > echo: P 21901:23361(1460) ack 11681 win 0
   9:36.185 49152 > echo: P 23361:24821(1460) ack 11681 win 0
   9:36.185 echo > 49152: . ack 24821 win 17520
   9:36.188 49152 > echo: . 24821:26281(1460) ack 11681 win 0
   9:36.188 49152 > echo: P 26281:27741(1460) ack 11681 win 0
   9:36.189 echo > 49152: . ack 27741 win 17520
   9:36.191 49152 > echo: . 27741:29201(1460) ack 11681 win 0
   9:36.191 49152 > echo: P 29201:30661(1460) ack 11681 win 0
   9:36.192 echo > 49152: . ack 30661 win 17520
   9:36.194 49152 > echo: . 30661:32121(1460) ack 11681 win 0
   9:36.194 49152 > echo: P 32121:33581(1460) ack 11681 win 0
   9:36.196 49152 > echo: . 33581:35041(1460) ack 11681 win 0
   9:36.197 49152 > echo: P 35041:36501(1460) ack 11681 win 0
   9:36.199 49152 > echo: . 36501:37961(1460) ack 11681 win 0
   9:36.200 49152 > echo: P 37961:39421(1460) ack 11681 win 0
   9:36.202 49152 > echo: . 39421:40881(1460) ack 11681 win 0
   9:36.202 49152 > echo: P 40881:42341(1460) ack 11681 win 0
   9:36.351 echo > 49152: . ack 42341 win 5840
   9:36.353 49152 > echo: . 42341:43801(1460) ack 11681 win 0
   9:36.354 49152 > echo: . 43801:45261(1460) ack 11681 win 0
   9:36.355 49152 > echo: . 45261:46721(1460) ack 11681 win 0
   9:36.355 49152 > echo: . 46721:48181(1460) ack 11681 win 0
   9:36.551 echo > 49152: . ack 48181 win 0
   9:36.554 49152 > echo: . ack 11681 win 11680
   9:36.554 echo > 49152: . 11681:13141(1460) ack 48181 win 0
   9:36.554 echo > 49152: . 13141:14601(1460) ack 48181 win 0
   9:36.555 echo > 49152: . 14601:16061(1460) ack 48181 win 0
   9:36.555 echo > 49152: . 16061:17521(1460) ack 48181 win 0
   9:36.555 echo > 49152: . 17521:18981(1460) ack 48181 win 0
   9:36.555 echo > 49152: . 18981:20441(1460) ack 48181 win 0
   9:36.555 echo > 49152: . 20441:21901(1460) ack 48181 win 0
   9:36.555 echo > 49152: . 21901:23361(1460) ack 48181 win 0
   9:36.558 49152 > echo: . ack 14601 win 8760
   9:36.559 echo > 49152: . ack 48181 win 8192
   9:36.645 49152 > echo: . ack 17521 win 5840
   9:36.655 49152 > echo: . ack 20441 win 2920
   9:36.658 49152 > echo: . ack 23361 win 0
   9:36.659 echo > 49152: . ack 48181 win 16384
   9:36.661 49152 > echo: . 48181:49641(1460) ack 23361 win 0
   9:36.661 49152 > echo: . 49641:51101(1460) ack 23361 win 0
   9:36.662 49152 > echo: . 51101:52561(1460) ack 23361 win 0
   9:36.663 49152 > echo: . 52561:54021(1460) ack 23361 win 0
   9:36.663 49152 > echo: . 54021:55481(1460) ack 23361 win 0
   9:36.665 49152 > echo: . 55481:56941(1460) ack 23361 win 0
   9:36.666 49152 > echo: . 56941:58401(1460) ack 23361 win 0
   9:36.667 49152 > echo: P 58401:59861(1460) ack 23361 win 0
   9:36.669 49152 > echo: . ack 23361 win 11680
   9:36.669 echo > 49152: . 23361:24821(1460) ack 59861 win 4704
   9:36.669 echo > 49152: . 24821:26281(1460) ack 59861 win 4704
   9:36.669 echo > 49152: . 26281:27741(1460) ack 59861 win 4704
   9:36.669 echo > 49152: . 27741:29201(1460) ack 59861 win 4704
   9:36.669 echo > 49152: . 29201:30661(1460) ack 59861 win 4704
   9:36.669 echo > 49152: . 30661:32121(1460) ack 59861 win 4704
   9:36.669 echo > 49152: . 32121:33581(1460) ack 59861 win 4704
   9:36.670 echo > 49152: . 33581:35041(1460) ack 59861 win 4704
   9:36.682 49152 > echo: . ack 26281 win 8760
   9:36.697 49152 > echo: . ack 29201 win 5840
   9:36.700 49152 > echo: . ack 32121 win 2920
   9:36.701 echo > 49152: . ack 59861 win 12896
   9:36.704 49152 > echo: . ack 35041 win 0
   9:36.707 49152 > echo: . ack 35041 win 11680
   9:36.707 echo > 49152: . 35041:36501(1460) ack 59861 win 12896
   9:36.707 echo > 49152: . 36501:37961(1460) ack 59861 win 12896
   9:36.707 echo > 49152: . 37961:39421(1460) ack 59861 win 12896
   9:36.708 echo > 49152: . 39421:40881(1460) ack 59861 win 12896
   9:36.708 echo > 49152: . 40881:42341(1460) ack 59861 win 12896
   9:36.708 echo > 49152: . 42341:43801(1460) ack 59861 win 12896
   9:36.708 echo > 49152: . 43801:45261(1460) ack 59861 win 12896
   9:36.708 echo > 49152: . 45261:46721(1460) ack 59861 win 12896
   9:36.757 49152 > echo: . ack 37961 win 8760
   9:36.758 echo > 49152: . ack 59861 win 17520
   9:36.761 49152 > echo: . ack 40881 win 5840
   9:36.764 49152 > echo: . ack 43801 win 2920
   9:36.767 49152 > echo: . ack 46721 win 0
   9:36.788 49152 > echo: . ack 46721 win 11680
   9:36.788 echo > 49152: . 46721:48181(1460) ack 59861 win 17520
   9:36.788 echo > 49152: . 48181:49641(1460) ack 59861 win 17520
   9:36.788 echo > 49152: . 49641:51101(1460) ack 59861 win 17520
   9:36.788 echo > 49152: . 51101:52561(1460) ack 59861 win 17520
   9:36.788 echo > 49152: . 52561:54021(1460) ack 59861 win 17520
   9:36.788 echo > 49152: . 54021:55481(1460) ack 59861 win 17520
   9:36.788 echo > 49152: . 55481:56941(1460) ack 59861 win 17520
   9:36.788 echo > 49152: . 56941:58401(1460) ack 59861 win 17520
   9:36.792 49152 > echo: . ack 49641 win 8760
   9:36.795 49152 > echo: . ack 52561 win 5840
   9:36.798 49152 > echo: . ack 55481 win 2920
   9:36.801 49152 > echo: . ack 58401 win 0
   9:36.803 49152 > echo: . ack 58401 win 11680
   9:36.803 echo > 49152: P 58401:59861(1460) ack 59861 win 17520
   9:36.807 49152 > echo: P 59861:60001(140) ack 59861 win 11680
   9:36.808 echo > 49152: P 59861:59961(100) ack 60001 win 17520
   9:37.131 49152 > echo: . ack 59961 win 11680
   9:37.131 echo > 49152: P 59961:60001(40) ack 60001 win 17520
   9:37.135 49152 > echo: F 60001:60001(0) ack 60001 win 11680
   9:37.135 echo > 49152: . ack 60002 win 17520
   9:37.136 echo > 49152: F 60001:60001(0) ack 60002 win 17520
   9:37.153 49152 > echo: . ack 60002 win 11679

   After the client sends a window probe, its application is awakened
   and reads the complete receive buffer (11680 bytes, at time
   9:36.554). A window advertisement is then sent to the server,
   inviting it to respond with more data and so avoiding the deadlock.
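
   The same effect can be approximated at the application level by
   never blocking in a send while echoed data is waiting to be read.
   The following is a minimal sketch under that assumption. It is not
   the TCP-level modification traced above, and the function name
   echo_exchange and the chunk size are hypothetical. It uses a
   non-blocking socket and a selector so that the receive buffer is
   drained whenever data arrives, even while the payload is still being
   sent.

```python
import selectors
import socket


def echo_exchange(sock, payload, chunk=4096):
    """Send payload to an echo peer and return the echoed bytes.

    Reads and writes are interleaved, so the local receive buffer is
    emptied while sending and the peer is never left facing a zero
    window for long.
    """
    sock.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ | selectors.EVENT_WRITE)
    view = memoryview(payload)
    sent = 0
    received = bytearray()
    try:
        while len(received) < len(payload):
            for _key, events in sel.select():
                if events & selectors.EVENT_READ:
                    data = sock.recv(chunk)
                    if not data:
                        return bytes(received)  # peer closed early
                    received += data
                if events & selectors.EVENT_WRITE and sent < len(payload):
                    sent += sock.send(view[sent:sent + chunk])
                    if sent == len(payload):
                        # Nothing left to send: wait for reads only.
                        sel.modify(sock, selectors.EVENT_READ)
    finally:
        sel.unregister(sock)
        sel.close()
    return bytes(received)
```

   A caller would connect a stream socket to the echo service and pass
   it in together with the data to send. An application written this
   way does not depend on the TCP layer to wake it up, at the cost of a
   more complex send loop.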








Full Copyright Statement.

   Copyright (C) The Internet Society (1998). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

INTERNET DRAFT                  EXPIRES JANUARY 1999    INTERNET DRAFT