Network Working Group                                          N. Demizu
Internet-Draft                                                      NICT
Expires: September 3, 2006                                 March 3, 2006


                TS2 --- A Modified TCP Timestamps Mechanism
                       <draft-demizu-tcp-ts2-01.txt>


Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than a "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


Copyright Notice

   Copyright (C) The Internet Society (2006).  All Rights Reserved.


















Demizu                   Expires September 2006                 [Page 1]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


Abstract

   This memo proposes a modified TCP Timestamps mechanism called "TS2".
   It uses the existing "TCP Timestamps option" specified in RFC1323 and
   a new TCP option called "the TCP Old Timestamps option", which is
   specified in this memo.  As a fallback, an RFC1323-compatible mode
   called "TS1" is also available.

   The base mechanism of TS2 includes the definitions of those two TCP
   Timestamps options, mode negotiation to enable TS1 or TS2, and a rule
   for updating internal states.  The applied mechanisms of TS2 include
   an accurate RTT measurement mechanism that is correct even for
   duplicate ACK segments (RTTM/TS2), a reordering-robust mechanism to
   detect wrapped sequence numbers (PAWS/TS2), a lightweight mechanism
   to detect spoofed segments (PASA/TS2), a loss inference mechanism
   applicable to both original and retransmitted data segments
   (DLI/TS2), and a spurious loss inference detection mechanism that
   operates without waiting for one RTT by sending arbitrary in-window
   data (SLID/TS2).


Table of Contents

   1. Introduction ................................................... 3
   2. Terminology .................................................... 3
   3. Two TCP Timestamps Options ..................................... 5
   4. Base Mechanism ................................................. 7
   5. RTTM (Round Trip Time Measurement) ............................ 12
   6. PAWS (Protection Against Wrapped Sequence numbers) ............ 14
   7. PASA (Protection Against Spoofing Attacks) .................... 17
   8. DLI (Data Loss Inference) ..................................... 24
   9. SLID (Spurious Loss Inference Detection) ...................... 30
   10. Security Considerations ...................................... 39
   11. IANA Considerations .......................................... 39
   12. Acknowledgements ............................................. 39
   13. References ................................................... 39
   Author's Address ................................................. 42
   Appendix A: TS2 Reference ........................................ 43
   Appendix B: Granularity of Timestamps ............................ 69
   Appendix C: Loss Inference With SACK and DLI/TS2 ................. 70
   Appendix D: Summary of TCP Timestamps Option in RFC1323 .......... 75
   Appendix E: Issues with TCP Timestamps Option in RFC1323 ......... 79
   Appendix F: Problem of PAWS in RFC1323 and Reordering ............ 82
   Appendix G: Alternative Ideas .................................... 87
   Appendix H: Changes from -00 version. ............................ 90
   Copyright Statement and Intellectual Property .................... 91





Demizu                   Expires September 2006                 [Page 2]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


1. Introduction

   This memo proposes a modified TCP Timestamps mechanism called "TS2".
   It uses the existing "TCP Timestamps option" [RFC1323] and a new TCP
   option called "the TCP Old Timestamps option", which is specified in
   this memo.  The significant differences between TS2 and the TCP
   Timestamps option specified in [RFC1323] are the rule to determine
   which timestamp is echoed and the timestamp unit.  In addition, TS2
   solves the issues with the existing TCP Timestamps option specified
   in [RFC1323], as described in appendix E.

   As a fallback, RFC1323-compatible mode called "TS1" is also
   available.  The use of TS1 or TS2 is negotiated using the two options
   on SYN and SYN+ACK segments in the TCP three-way handshake phase.

   TS2 enables several applied mechanisms, as follows.  When TS2 is
   enabled on a TCP connection, a local node MAY enable one or more of
   these mechanisms on the TCP connection without additional negotiation
   with a remote node.

      - RTTM/TS2 (Round Trip Time Measurement with TS2) enables correct
        RTT measurements even when a duplicate ACK segment is received.

      - PAWS/TS2 (Protection Against Wrapped Sequence numbers with TS2)
        is a reordering-robust protection mechanism for wrapped sequence
        numbers.

      - PASA/TS2 (Protection Against Spoofing Attacks with TS2) is a
        lightweight protection mechanism against spoofing attacks that
        inject faked SYN, data, FIN, and RST segments.

      - DLI/TS2 (Data Loss Inference with TS2) infers losses of both
        original and retransmitted data segments.

      - SLID/TS2 (Spurious Loss Inference Detection with TS2) detects
        spurious loss inference without waiting for one RTT by sending
        arbitrary in-window data.

   Note:The procedures described in this memo have not been demonstrated
   by simulation nor by implementation.


2. Terminology

2.1 General

   This memo uses the same variable names and TCP state names defined in
   section 3.2 of [RFC793].  In addition, it introduces the following
   variables and notations: SND.MAX holds the maximum value of SND.NXT;


Demizu                   Expires September 2006                 [Page 3]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   SSEG.XXX means the XXX field on the segment being sent; and RSEG.XXX
   means the XXX field on the segment just received.

   The memo uses a variable "SND.FACK" [MM96][MSM99], which holds the
   highest sequence number known to have reached the receiver, plus one.
   An octet is "SACKed" if the octet has been reported in a TCP SACK
   option [RFC2018].

   The memo uses the following abbreviations defined in [RFC2581]: SMSS
   (Sender Maximum Segment Size), and RMSS (Receiver Maximum Segment
   Size).

   The memo uses the following abbreviations defined in [RFC2988]: RTO
   (Retransmission TimeOut), RTT (Round-Trip Time), SRTT (Smoothed RTT),
   and RTTVAR (RTT VARiation).

   According to [RFC793], SEG.LEN includes the SYN and FIN bits.  Thus,
   segments satisfying (RSEG.LEN > 0) include data, SYN, and FIN
   segments.  If a RST segment has data, this memo does not consider
   that it satisfies (RSEG.LEN > 0).  For simplicity, the term "data
   segments" often means "data, SYN, and/or FIN segments" in this memo.

   The memo refers to the initial transmission of an octet as the
   "original transmission", and to a subsequent transmission of the same
   octet as a "retransmission" [RFC3522][RFC4015].  In addition, a data
   segment for which part or all of its octets are sent by original
   transmission is referred to as an "original data segment".  Other
   data segments are referred to as "retransmitted data segments".

   All arithmetic dealing with TCP sequence numbers must be performed
   modulo 2^32.  A sequence is called "monotonically nondecreasing" if
   each number is greater than or equal to its predecessor.

2.2 Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in [RFC2119].

   This memo makes use of conceptual variables to describe behavior.
   The specific variable names, and how their values are referred to and
   changed, are provided here to demonstrate behavior.  Implementations
   are not required to follow the memo exactly, as long as its external
   behavior is consistent with that described here.







Demizu                   Expires September 2006                 [Page 4]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


3. Two TCP Timestamps Options

   TS2 uses two TCP options: "the TCP Timestamps option" specified in
   [RFC1323], and a new TCP option called "the TCP Old Timestamps
   option", as specified in this section.

3.1 TCP Timestamps Option

   Figure 3-1 shows the format of the TCP Timestamps option.
   For simplicity, it is hereafter called "the TS option".

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                     |   Kind = 8    |  Length = 10  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                      TSval (TS Value)                         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                      TSecr (TS Echo Reply)                    |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                         Figure 3-1: The TS option

3.2 TCP Old Timestamps Option

   The format of the TCP Old Timestamps option has two forms as given
   below.  The option-kind value is <<TBD>>.

   On SYN and SYN+ACK segments, the TCP Old Timestamps option consists
   of only two octets (option-kind and option-length), as shown in
   Figure 3-2.  The purpose of this form is to negotiate the use of TS2.
   For simplicity, this form is hereafter called "the OTS_OK option".

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | Kind = <TBD>  |  Length = 2   |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 3-2: The OTS_OK option

   On other segments, the format of the TCP Old Timestamps option is the
   same as that of the TS option, except for the option-kind value, as
   shown in Figure 3-3.  The purpose of this form is to inform a remote
   node that the TSecr value is not fresh (In contrast, the TSecr value
   in the TS option is fresh).  For simplicity, this form is hereafter
   called "the OTS option".




Demizu                   Expires September 2006                 [Page 5]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                     | Kind = <TBD>  |  Length = 10  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                      TSval (TS Value)                         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                      TSecr (TS Echo Reply)                    |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                        Figure 3-3: The OTS option

3.3 TSval and TSecr Fields

   In both the TS option and the OTS option, the TSval field contains
   the current external timestamp, while the TSecr field contains the
   TS.Recent value, which is updated by the received TSval values, as
   specified in the base mechanism section.

   When TS1 is enabled, the timestamp unit can be chosen between 1
   second and 1 ms (10^-3), in order to be interoperable with [RFC1323].
   When TS2 is enabled, the timestamp unit is fixed at 1 usec (10^-6).
   All arithmetic dealing with timestamps must be performed modulo 2^32.




























Demizu                   Expires September 2006                 [Page 6]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


4. Base Mechanism

   This section describes how the timestamp mode (i.e., none, TS1, or
   TS2) is negotiated, how the timestamps option kind is chosen (i.e.,
   the TS option or the OTS option), and how the values of SSEG.TSval
   and SSEG.TSecr on outgoing segments are computed.

4.1 Variables

   The base mechanism uses the following variables: TS.Req (integer),
   TS.Mode (integer), TS.SndOff (32bit-timestamp), TS.SndAdj
   (32bit-timestamp), TS.Recent (32bit-timestamp), TS.RecentIsOld
   (boolean), and Last.Ack.Sent (32bit-sequence-number).  Among these
   variables, only TS.Req, TS.Mode, and TS.SndOff are referred to by the
   applied mechanisms.

   TS.Req contains the requested mode (0 = none, 1 = TS1, or 2 = TS2) of
   a TCP connection to be established.  TS.Mode records the result of
   the mode negotiation.  Its initial value is negative, which means
   "negotiation has not been completed".

   TS.SndOff and TS.SndAdj help mode negotiation.  See section 4.3 for
   more details.

   TS.Recent holds the value to be echoed in the TSecr fields of both
   the TS option and the OTS option.  Its initial value is zero.
   TS.RecentIsOld is accessed only when TS2 is enabled.  It is true if
   any segment that satisfies (RSEG.LEN > 0) and carries the TS option
   or the OTS option has not been received after the TS.Recent value has
   last been echoed.

   Last.Ack.Sent holds the last SSEG.ACK value sent, which is equal to
   the maximum SSEG.ACK value sent.

   Note: TS.Recent and Last.Ack.Sent are inherited from [RFC1323].

4.2 Mode Negotiation

   This subsection describes the procedure of mode negotiation using the
   two TCP Timestamps options on SYN and SYN+ACK segments in the TCP
   three-way handshake phase.

   When a SYN segment is sent to establish a TCP connection, if TS2 is
   requested, the SYN segment SHOULD carry both the TS option and the
   OTS_OK option.  If TS1 is requested, the SYN segment SHOULD carry the
   TS option, and it MUST NOT carry the OTS_OK option.  If neither TS1
   nor TS2 is requested, the SYN segment MUST NOT carry the TS option
   nor the OTS_OK option.



Demizu                   Expires September 2006                 [Page 7]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   If a SYN (without ACK) segment is received in the LISTEN or SYN-SENT
   state, and if the received segment does not carry one or both of the
   TS option and the OTS_OK option, the SYN+ACK segment sent in reply
   MUST NOT carry the OTS_OK option.  Similarly, if the received segment
   does not carry the TS option, the SYN+ACK segment sent in reply MUST
   NOT carry the TS option nor the OTS_OK option.

   When a SYN segment is received in the LISTEN or SYN-SENT state, if
   TS2 is requested, the SYN+ACK segment sent in reply SHOULD carry both
   the TS option and the OTS_OK option as long as the above rule allows.
   If TS1 is requested, the SYN+ACK segment sent in reply SHOULD carry
   the TS option as long as the above rule allows, and it MUST NOT carry
   the OTS_OK option.  If neither TS1 nor TS2 is requested, the SYN
   segment MUST NOT carry the TS option and the OTS_OK option.

   On SYN and SYN+ACK segments in the TCP three-way handshake phase, if
   both the TS option and the OTS_OK option are exchanged, TS2 is
   enabled.  If the TS option is exchanged but the OTS_OK option is not
   exchanged, TS1 is enabled.  If the TS option is not exchanged,
   neither TS1 nor TS2 is enabled.  The result is recorded in TS.Mode.
   When TS2 is enabled, TS.RecentIsOld is set to false.

4.3 Internal Timestamp and External Timestamp

   In this memo, "internal timestamp" means a timestamp generated
   directly from a timestamp source such as an internal tick count or a
   real time clock.  "External timestamp" means a timestamp exchanged in
   the TSval and TSecr fields.  An external timestamp is calculated as
   an internal timestamp plus TS.SndOff.

   TS.SndOff supports mode negotiation in conjunction with TS.SndAdj,
   which is used only during mode negotiation.  Since the timestamp unit
   is different between TS1 and TS2, the timestamp values of TS1 and TS2
   are almost always different.  To save the TCP option space on SYN and
   SYN+ACK segments, however, only the TS option carries the TSval and
   TSecr fields.  To be interoperable with [RFC1323], these two fields
   contain the timestamps of TS1.  The purpose of TS.SndAdj is to adjust
   TS.SndOff to generate correct external timestamps for TS2 when TS2 is
   enabled.  That is, when the first SYN segment is sent, TS.SndAdj
   records the difference between the timestamps of TS1 and TS2.  Then,
   if TS2 is enabled when a SYN+ACK segment is received, TS.SndAdj is
   added to TS.SndOff.

   The following formula shows the reason why TS.SndAdj holds the
   difference between the timestamps of TS1 and TS2.






Demizu                   Expires September 2006                 [Page 8]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


      CurTS(in TS2) = InitTS(in TS1) + ElapsedTS(in TS2)
                    = InitTS(in TS1) + (CurTS(in TS2)  - InitTS(in TS2))
                    = CurTS(in TS2)  + (InitTS(in TS1) - InitTS(in TS2))
                    = CurTS(in TS2)  + TS.SndAdj

   TS.SndOff is also used to avoid reusing the same range of TSval
   values when a TCP Control Block is reused.  When a TCP Control Block
   is created, TS.SndOff is simply set to zero.  On the other hand, when
   a TCP Control Block is reused, if TS2 is requested, the difference
   between the timestamps of TS2 and TS1 is added to TS.SndOff before
   the TCP three-way handshake phase begins to avoid reusing the same
   range of external timestamps when TS2 is enabled.  If TS2 is not
   requested, TS.SndOff is unchanged.

   In addition, TS.SndOff is used by PASA-DF/TS2 to randomize the
   initial timestamp values of TCP connections.  See section 7.1.1 for
   more details.

4.4 Input Processing

   This subsection describes the procedure for processing received
   segments.

   If TS1 or TS2 is enabled, and if a received segment carrying the TS
   option or the OTS option satisfies at least one of inequalities (1)
   and (2) below, it SHOULD be processed by the base mechanism and the
   applied mechanisms to update their variables.  Otherwise, those
   variables MUST NOT be updated, while the RSEG.TSval and RSEG.TSecr
   values on such a segment MAY be checked to test the received segment.

      (RSEG.LEN > 0 &&
       Last.Ack.Sent - max(RCV.WND) < RSEG.SEQ + RSEG.LEN &&
       RSEG.SEQ < RCV.NXT + RCV.WND) ............................ (1)

         or

      (RSEG.LEN == 0 &&
       Last.Ack.Sent - max(RCV.WND) <= RSEG.SEQ &&
       RSEG.SEQ < RCV.NXT + RCV.WND) ............................ (2)

   When TS1 is enabled, if a received segment other than a RST segment
   carries the TS option and satisfies all of inequality (1),
   (RSEG.SEQ <= Last.ACK.sent), and (RSEG.TSval > TS.Recent), then
   RSEG.TSval is recorded in TS.Recent.  In other words, TS.Recent holds
   the maximum RSEG.TSval value on in-sequence and duplicate data
   segments.  Note that TS.Recent is not updated by out-of-order data
   segments, while it is updated by in-sequence and duplicate data
   segments.  Also note that TS.Recent is monotonically nondecreasing.



Demizu                   Expires September 2006                 [Page 9]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   When TS2 is enabled, if a received segment other than a SYN or RST
   segment does not carry the TS option nor the OTS option, it MUST be
   dropped, and an ACK segment SHOULD be sent in reply.  If a received
   segment carries both the TS option and the OTS option, it MUST be
   dropped, and an ACK segment SHOULD be sent in reply.  Otherwise, if a
   received segment other than a RST segment carries the TS option or
   the OTS option and satisfies inequality (1), and if TS.RecentIsOld is
   true or (RSEG.TSval < TS.Recent) is satisfied, then RSEG.TSval is
   recorded in TS.Recent, and TS.RecentIsOld is set to false.  In other
   words, TS.Recent holds the minimum RSEG.TSval value on data segments
   received after a segment has last been sent.  In contrast with TS1,
   note that TS.Recent is updated by out-of-order data segments, as well
   as in-sequence and duplicate data segments.  Also note that TS.Recent
   is not monotonically nondecreasing.

   Note: The reason why SYN and RST segments are handled specially is to
   disconnect half-open TCP connections.

4.5 Output Processing

   This subsection describes the procedure for processing segments being
   sent.

   When a segment carries the TS option or the OTS option, SSEG.TSval
   contains the current external timestamp value, and SSEG.TSecr
   contains the TS.Recent value unless otherwise specified below
   (i.e., <SSEG.TSval=CurrentExternalTS><SSEG.TSecr=TS.Recent>).

   If TS1 is enabled, the rules below are followed.

      - When a segment other than a RST segment is sent, it SHOULD carry
        the TS option.

      - When a RST segment is sent, it SHOULD NOT carry the TS option.

      Note: The reason why RST segments SHOULD NOT carry the TS option
      is to be interoperable with [RFC1323], which states in section 4.2
      that "It is recommended that RST segments NOT carry timestamps,
      and that RST segments be acceptable regardless of their
      timestamp".

   If TS2 is enabled, the rules below are followed.

      - When a segment other than a RST segment is sent, if
        TS.RecentIsOld is false, the segment MUST carry the TS option,
        and TS.RecentIsOld is set to true.  In contrast, if
        TS.RecentIsOld is true, the segment MUST carry the OTS option.

      - When a RST segment is sent, it MUST carry the OTS option unless


Demizu                   Expires September 2006                [Page 10]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


        otherwise specified below.

      - When a RST segment is sent in reply to a received segment
        because of [RFC793], the rule below is followed.

        If the received segment carries the TS option or the OTS option,
        TS2 may be enabled on the remote node.  Thus, to make the RST
        segment sent in reply acceptable to PAWS/TS2 and PASA/TS2 at the
        remote node, the RST segment MUST carry the OTS option, where
        SSEG.TSval is the RSEG.TSecr value and SSEG.TSecr is the
        RSEG.TSval value.
        (i.e., <SSEG.TSval=RSEG.TSecr><SSEG.TSecr=RSEG.TSval>)

        On the other hand, if the received segment does not carry the TS
        option nor the OTS option, TS1 may be enabled at the remote
        node, or neither of TS1 nor TS2 is enabled at the remote node.
        Thus, to make the RST segment sent in reply acceptable at the
        remote node in either case, the RST segment SHOULD NOT carry the
        TS option nor the OTS option.

      Note: The reason why RST segments are handled specially is to
      disconnect half-open TCP connections.

   For ACK segments sent in reply to invalid segments because of this
   memo, an ACK throttling mechanism SHOULD be implemented.


























Demizu                   Expires September 2006                [Page 11]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


5. RTTM (Round Trip Time Measurement)

   RTTM is an RTT measurement mechanism making use of the RSEG.TSecr
   field in the TS option on received segments.  It can be enabled when
   either TS1 or TS2 is enabled.  If a received segment does not satisfy
   inequality (1) nor (2), RTTM MUST NOT be performed on the segment.

   The most significant difference between RTTM/TS1 (RTTM with TS1) and
   RTTM/TS2 (RTTM with TS2) is that RTTM/TS1 takes RTT measurements only
   when SND.UNA is advanced, while RTTM/TS2 takes RTT measurements
   whenever the TS option is received.  Since both RTTM/TS1 and RTTM/TS2
   take RTT measurements even if a received ACK segment was sent in
   reply to a retransmitted data segment, they replace Karn's algorithm
   [KP87].

   Note: If a smoothed RTT is computed from many RTT measurements per
   RTT, the resulting SRTT, RTTVAR, and RTO values [RFC2988] would
   become short-sighted.  Implementations should take care of this
   issue.  The question of how to compute an RTO value from many
   measured RTTs is outside the scope of this memo.

5.1 RTTM/TS1

   If TS1 is enabled, when SND.UNA is advanced, the RTT can be
   calculated as follows:

      CurrentExternalTS(T1) - RSEG.TSecr + TS1_GRANULARITY

   where TS1_GRANULARITY is the timestamp granularity of TS1.

      Note: "TS1_GRANULARITY" in the above expression SHOULD NOT be
      replaced with "TS1_GRANULARITY/2" unless TS1_GRANULARITY is much
      lower than any possible RTT.

   Since there is a possibility that the RSEG.TSecr value is very old
   with TS1, the measured RTT may be longer than the real RTT in some
   corner cases.  See appendix E.1 for more details.

   Implementation Note: When the RSEG.TSecr value is zero, RTT SHOULD
   NOT be calculated.  The reason is that some implementations of the
   TCP Timestamps option [RFC1323] send zero in the TSecr field in some
   scenarios where zero is obviously a bogus timestamp value.

5.2 RTTM/TS2

   If TS2 is enabled, then whenever the TS option is received, the RTT
   can be calculated as follows:

      CurrentExternalTS(T2) - RSEG.TSecr + TS2_GRANULARITY


Demizu                   Expires September 2006                [Page 12]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   where TS2_GRANULARITY is the timestamp granularity of TS2.

      Note: "TS2_GRANULARITY" in the above expression SHOULD NOT be
      replaced with "TS2_GRANULARITY/2" unless TS2_GRANULARITY is much
      lower than any possible RTT.

   When the OTS option is received, RTT SHOULD NOT be calculated,
   because the RSEG.TSecr value in the OTS option may be very old by
   definition.










































Demizu                   Expires September 2006                [Page 13]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


6. PAWS (Protection Against Wrapped Sequence numbers)

   PAWS is a mechanism for detecting old duplicate segments by making
   use of the RSEG.TSval field in the TS option and the OTS option on
   the received segment.  It can be enabled when either TS1 or TS2 is
   enabled.

   As described in appendix F, there is a possibility that a legitimate
   data segment could be discarded by PAWS in RFC1323 when it is delayed
   because of reordering.  In contrast, as described in appendix E.2,
   PAWS/TS1 (PAWS with TS1) is slightly robust against reordering.  And,
   PAWS/TS2 (PAWS with TS2) is robust against reordering, so that
   legitimate segments are unlikely to be discarded even when delayed
   because of reordering.

   PAWS/TS1 and PAWS/TS2 use two variables: TS.RcvMin (32bit-timestamp)
   and TS.RcvMin_time (internal-time).  TS.RcvMin holds the maximum
   value of the received RSEG.TSval values in both the TS option and the
   OTS option.  TS.RcvMin_time holds the last time when a segment
   satisfying (RSEG.TSval >= TS.RcvMin) was received.  The value of
   TS.RcvMin is valid for a limited amount of time depending on TS.Mode.
   If a received segment does not satisfy inequality (1) nor (2), these
   variables MUST NOT be updated.

      Note: TS.RcvMin is updated by any segment, while TS.Recent is
      updated only by segments satisfying (RSEG.LEN > 0).  In addition,
      TS.RcvMin is always monotonically nondecreasing, in contrast to
      TS.Recent with TS2.

   When the value of TS.RcvMin is valid, all received segments SHOULD be
   tested using TS.RcvMin, as described in the following subsections,
   before the acceptability test of [RFC793].  This test is called the
   PAWS test in this memo.  To avoid discarding legitimate delayed
   segments due to reordering, the lower bound of acceptable RSEG.TSval
   values is chosen as slightly lower than TS.RcvMin, as suggested in
   the appendix F.5.

      Note: PAWS/TS1 and PAWS/TS2 use the dedicated variable TS.RcvMin
      for the PAWS test, while PAWS in [RFC1323] uses a shared variable
      TS.Recent.

6.1 PAWS/TS1

   When TS1 is enabled, the minimum acceptable RSEG.TSval value is
   (TS.RcvMin - TS1_PAWS_MARGIN), where TS1_PAWS_MARGIN is a margin.
   Its appropriate value, such as RTO value, cannot be computed,
   however, because the unit of the received timestamp is unknown.
   Hence, this memo recommends TS1_PAWS_MARGIN = 1 simply because it is
   better than zero.


Demizu                   Expires September 2006                [Page 14]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   Thus, when the value of TS.RcvMin is valid, if a received segment
   other than a SYN or RST segment carries the TS option, it MUST
   satisfy (RSEG.TSval >= TS.RcvMin - TS1_PAWS_MARGIN).  If it does not
   satisfy this inequality, it MUST be dropped, and an ACK segment with
   the TS option SHOULD be sent in reply.

      Note: The reason why SYN and RST segments are not tested is to
      disconnect half-open TCP connections.  Another reason why RST
      segments are not tested is to be interoperable with [RFC1323],
      which states in section 4.2 that "It is recommended that RST
      segments NOT carry timestamps, and that RST segments be acceptable
      regardless of their timestamp".

   The value of TS.RcvMin is valid until the internal clock reaches
   (TS.RcvMin_time + TS1_PAWS_IDLE).  TS1_PAWS_IDLE should be longer
   than the longest timeout, and it should be reasonably less than 2^31.
   The default value of TS1_PAWS_IDLE is 24 days, which is the same
   value specified in [RFC1323].

6.2 PAWS/TS2

6.2.1 Minimum Acceptable TSval Value

   When TS2 is enabled, the minimum acceptable RSEG.TSval value is
   (TS.RcvMin - CurRTO), where CurRTO means the current RTO value.
   Thus, when the value of TS.RcvMin is valid, all received legitimate
   segments must satisfy (RSEG.TSval >= TS.RcvMin - CurRTO).  If a
   received segment other than a SYN or RST segment does not satisfy
   this inequality, it MUST be dropped, and an ACK segment SHOULD be
   sent in reply.  In addition, if a received RST segment with the OTS
   option does not satisfy this inequality, it MUST be dropped silently.

      Note: The reason why RST segments without the OTS option and SYN
      segments are not tested here is to disconnect half-open TCP
      connections.

   The value of TS.RcvMin is valid until the internal clock reaches
   (TS.RcvMin_time + TS2_PAWS_IDLE).  TS2_PAWS_IDLE should be longer
   than the longest timeout, and it should be reasonably less than 2^31.
   The default value of TS2_PAWS_IDLE is 20 minutes (= 1200 seconds).
   (Note: 2^31 / 1000000 = 2147 seconds.)

      Note 1: PAWS/TS2 assumes that RTTs measured at a local node and
      RTTs measured at a remote node are almost the same.  Since the
      timestamp unit is fixed, the RTO value in the inequality of
      PAWS/TS2 can be evaluated under this assumption.  This assumption
      might be wrong in some asymmetric networks.  In addition, in a
      unidirectional data flow, a data receiver can take only one RTT
      measurement only when a SYN segment is exchanged.  In such cases,


Demizu                   Expires September 2006                [Page 15]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


      the robustness against reordering may be poor.  Nevertheless, it
      would not be worse than that of PAWS in [RFC1323].

      Note 2: An RTT may suddenly increase due to congestion, route
      changes, link-bandwidth changes, etc.  Hence, the computation of
      RTO values should be done in a conservative manner.

6.2.2 Maximum Acceptable TSval Value

   If the value of TS.RcvMin is valid, because the timestamp unit is
   fixed, the maximum acceptable value of RSEG.TSval can be calculated
   using TS.RcvMin and TS.RcvMin_time as follows.

      TS.RcvMin + time2ts(CurrentTime - TS.RcvMin_time) + TS2_PAWS_DEV

   where TS2_PAWS_DEV is the maximum acceptable deviation.

   When a segment other than a RST segment without the OTS option or a
   SYN segment is received, if its RSEG.TSval value is greater than this
   maximum bound, the segment MAY be dropped.  But if PASA-DF/TS2 is
   enabled, it SHOULD be dropped.  If it is dropped, and if it is not a
   RST segment, an ACK segment SHOULD be sent in reply.

   Note: The same expression with the replacement of "+ TS2_PAWS_DEV"
   with "- TS2_PAWS_DEV" cannot be applied to the minimum bound, because
   the RSEG.TSval values may be tweaked to lower values by PASA-DF/TS2
   at the remote node.
























Demizu                   Expires September 2006                [Page 16]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


7. PASA (Protection Against Spoofing Attacks)

   PASA is a lightweight mechanism for protecting TCP connections
   against spoofing attacks injecting faked SYN, data, FIN, and RST
   segments.  PASA can be enabled when TS2 is enabled.  PASA does not
   work with TS1.

   PASA/TS2 (PASA with TS2) consists of two parts.  One is called
   PASA-DF/TS2 (PASA for Data and FIN segments with TS2).  It detects
   spoofed data and FIN segments with the TS option or the OTS option by
   making use of received RSEG.TSecr values.  It also detects spoofed
   RST segments with the OTS option by applying the same test.  The
   other part is called PASA-SR/TS2 (PASA for SYN and RST segments with
   TS2).  It enables both genuine RST segments without the OTS option
   and genuine SYN segments to trigger disconnection of their TCP
   connections, while spoofed segments are not allowed to trigger such
   disconnection.  Since PASA-DF/TS2 and PASA-SR/TS2 are independent of
   each other, an implementation MAY support one or both of them.

   Along with PASA, if the timestamp granularity is longer than 1 usec
   but methods described in appendix B are not implemented, lower bits
   of timestamps can be used as nonce bits to obfuscate timestamps.

7.1 PASA-DF/TS2 (PASA for Data and FIN Segments with TS2)

   This subsection describes a mechanism called PASA-DF/TS2 that detects
   spoofed data and FIN segments with the TS option or the OTS option by
   making use of received RSEG.TSecr values.  It also detects spoofed
   RST segments with the OTS option by applying the same test.

   If a received segment does not satisfy inequality (1) nor (2), it
   MUST NOT update the variables of PASA-DF/TS2.  In contrast, if
   PASA-DF/TS2 is enabled, any received segment SHOULD be tested by
   PASA-DF/TS2 before the acceptability test of [RFC793].

   PASA-DF/TS2 uses four variables: TS.SndMin (32bit-timestamp),
   TS.SndMax (32bit-timestamp), TS.SndMax_time (internal-time), and
   TS.PASADF_On (boolean).

   TS.SndMin holds the maximum value of the RSEG.TSecr field in the TS
   option and the OTS option on received segments satisfying inequality
   (1) or (2), while TS.SndMax holds the maximum value of the SSEG.TSval
   field on sent segments satisfying (SSEG.LEN > 0).  For the first SYN
   or SYN+ACK segment sent, SSEG.TSval is copied to both TS.SndMin and
   TS.SndMax.  Consequently, the received RSEG.TSecr values of the
   established TCP connection should be in the range from around
   TS.SndMin to TS.SndMax.  The test of whether the received segments
   fit in this range is called the PASA-DF test in this memo.
   TS.SndMax_time holds the latest time when a segment satisfying


Demizu                   Expires September 2006                [Page 17]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   (SSEG.LEN > 0) is sent or when TS.SndOff is updated.  TS.PASADF_On
   indicates whether the PASA-DF test can be performed.  The initial
   value of TS.PASADF_On is true.

   Note: The reason why PASA-DF/TS2 does not test RST segments without
   the OTS option and SYN segments is to disconnect half-open TCP
   connections.  They are tested by PASA-SR/TS2, as described later.

   When PASA-DF/TS2 is enabled, the maximum acceptable value of
   RSEG.TSval SHOULD be tested by PAWS/TS2.  (See section 6.2.2)

7.1.1 External Timestamp Values

   When PASA-DF/TS2 is enabled, TS.SndOff is utilized to randomize the
   initial SSEG.TSval value in order to obfuscate external timestamp
   values.  If a TCP control block is reused by a new TCP connection,
   TS.SndOff MUST be increased by a random number in the range from 0 to
   TS2_PASADF_RNDMAX_REUSE, whose default value is 2^29 - 1 usec (about
   9 minutes).  In other cases, a newly generated 32-bit random number
   MUST be copied to TS.SndOff.

   TS.SndOff is also utilized to minimize the difference between
   TS.SndMin and TS.SndMax after a long idle period.  Since the
   possibility of accepting spoofed segments is the difference in 2^32,
   it is important to keep the difference small.  Therefore, the
   advancement of TS.SndMax must have an upper bound.  In addition,
   SSEG.TSval MUST be monotonically nondecreasing in order to make PAWS
   operable at a remote node.  To satisfy these requirements, this memo
   proposes that the advancement of SSEG.TSval be no greater than
   TS2_PASADF_MAXADV, whose default value is 64 seconds.  TS.SndOff MUST
   be tweaked as follows:

      over_time = (CurrentTime - TS.SndMax_time) - TS2_PASADF_MAXADV;
      if (over_time > 0) {
         TS.SndOff -= time2ts(over_time)
         TS.SndOff += RandomNumber(TS2_PASADF_RNDMAX_IDLE);
         TS.SndMax_time = CurrentTime;
      }

      Note 1: This memo assumes that CurrentTime is not wrapped in the
      lifetime of any TCP connections.

      Note 2: RandomNumber(TS2_PASADF_RNDMAX_IDLE) above means a random
      number in the range from 0 to TS2_PASADF_RNDMAX_IDLE, whose
      default value is 2^26 - 1 usec (about 67 seconds).

7.1.2 Temporarily Suspension of Tests

   There is a possibility that (TS.SndMax - TS.SndMin) becomes negative


Demizu                   Expires September 2006                [Page 18]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   when a data segment is sent after a series of sporadic transmissions
   that do not elicit any segments from a remote node.  For example,
   consider a case where a TCP stack has received huge data, while its
   application reads the data very slowly.  In this case, the TCP stack
   would send small window updates once in a while.  During such a
   period, TS.SndMin and TS.SndMax are unchanged, while the SSEG.TSval
   values in those window updates are increasing.  After a while, the
   difference between SSEG.TSval and TS.SndMin could be larger than
   2^31.  That is, (SSEG.TSval - TS.SndMin) could be negative.  If a
   data segment was sent in that case, (TS.SndMax - TS.SndMin) also
   would be negative.  To avoid confusion, the PASA-DF test MUST NOT be
   performed in such cases.  Thus, this memo proposes the following
   procedure:

      - When TS.PASADF_On is true, the PASA-DF test SHOULD be performed.

      - When TS.PASADF_On is true, if (TS.SndMax - TS.SndMin) becomes
        negative after a segment is sent, TS.PASADF_On is set to false.

      - When TS.PASADF_On is false, the PASA-DF test MUST NOT be
        performed.

      - When TS.PASADF_On is false, if a received segment satisfies the
        requirement that (TS.SndMax - RSEG.TSecr) be non-negative,
        RSEG.TSecr is copied to TS.SndMin, and TS.PASADF_On is set to
        true.

   This procedure is incorporated in the following two subsections.

7.1.3 Input Processing

   This subsection describes the procedure when a segment is received.

   When TS.PASADF_On is true, the procedure below is followed.

      - In the CLOSED, LISTEN, and SYN-SENT states, RSEG.TSecr MUST NOT
        be tested.

      - In other states, all segments other than SYN and RST segments
        must satisfy (TS.SndMin - CurRTO <= RSEG.TSecr <= TS.SndMax),
        where CurRTO means the current RTO value.  When a segment other
        than a SYN or RST segment is received, if it does not satisfy
        this inequality, it MUST be dropped, and an ACK segment SHOULD
        be sent in reply.  In addition, when a RST segment with the OTS
        option is received, if it does not satisfy the inequality, it
        MUST be dropped silently.  Otherwise, when a segment other than
        a SYN or RST segment is received, or when a RST segment with the
        OTS option is received, if the received segment satisfies
        (RSEG.TSecr > TS.SndMin), then RSEG.TSecr is copied to


Demizu                   Expires September 2006                [Page 19]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


        TS.SndMin.

        Note: The reason why RST segments without the OTS option and SYN
        segments are not tested here is to disconnect half-open TCP
        connections.

   When TS.PASADF_On is false, if a received segment satisfies the
   requirement that (TS.SndMax - RSEG.TSecr) be non-negative, RSEG.TSecr
   is copied to TS.SndMin, and TS.PASADF_On is set to true.

7.1.4 Output Processing

   This subsection describes the procedure when a segment is sent.

   When a segment satisfying (SSEG.LEN > 0) is sent, SSEG.TSval is
   copied to TS.SndMax, and the current time is recorded in
   TS.SndMax_time.  If the segment is the first sent segment (e.g., the
   first SYN segment or the first SYN+ACK segment), SSEG.TSval is also
   copied to TS.SndMin.

   When TS.PASADF_On is false, if (TS.SndMax - TS.SndMin) becomes
   negative after the above copying, TS.PASADF_On is set to false.

7.2 PASA-SR/TS2 (PASA for SYN and RST Segments with TS2)

   This subsection describes a mechanism called PASA-SR/TS2, which
   enables both genuine RST segments without the OTS option and genuine
   SYN segments to trigger disconnection of their TCP connections, while
   spoofed ones are not allowed to trigger such disconnection.

   If a received segment does not satisfy inequality (1) nor (2), it
   MUST NOT update the variables of PASA-SR/TS2.  In contrast, if
   PASA-SR/TS2 is enabled, any received segment SHOULD be tested by
   PASA-SR/TS2 before the acceptability test of [RFC793].

   Note: RST segments without the OTS option and SYN segments are not
   dropped by the base mechanism of TS2, PAWS/TS2, and PASA-DF/TS2 in
   order to disconnect half-open TCP connections.

7.2.1 Procedure

   PASA-SR/TS2 uses the following variables: TS.PASASR_On (boolean) and
   TS.PASASR_time (internal-time).  The initial value of TS.PASASR_On is
   false.  TS.PASASR_time is valid only when TS.PASASR_On is true.

   The procedure is as follows:

      - When a SYN segment is received against an established TCP
        connection, regardless of whether it has the TS option or the


Demizu                   Expires September 2006                [Page 20]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


        OTS option, it MUST be dropped, and an ACK segment without
        either the TS option or the OTS option SHOULD be sent in reply.
        The window size of the ACK segment is TS2_PASASR_WIN, which
        should be small enough (e.g., 1 RMSS).  Then, TS.PASASR_On is
        set to true, and the current time is recorded in TS.PASASR_time.

      - When TS.PASASR_On is true, if a received segment is not dropped
        by the base mechanism of TS2, the PAWS test, the PASA-DF test,
        and the acceptability test of [RFC793], then do the following:
        if (1) the received segment is not a RST segment, (2) it is a
        RST segment with the OTS option, or (3) a long time has been
        passed since the last ACK segment was sent in reply to a SYN
        segment (i.e., CurrentTime - TS.PASASR_time >= TS2_PASASR_TIME),
        then TS.PASASR_On is set to false.

      - If a RST segment without the OTS option is received, and if
        TS.PASASR_On is false, or the segment does not satisfy
        (RCV.NXT <= RSEG.SEQ < RCV.NXT + TS2_PASASR_WIN), it MUST be
        dropped silently.

7.2.2 Examples

   If a SYN segment is received against an established TCP connection,
   there are two possible causes.  The first is that the remote node has
   been rebooted or disconnected silently and is trying to establish a
   new TCP connection with the same quadruple by chance.  The second
   cause is that a malicious node sent a spoofed SYN segment with the
   same quadruple by chance.

   If a RST segment without the OTS option is received against an
   established TCP connection, there are two possible causes.  The first
   is that the remote node has been rebooted or disconnected silently
   and has sent a RST segment in reply to a segment sent by the local
   node.  The second cause is that a malicious node sent a spoofed RST
   segment with the same quadruple by chance.

   In any case, genuine SYN and RST segments should cause the TCP
   connection to disconnect, while spoofed SYN and RST segments should
   not cause it to disconnect.

   The following four examples show how PASA-SR/TS2 would work against
   the four possible causes described above.  Suppose that a local node
   and a remote node have an established TCP connection, and TS2 is
   enabled on it.

   Case 1: The remote node has been rebooted or disconnected silently,
           and it sent a SYN segment to establish a new TCP connection
           with the same quadruple by chance.



Demizu                   Expires September 2006                [Page 21]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


      When this genuine SYN segment is received by the local node, an
      ACK segment with window size = TS2_PASASR_WIN without either the
      TS option or the OTS option is sent in reply because of
      PASA-SR/TS2.

      When the remote node receives this ACK segment, it sends a RST
      segment without the OTS option because it is in the SYN-SENT
      state.

      Since the sequence number of this RST segment would satisfy
      (RCV.NXT <= RSEG.SEQ < RCV.NXT + TS2_PASASR_WIN), this RST segment
      would be accepted by the local node because of PASA-SR/TS2.  Then,
      this RST segment would disconnect the existing TCP connection
      successfully.

      After a while, another SYN segment will be retransmitted by the
      remote node, and a new TCP connection will be established.

   Case 2: A malicious node sent a spoofed SYN segment with the same
           quadruple by chance.

      When this spoofed SYN segment is received by the local node, an
      ACK segment with window size = TS2_PASASR_WIN without either the
      TS option or the OTS option is sent in reply because of
      PASA-SR/TS2.  When the real remote node receives this ACK segment,
      since TS2 is enabled on the TCP connection, this ACK segment is
      dropped by the remote node, and an ACK segment is sent in reply.

      The ACK segment sent in reply would be accepted by the local node.
      Fortunately, it would have no effect other than that the duplicate
      ACK counter could be falsely increased by one.  Thus, the spoofed
      SYN segment would not disconnect the existing TCP connection.

      Note: If the duplicate ACK counter is increased only by an ACK
      segment with the TS option, it would not be falsely increased in
      this case.  See appendix C.

      In a case where a spoofed RST segment is also sent just behind the
      spoofed SYN segment above, if the spoofed RST segment does not
      carry the OTS option, the possibility of accepting the spoofed RST
      segment is TS2_PASASR_WIN in 2^32.  If the spoofed RST segment
      carries the OTS option, the possibility of accepting the spoofed
      RST segment is equal to the possibility of breaking PASA-DF/TS2.
      Both possibilities would be sufficiently small in most
      environments.

   Case 3: The remote node sent a genuine RST segment without the OTS
           option to disconnect a TCP connection.



Demizu                   Expires September 2006                [Page 22]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


      When TS2 is enabled, this case cannot happen, because a legitimate
      RST segment MUST carry the OTS option.

   Case 4: A malicious node sent a spoofed RST segment without the OTS
           option with the same quadruple by chance.

      Such spoofed RST segment is accepted only when TS.PASASR_On is
      true and (RCV.NXT <= RSEG.SEQ < RCV.NXT + TS2_PASASR_WIN) is
      satisfied.  Thus, the possibility of accepting this spoofed RST
      segment is lower than TS2_PASASR_WIN in 2^32.  This would be
      sufficiently small in most environments.








































Demizu                   Expires September 2006                [Page 23]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


8. DLI (Data Loss Inference)

   DLI is a mechanism for inferring losses of original and retransmitted
   data segments by making use of the RSEG.TSecr field in the TS option
   on received segments.  It can be enabled only when TS2 is enabled.
   When a received segment does not satisfy inequality (1) nor (2), or
   it does not carry the TS option, DLI/TS2 (DLI with TS2) MUST NOT be
   performed on the segment.

      Note: When TS1 is enabled, the algorithms described in this
      section might infer losses of original data segments in limited
      scenarios that comprises partial acknowledgements.  Since DLI with
      TS1 would not be able to infer losses of retransmitted data
      segments, however, this memo does not propose to run DLI with TS1.

   DLI/TS2 improves overall throughput by reducing the number of
   retransmission timeouts under heavy loss rates.

8.1 Conceptual Algorithm

   This subsection describes a conceptual algorithm of DLI/TS2 that
   infers losses of any original and retransmitted octets.

   Two variables are associated with each sent octet: OC.SndTS
   (32bit-timestamp) and OC.SndRO (integer).  OC.SndTS holds the
   SSEG.TSval value on the latest sent data segment containing the
   octet.  The initial value of OC.SndRO is zero.  When octets are
   retransmitted, the variables of these octets are reinitialized.  If a
   segment with the TS option is received, every OC.SndRO of every octet
   satisfying (RSEG.TSecr > OC.SndTS) is increased by one.  Then, every
   octet satisfying (OC.SndRO >= TS2_DLI_THRESH) is inferred lost.  A
   real-world implementation would likely prefer to manage the
   retransmitted octets as sequence number ranges.

   DLI/TS2 uses the RSEG.TSecr field in the TS option only, because,
   when out-of-order data exist in the receive buffer, all segments sent
   in reply to data segments carry the TS option, while other segments
   carry the OTS option.  That is, the number of data segments arriving
   at the remote node is equal to the number of segments with the TS
   option departing from the remote node.  To count the number of
   observed possible reorders precisely, any segments with the TS option
   (including data segments, window updates, etc.) SHOULD be counted,
   while any segments with the OTS option (including apparently pure ACK
   segment, etc.) MUST NOT be counted.

   If the granularity of timestamps is coarser than the mean time
   between each data transmission, multiple data segments may carry the
   same SSEG.TSval value, and DLI/TS2 would be less effective.  Some
   ideas to mitigate this problem is shown in appendix B.


Demizu                   Expires September 2006                [Page 24]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   TS2_DLI_THRESH indicates the number of observed possible reorders
   required to infer a loss.  The default value is 3, which is the same
   value as the so-called duplicate acknowledgement threshold specified
   in [RFC2581].  TS2_DLI_THRESH might be implemented as an adaptive
   variable in the future.

8.2 Space-Optimized Algorithms

   The conceptual algorithm, however, would not be easy to implement
   because of memory limitations.  Therefore, this memo proposes the
   following space-optimized algorithms that do not require a huge
   memory space.

      (1) DLI-SEG/TS2 infers losses of any original and retransmitted
          data segments.  It uses two variables for each data segment.

      (2) DLI-SACK/TS2 infers losses of original and retransmitted data
          in SACK holes, which exist between SND.UNA and SND.FACK.  It
          uses two variables for each SACK hole.

      (3) DLI-UNA/TS2 infers losses of original and retransmitted data
          at SND.UNA.  It uses two variables for each TCP connection.

      (4) DLI-NXT/TS2 infers losses of original and retransmitted data
          at SND.NXT minus one.  When (SND.FACK < SND.NXT) is true, it
          infers losses of data between SND.FACK and SND.NXT.  It uses
          two variables for each TCP connection.

      (5) DLI-MAX/TS2 infers losses of original and retransmitted data
          at SND.MAX minus one.  When (SND.NXT < SND.FACK) is true, it
          infers losses of data between SND.FACK and SND.MAX.  It uses
          two variables for each TCP connection.

   These algorithms can be implemented independently.  Since DLI-SEG/TS2
   is sufficiently powerful, however, if it is implemented, other
   algorithms (2)-(5) need not be implemented.

      Note 1: Fast Retransmit [RFC2581] and SACK [RFC2018][RFC3517] are
      helpful for inferring losses of original data segments, while they
      cannot infer losses of retransmitted data segments in contrast to
      DLI/TS2.  They are helpful, however, if DLI-UNA/TS2 is implemented
      but DLI-SEG/TS2 and DLI-SACK/TS2 are not implemented.
      See appendix C.2 for more details.

      Note 2: If some data have been retransmitted, losses of data
      between SND.UNA and the highest retransmitted sequence number
      cannot be inferred using IsLost() [RFC3517], while such losses can
      be inferred by DLI/TS2.  See appendix C.2 for more details.



Demizu                   Expires September 2006                [Page 25]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


8.2.1 DLI-SEG/TS2 (DLI for Data Segments with TS2)

   This subsection describes an algorithm called DLI-SEG/TS2 that infers
   losses of any original and retransmitted data segments.  It is useful
   when each data segment is already managed by an internal data
   structure and is not repacketized.

   DLI-SEG/TS2 uses two variables for each sent or retransmitted data
   segment: DS.SndTS (32bit-timestamp) and DS.SndRO (integer).  DS.SndTS
   holds the SSEG.TSval value on the latest sent data segment.  DS.SndRO
   counts the number of received segments with the TS option satisfying
   (RSEG.TSecr > DS.SndTS).  In other words, it counts the number of
   observed possible reorders from the point of the view of the data
   segment.

   The procedure is as follows.

   When a data segment is sent or retransmitted, its SSEG.TSval value is
   recorded in DS.SndTS, and DS.SndRO is cleared.

   When a segment with the TS option is received, every DS.SndRO of
   every data segment satisfying (RSEG.TSecr > DS.SndTS) is increased by
   one.  Then, all data segments satisfying (DS.SndRO >= TS2_DLI_THRESH)
   are inferred lost.

      Implementation Hint: Prepare a chain of structures for data
      segments, sorted by DS.SndTS.  When a new data segment is sent, a
      new structure for the data segment is allocated, SSEG.TSval is
      copied to DS.SndTS, and DS.SndRO is cleared; then the structure is
      inserted at the tail of the chain.  When a data segment is
      retransmitted, SSEG.TSval is copied to DS.SndTS, and DS.SndRO is
      cleared; then the structure is moved to the tail of the chain.
      When a segment with the TS option is received, traverse the chain
      from the head while (RSEG.TSecr > DS.SndTS).  For each structure
      satisfying this inequality, DS.SndRO is increased by one.  Then,
      every structure satisfying (DS.SndRO >= TS2_DLI_THRESH) is
      inferred lost.

8.2.2 DLI-SACK/TS2 (DLI for Data in SACK Holes with TS2)

   This subsection describes an algorithm called DLI-SACK/TS2 that
   infers losses of original and retransmitted data in SACK holes, which
   exist between SND.UNA and SND.FACK.  It can be enabled when SACK is
   enabled.

   DLI-SACK/TS2 uses two variables for each SACK hole: SH.SndTS
   (32bit-timestamp) and SH.SndRO (integer).  SH.SndTS holds the
   SSEG.TSval value on the latest sent data segment containing data in
   the SACK hole.  SH.SndRO counts the number of received segments with


Demizu                   Expires September 2006                [Page 26]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   the TS option satisfying (RSEG.TSecr > SH.SndTS).  In other words, it
   counts the number of observed possible reorders from the point of the
   view of the data in the SACK hole since part or all of the data in
   the SACK hole were sent or retransmitted last time.

   The procedure is as follows.

   When a segment with the TCP SACK option is received, if a new SACK
   hole is created, then a new data structure is allocated, the received
   RSEG.TSecr value is copied to SH.SndTS, and SH.SndRO is cleared.

      Note: The SSEG.TSval values on the un-SACKed data segments in the
      new SACK hole likely would be no greater than the received
      RSEG.TSecr value.  Thus, it would be safe to use the received
      RSEG.TSecr value for the initial value of SH.SndTS here.

   If an existing SACK hole is split by a received SACK block, SH.SndTS
   and updated SH.SndRO are inherited to the split SACK holes.  If an
   existing SACK hole is shrunken or expanded, SH.SndTS and SH.SndRO are
   unchanged.

   When part or all of data in a SACK hole is retransmitted, the
   SSEG.TSval value on the data segment is copied to SH.SndTS, and
   SH.SndRO is cleared.

   When a segment with the TS option is received, SH.SndRO of every SACK
   hole satisfying (RSEG.TSecr > SH.SndTS) is increased by one.  Then,
   whole data in all SACK holes satisfying (SH.SndRO >= TS2_DLI_THRESH)
   are inferred lost.

      Implementation Hint: Prepare a chain of SACK holes, sorted by
      SH.SndTS.  When a new SACK hole is created, RSEG.TSecr is copied
      to SH.SndTS, SH.SndRO is cleared, and the SACK hole is inserted at
      the tail of the chain.  When data in a SACK hole is retransmitted,
      SSEG.TSval is copied to SH.SndTS, and SH.SndRO is cleared; then
      the SACK hole is moved to the tail of the chain.  When a segment
      with the TS option is received, traverse the chain from the head
      while (RSEG.TSecr > SH.SndTS).  For each SACK hole satisfying this
      inequality, SH.SndRO is increased by one.  Then, every SACK hole
      satisfying (SH.SndRO >= TS2_DLI_THRESH) is inferred lost.  If a
      SACK hole is split by a received SACK block, the split SACK holes
      inherit SH.SndTS, SH.SndRO, and the position in the chain.

8.2.3 DLI-UNA/TS2 (DLI for Data at SND.UNA with TS2)

   This subsection describes an algorithm called DLI-UNA/TS2 that infers
   losses of original and retransmitted data at SND.UNA.  It works only
   when (SND.UNA < SND.MAX) is true.



Demizu                   Expires September 2006                [Page 27]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   DLI-UNA/TS2 uses two variables: TS.UNA.SndTS (32bit-timestamp) and
   TS.UNA.SndRO (integer).  TS.UNA.SndTS holds the SSEG.TSval value on
   the latest sent data segment containing data at SND.UNA.
   TS.UNA.SndRO counts the number of received segments with the TS
   option satisfying (RSEG.TSecr > TS.UNA.SndTS).  In other words, it
   counts the number of observed possible reorders from the point of the
   view of data at SND.UNA since last sent.  The variables have valid
   values only when (SND.UNA < SND.MAX) is true.

   The procedure is as follows.

   When data at SND.UNA is sent or retransmitted, the SSEG.TSval value
   on the data segment is recorded in TS.UNA.SndTS, and TS.UNA.SndRO is
   cleared.

   When (SND.UNA < SND.MAX) is true, if a received segment does not
   advance SND.UNA (i.e., RSEG.ACK <= SND.UNA), and if it carries the TS
   option and satisfies (RSEG.TSecr > TS.UNA.SndTS), then TS.UNA.SndRO
   is increased by one.  After that, if (TS.UNA.SndRO >= TS2_DLI_THRESH)
   is true, data at SND.UNA is inferred lost.  Otherwise, if a received
   segment advances SND.UNA (i.e., old SND.UNA < RSEG.ACK <= SND.MAX),
   the current external timestamp value is copied to TS.UNA.SndTS, and
   TS.UNA.SndRO is cleared.

      Implementation Note: If PASA-DF/TS2 is enabled, then when SND.UNA
      is advanced, for time-optimization TS.SndMax can be copied to
      TS.UNA.SndTS, instead of the current external timestamp value.

   If DLI-SACK/TS2 is enabled, when SND.UNA is advanced by a received
   segment (i.e., old SND.UNA < RSEG.ACK <= SND.MAX), and if the new
   SND.UNA is in an existing SACK hole, SH.SndTS and SH.SndRO of the
   SACK hole are copied to TS.UNA.SndTS and TS.UNA.SndRO, respectively.

8.2.4 DLI-NXT/TS2 (DLI for Data at SND.NXT minus one with TS2)

   This subsection describes an algorithm called DLI-NXT/TS2 that infers
   losses of data at SND.NXT-1.  It works only when (SND.UNA < SND.NXT)
   is true.  When (SND.FACK < SND.NXT) is true, it infers losses of
   original and retransmitted data between SND.FACK and SND.NXT.

   DLI-NXT/TS2 uses two variables: TS.NXT.SndTS (32bit-timestamp) and
   TS.NXT.SndRO (integer).  TS.NXT.SndTS holds the SSEG.TSval value on
   the latest sent data segment containing the data at SND.NXT-1.
   TS.NXT.SndRO counts the number of received segments with the TS
   option satisfying (RSEG.TSecr > TS.NXT.SndTS).  In other words, it
   counts the number of observed possible reorders from the point of the
   view of the data at SND.NXT-1 since last sent.  The variables have
   valid values when the data at SND.NXT-1 have not been acknowledged
   nor SACKed.


Demizu                   Expires September 2006                [Page 28]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   The procedure is as follows.

   When the data at SND.NXT is sent, or when (SND.FACK < SND.NXT) is
   true and part or all of data between SND.FACK and SND.NXT is
   retransmitted, if previously received window size is not zero (i.e.,
   TCP persist timer is off), then the SSEG.TSval value on the data
   segment is recorded in TS.NXT.SndTS, and TS.NXT.SndRO is cleared.

   When the data at SND.NXT-1 have not been acknowledged nor SACKed, if
   a segment with the TS option satisfying (RSEG.TSecr > TS.NXT.SndTS)
   is received, then TS.NXT.SndRO is increased by one.  After that, if
   (TS.NXT.SndRO >= TS2_DLI_THRESH), the data at SND.NXT-1 is inferred
   lost.  In this case, if (SND.FACK < SND.NXT) is also true, the data
   between SND.FACK and SND.NXT is inferred lost.

8.2.5 DLI-MAX/TS2 (DLI for Data at SND.MAX minus one with TS2)

   This subsection describes an algorithm called DLI-MAX/TS2 that infers
   losses of data at SND.MAX-1.  It works only when (SND.UNA < SND.MAX)
   is true.  When (SND.NXT < SND.FACK) is true, it infers losses of
   original and retransmitted data between SND.FACK and SND.MAX.

   DLI-MAX/TS2 uses two variables: TS.MAX.SndTS (32bit-timestamp) and
   TS.MAX.SndRO (integer).  TS.MAX.SndTS holds the SSEG.TSval value on
   the latest sent data segment containing the data at SND.MAX-1.
   TS.MAX.SndRO counts the number of received segments with the TS
   option satisfying (RSEG.TSecr > TS.MAX.SndTS).  In other words, it
   counts the number of observed possible reorders from the point of the
   view of the data at SND.MAX-1 since last sent.  The variables have
   valid values when the data at SND.MAX-1 have not been acknowledged
   nor SACKed.

   The procedure is as follows.

   When the data at SND.MAX is sent, or when (SND.NXT < SND.FACK) is
   true and part or all of data between SND.FACK and SND.MAX is
   retransmitted, if previously received window size is not zero (i.e.,
   TCP persist timer is off), then the SSEG.TSval value on the data
   segment is recorded in TS.MAX.SndTS, and TS.MAX.SndRO is cleared.

   When the data at SND.MAX-1 have not been acknowledged nor SACKed, if
   a segment with the TS option satisfying (RSEG.TSecr > TS.MAX.SndTS)
   is received, then TS.MAX.SndRO is increased by one.  After that, if
   (TS.MAX.SndRO >= TS2_DLI_THRESH), the data at SND.MAX-1 is inferred
   lost.  In this case, if (SND.NXT < SND.FACK) is also true, the data
   between SND.FACK and SND.MAX is inferred lost.





Demizu                   Expires September 2006                [Page 29]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


9. SLID (Spurious Loss Inference Detection)

   SLID is a mechanism for detecting spurious loss inference by making
   use of the received RSEG.TSecr value in the TS option and the OTS
   option.  SLID can be enabled when TS2 is enabled.  When a received
   segment does not satisfy inequality (1) nor (2), or it does not carry
   the TS option nor the OTS option, then SLID/TS2 (SLID with TS2) MUST
   NOT be performed on the segment.

   To decide whether a loss inference is genuine or spurious, when a
   loss is inferred, SLID/TS2 first sends an arbitrary in-window data
   segment as a probe, then examines incoming segments until a decision
   is made.  Probe data segment may not contain the data being probed.
   Therefore, it may consist of unsent data, decidedly lost data, or
   inferredly lost data.

   SLID/TS2 makes it possible to postpone the retransmission of data
   that has been inferred lost until the loss inference is decided to be
   genuine, in order to avoid wasting bandwidth and transmission battery
   power on unnecessary retransmissions.  It also makes it possible to
   postpone the reduction of congestion window until the loss inference
   is decided to be genuine.  In addition, SLID/TS2 can be applied to
   detect a posteriori spurious retransmissions in order to alleviate
   unnecessary data retransmissions and duplicate acknowledgements.
   Response algorithms are outside the scope of this memo.

9.1 Conceptual Algorithm

   This subsection describes a conceptual algorithm of SLID/TS2 that
   detects spurious loss inference of any octet.

   Two variables are associated with each sent octet: OC.TgtTS
   (32bit-timestamp) and OC.PrbTS (32bit-timestamp).  OC.TgtTS (target
   timestamp) holds the latest SSEG.TSval value on the original or
   retransmitted decidedly-lost segments containing the octet.  OC.PrbTS
   (probe timestamp) holds the SSEG.TSval value on the first data
   segment that is sent since the octet has been inferred lost.  This
   sent segment is called "probe segment" in this memo.  It may not
   contain the octet being probed.

   After sending a probe segment, every incoming segment is examined
   until all necessary decisions are made for every octet that is
   inferred lost.  More specifically, the RSEG.TSecr values of every
   incoming segment is compared with the "border timestamp" of each
   octet that has been inferred lost in order to decide whether each
   loss inference is genuine or spurious.

   The border timestamp is calculated as follows:



Demizu                   Expires September 2006                [Page 30]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


      TS2_SLID_BTS(TgtTS, PrbTS) = (k * TgtTS + (1 - k) * PrbTS),

      where k = 1/2.

   If a received RSEG.TSecr value is greater than the border timestamps
   of some octets, the loss inferences of those octets are decided to be
   genuine.  Otherwise, if a received segment acknowledges or SACKs some
   octets, the loss inferences of those octets are decided to be
   spurious.  In other cases, another segment is awaited.  The reason
   why received RSEG.TSecr values are compared with the border timestamp
   of each octet instead of OC.TgtTS is shown in appendix G.4.

   The precise procedure is as follows.

   To reset the target timestamp, when an octet is first sent, or when a
   decidedly lost octet is retransmitted, the SSEG.TSval value on the
   sent data segment is recorded in both OC.TgtTS and OC.PrbTS.

   To set the probe timestamp, when any data segment is sent, the
   SSEG.TSval value on the segment is copied to OC.PrbTS of all octets
   that have been inferred lost and satisfy (OC.TgtTS == OC.PrbTS).
   The sent data segment may carry these octets.

   To make a decision, when a segment with the TS option or the OTS
   option is received, for any octet satisfying (OC.TgtTS < OC.PrbTS),
   if (RSEG.TSecr > TS2_SLID_BTS(OC.TgtTS, OC.PrbTS)) is true, then its
   loss inference is decided to be genuine.  Otherwise, if such an octet
   is acknowledged or SACKed by the segment, then its loss inference is
   decided to be spurious.  In other cases, another segment is awaited.

   To avoid incorrect decision due to underflow, when any segment is
   sent, for any octet that is inferred lost, if (SSEG.TSval - OC.TgtTS)
   is negative, then its loss inference is decided to be spurious.

   When a decision is made, an appropriate response algorithm such as
   [RFC4015] is executed, then OC.PrbTS is copied to OC.TgtTS to
   terminate this procedure.

   Note: (OC.TgtTS < OC.PrbTS) is true only when the octet has been
   inferred lost and has been probed.  After a decision is made,
   (OC.TgtTS == OC.PrbTS) becomes true.

   A real-world implementation would likely prefer to manage the octets
   as sequence number ranges.

9.2 Space-Optimized Algorithms

   The conceptual algorithm, however, would not be easy to implement
   because of memory limitations.  Therefore, this memo proposes the


Demizu                   Expires September 2006                [Page 31]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   following space-optimized algorithms that do not require a huge
   memory space.

      (1) SLID-SEG/TS2 detects spurious loss inference of any data
          segments.  It uses two variables for each data segment.

      (2) SLID-SACK/TS2 detects spurious loss inference of data in SACK
          holes, which exist between SND.UNA and SND.FACK.  It uses two
          variables for each SACK hole.

      (3) SLID-UNA/TS2 detects spurious loss inference of data at
          SND.UNA.  It uses two variables for each TCP connection.

      (4) SLID-NXT/TS2 detects spurious loss inference of data at
          SND.NXT minus one.  When (SND.FACK < SND.NXT) is true, it
          detects spurious loss inference of data between SND.FACK and
          SND.NXT.  It uses two variables for each TCP connection.

      (5) SLID-MAX/TS2 detects spurious loss inference of data at
          SND.MAX minus one.  When (SND.NXT < SND.FACK) is true, it
          detects spurious loss inference of data between SND.FACK and
          SND.MAX.  It uses two variables for each TCP connection.

   These algorithms can be implemented independently.  Since
   SLID-SEG/TS2 is sufficiently powerful, however, if it is implemented,
   other algorithms (2)-(5) need not be implemented.

9.2.1 SLID-SEG/TS2 (SLID for Data Segments with TS2)

   This subsection describes an algorithm called SLID-SEG/TS2 that
   detects spurious loss inference of any sent data segments.  It is
   useful when each data segment is already managed by an internal data
   structure and is not repacketized.

   SLID-SEG/TS2 uses two variables for each sent or retransmitted data
   segment: DS.TgtTS (32bit-timestamp) and DS.PrbTS (32bit-timestamp).
   DS.TgtTS (target timestamp) holds the latest SSEG.TSval value on the
   original or retransmitted decidedly-lost data segments.  DS.PrbTS
   (probe timestamp) holds the SSEG.TSval value on the first data
   segment that is sent since the data segment has been inferred lost.
   The sent segment may be different from the data segment being probed.

   The procedure is as follows.

   To reset the target timestamp, when a data segment is first sent, or
   when a decidedly lost data segment is retransmitted, the SSEG.TSval
   value on the sent data segment is recorded in both DS.TgtTS and
   DS.PrbTS.



Demizu                   Expires September 2006                [Page 32]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   To set the probe timestamp, when any data segment is sent, the
   SSEG.TSval value on the segment is copied to DS.PrbTS of all data
   segment structures that have been inferred lost, and satisfy
   (DS.TgtTS == DS.PrbTS).  The sent data segment may be the same as one
   of these data structures.

   To make a decision, when a segment with the TS option or the OTS
   option is received, for any data segment structure satisfying
   (DS.TgtTS < DS.PrbTS), if the received RSEG.TSecr value is greater
   than TS2_SLID_BTS(DS.TgtTS, DS.PrbTS), then its loss inference is
   decided to be genuine.  Otherwise, if such data segment structure is
   acknowledged or SACKed by the segment, then its loss inference is
   decided to be spurious.  In other cases, another segment is awaited.

   To avoid incorrect decision due to underflow, when any segment is
   sent, for any data segment structure that is inferred lost, if
   (SSEG.TSval - DS.TgtTS) is negative, then its loss inference is
   decided to be spurious.

   When a decision is made, an appropriate response algorithm is
   executed, then DS.PrbTS is copied to DS.TgtTS to terminate this
   procedure.

   Note: (DS.TgtTS < DS.PrbTS) is true only when the data segment has
   been inferred lost and has been probed.  After a decision is made,
   (DS.TgtTS == DS.PrbTS) becomes true.

   The implementation hint for DLI-SEG/TS2 might be of help in
   implementing SLID-SEG/TS2.

9.2.2 SLID-SACK/TS2 (SLID for Data in SACK Holes with TS2)

   This subsection describes an algorithm called SLID-SACK/TS2 that
   detects spurious loss inference of data segments containing data in
   SACK holes, which exist between SND.UNA and SND.FACK.  The
   implementation hint for DLI-SACK/TS2 might be of help in implementing
   SLID-SACK/TS2.

   SLID-SACK/TS2 uses two variables for each SACK hole: SH.TgtTS
   (32bit-timestamp) and SH.PrbTS (32bit-timestamp).  SH.TgtTS (target
   timestamp) holds the RSEG.TSecr value on the segment which created
   the SACK hole.  SH.PrbTS (probe timestamp) holds the SSEG.TSval value
   on the first data segment that is sent since the data in the SACK
   hole has been inferred lost.  The sent segment may not carry the data
   being probed.

   The procedure is as follows.

   To reset the target timestamp, when a segment with the TS option or


Demizu                   Expires September 2006                [Page 33]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   the OTS option is received, if a SACK hole is created by the SACK
   blocks on the segment, its RSEG.TSecr value is recorded in both
   SH.TgtTS and SH.PrbTS of the created SACK hole.  If a SACK hole is
   expanded by the SACK blocks on the segment, the RSEG.TSecr value is
   recorded in both SH.TgtTS and SH.PrbTS of the expanded SACK hole.

      Note: The SSEG.TSval values on the lost data segments in a new
      SACK hole likely would be less than the received RSEG.TSecr value.
      Therefore, it would be safe to use the RSEG.TSecr value of the
      received segment which created the SACK hole as initial values of
      SH.TgtTS and SH.PrbTS.

   To set the probe timestamp, when any data segment is sent, the
   SSEG.TSval value on the segment is copied to SH.PrbTS of all SACK
   holes that are inferred lost and satisfy (SH.TgtTS == SH.PrbTS).
   The sent data segment may carry data in one of these SACK holes.

   To make a decision, when a segment with the TS option or the OTS
   option is received, for any SACK hole with (SH.TgtTS < SH.PrbTS), if
   (RSEG.TSecr > TS2_SLID_BTS(SH.TgtTS, SH.PrbTS)) is true, then its
   loss inference is decided to be genuine.  Otherwise, if part or all
   of data in such SACK hole is acknowledged or SACKed by the segment,
   then its loss inference is decided to be spurious.  In other cases,
   another segment is awaited.

   To avoid incorrect decision due to underflow, when any segment is
   sent, for any SACK hole that is inferred lost, if
   (SSEG.TSval - SH.TgtTS) is negative, then its loss inference is
   decided to be spurious.

   When a decision is made, an appropriate response algorithm is
   executed, then SH.PrbTS is copied to SH.TgtTS to terminate this
   procedure.

   Note: (SH.TgtTS < SH.PrbTS) is true only when the data in the SACK
   hole has been inferred lost and has been probed.  After a decision is
   made, (SH.TgtTS == SH.PrbTS) becomes true.

9.2.3 SLID-UNA/TS2 (SLID for Data at SND.UNA with TS2)

   This subsection describes an algorithm called SLID-UNA/TS2 that
   detects spurious loss inference of data segment containing data at
   SND.UNA.  It works only when (SND.UNA < SND.MAX) is true.

   SLID-UNA/TS2 uses two variables: TS.UNA.TgtTS (32bit-timestamp) and
   TS.UNA.PrbTS (32bit-timestamp).  TS.UNA.TgtTS (target timestamp)
   holds the closest subsequent timestamp value to the latest SSEG.TSval
   value on original or retransmitted decidedly-lost data segments
   containing data at SND.UNA.  TS.UNA.PrbTS (probe timestamp) holds the


Demizu                   Expires September 2006                [Page 34]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   SSEG.TSval value on the first data segment that is sent since the
   data at SND.UNA has been inferred lost.  The sent segment may not
   contain the data at SND.UNA.  The initial values of both TS.UNA.TgtTS
   and TS.UNA.PrbTS are equal to the SSEG.TSval value on the first sent
   SYN segment.  The variables have valid values if (SND.UNA < SND.MAX)
   is true.

   The procedure is as follows.

   To reset the target timestamp, when an original or decidedly lost
   data segment containing data at SND.UNA is sent, the SSEG.TSval value
   is recorded in both TS.UNA.TgtTS and TS.UNA.PrbTS.  In addition, when
   a received segment with the TS option or the OTS option advances
   SND.UNA (i.e., old SND.UNA < RSEG.ACK <= SND.MAX), RSEG.TSecr is
   copied to TS.UNA.PrbTS if (RSEG.TSecr > TS.UNA.PrbTS) is true, then
   TS.UNA.PrbTS is copied to TS.UNA.TgtTS.

   To set the probe timestamp, when any data segment is sent, if data at
   SND.UNA is inferred lost and (TS.UNA.TgtTS == TS.UNA.PrbTS) is true,
   then the SSEG.TSval value on the sent data segment is recorded in
   TS.UNA.PrbTS.  The sent data segment may carry data at SND.UNA.

   To make a decision, when a segment with the TS option or the OTS
   option is received, if (TS.UNA.TgtTS < TS.UNA.PrbTS) is true, and
   (RSEG.TSecr > TS2_SLID_BTS(TS.UNA.TgtTS, TS.UNA.PrbTS)) is true, then
   the loss inference is decided to be genuine.  Otherwise, if SND.UNA
   is advanced by the segment (i.e., old SND.UNA < RSEG.ACK <= SND.MAX),
   the loss inference is decided to be spurious.  In other cases,
   another segment is awaited.

   To avoid incorrect decision due to underflow, when any segment is
   sent, if the data at SND.UNA is inferred lost, and if
   (SSEG.TSval - TS.UNA.TgtTS) is negative, then the loss inference is
   decided to be spurious.

   When a decision is made, an appropriate response algorithm is
   executed.  Then, if (RSEG.TSecr > TS.UNA.PrbTS) is satisfied,
   RSEG.TSecr is copied to TS.UNA.PrbTS.  After that, TS.UNA.PrbTS is
   copied to TS.UNA.TgtTS to terminate this procedure.

   Note: (TS.UNA.TgtTS < TS.UNA.PrbTS) is true only when data at SND.UNA
   has been inferred lost and has been probed.  After a decision is
   made, (TS.UNA.TgtTS == TS.UNA.PrbTS) becomes true.

   If SLID-SACK/TS2 is enabled, when SND.UNA is advanced by a received
   segment (i.e., old SND.UNA < RSEG.ACK <= SND.MAX), if the new SND.UNA
   is in an existing SACK hole, SH.TgtTS and SH.PrbTS of the SACK hole
   are copied to TS.UNA.TgtTS and TS.UNA.PrbTS, respectively.



Demizu                   Expires September 2006                [Page 35]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


9.2.4 SLID-NXT/TS2 (SLID for Data at SND.NXT minus one with TS2)

   This subsection describes an algorithm called SLID-NXT/TS2 that
   detects spurious loss inference of data segment containing data at
   SND.NXT-1.  It works only when (SND.UNA < SND.NXT) is true.  When
   (SND.FACK < SND.NXT) is true, it detects spurious loss inference of
   data segment containing data between SND.FACK and SND.NXT.

   SLID-NXT/TS2 uses two variables: TS.NXT.TgtTS (32bit-timestamp) and
   TS.NXT.PrbTS (32bit-timestamp).  TS.NXT.TgtTS (target timestamp)
   holds the latest SSEG.TSval value on the original or retransmitted
   decidedly-lost data segments containing data at SND.NXT-1.
   TS.NXT.PrbTS (probe timestamp) holds the SSEG.TSval value on the
   first data segment that is sent since the data at SND.NXT-1 has been
   inferred lost.  The sent segment may not contain the data at
   SND.NXT-1.  The initial values of both TS.NXT.TgtTS and TS.NXT.PrbTS
   are equal to the SSEG.TSval value on the first sent SYN segment.  The
   variables have valid values if (SND.UNA < SND.NXT) is true.

   The procedure is as follows.

   To reset the target timestamp, when data at SND.NXT is sent, the
   SSEG.TSval value on the data segment is recorded in both TS.NXT.TgtTS
   and TS.NXT.PrbTS.  Note that SND.NXT is increased by SSEG.LEN after
   the data segment is sent.

   To set the probe timestamp, when any data segment is sent, if it does
   not contain data at SND.NXT, data at SND.NXT-1 is inferred lost,
   (TS.NXT.TgtTS == TS.NXT.PrbTS) is true, and previously received
   window size is not zero (i.e., TCP persist timer is off), then the
   SSEG.TSval value on the sent data segment is recorded in
   TS.NXT.PrbTS.  The sent data segment may carry data at SND.NXT-1.

   To make a decision, when a segment with the TS option or the OTS
   option is received, if (TS.NXT.TgtTS < TS.NXT.PrbTS) is true, and
   (RSEG.TSecr > TS2_SLID_BTS(TS.NXT.TgtTS, TS.NXT.PrbTS)) is true, then
   the loss inference is decided to be genuine.  Otherwise, if the data
   is acknowledged or SACKed by the segment, then the loss inference is
   decided to be spurious.  In other cases, another segment is awaited.

   To avoid incorrect decision due to underflow, when any segment is
   sent, if the data at SND.NXT-1 is inferred lost, and if
   (SSEG.TSval - TS.NXT.TgtTS) is negative, then the loss inference is
   decided to be spurious.

   When a decision is made, an appropriate response algorithm is
   executed, and TS.NXT.PrbTS is copied to TS.NXT.TgtTS to terminate
   this procedure.  When (SND.FACK < SND.NXT) is true, the decision is
   applied to all data segments containing data between SND.FACK and


Demizu                   Expires September 2006                [Page 36]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   SND.NXT.

   Note: (TS.NXT.TgtTS < TS.NXT.PrbTS) is true only when data at
   SND.NXT-1 has been inferred lost and has been probed.  After a
   decision is made, (TS.NXT.TgtTS == TS.NXT.PrbTS) becomes true.

9.2.5 SLID-MAX/TS2 (SLID for Data at SND.MAX minus one with TS2)

   This subsection describes an algorithm called SLID-MAX/TS2 that
   detects spurious loss inference of data segment containing data at
   SND.MAX-1.  It works only when (SND.UNA < SND.MAX) is true.  When
   (SND.NXT < SND.FACK) is true, it detects spurious loss inference of
   data segment containing data between SND.FACK and SND.MAX.

   SLID-MAX/TS2 uses two variables: TS.MAX.TgtTS (32bit-timestamp) and
   TS.MAX.PrbTS (32bit-timestamp).  TS.MAX.TgtTS (target timestamp)
   holds the latest SSEG.TSval value on the original or retransmitted
   decidedly-lost data segments containing data at SND.MAX-1.
   TS.MAX.PrbTS (probe timestamp) holds the SSEG.TSval value on the
   first data segment that is sent since the data at SND.MAX-1 has been
   inferred lost.  The sent segment may not contain the data at
   SND.MAX-1.  The initial values of both TS.MAX.TgtTS and TS.MAX.PrbTS
   are equal to the SSEG.TSval value on the first sent SYN segment.  The
   variables have valid values if (SND.UNA < SND.MAX) is true.

   The procedure is as follows.

   To reset the target timestamp, when data at SND.MAX is sent, the
   SSEG.TSval value on the data segment is recorded in both TS.MAX.TgtTS
   and TS.MAX.PrbTS.  Note that SND.MAX is increased by SSEG.LEN after
   the data segment is sent.

   To set the probe timestamp, when any data segment is sent, if it does
   not contain data at SND.MAX, data at SND.MAX-1 is inferred lost,
   (TS.MAX.TgtTS == TS.MAX.PrbTS) is true, and previously received
   window size is not zero (i.e., TCP persist timer is off), then the
   SSEG.TSval value on the sent data segment is recorded in
   TS.MAX.PrbTS.  The sent data segment may carry data at SND.MAX-1.

   To make a decision, when a segment with the TS option or the OTS
   option is received, if (TS.MAX.TgtTS < TS.MAX.PrbTS) is true, and
   (RSEG.TSecr > TS2_SLID_BTS(TS.MAX.TgtTS, TS.MAX.PrbTS)) is true, then
   the loss inference is decided to be genuine.  Otherwise, if the data
   is acknowledged or SACKed by the segment, then the loss inference is
   decided to be spurious.  In other cases, another segment is awaited.

   To avoid incorrect decision due to underflow, when any segment is
   sent, if the data at SND.MAX-1 is inferred lost, and if
   (SSEG.TSval - TS.MAX.TgtTS) is negative, then the loss inference is


Demizu                   Expires September 2006                [Page 37]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   decided to be spurious.

   When a decision is made, an appropriate response algorithm is
   executed, and TS.MAX.PrbTS is copied to TS.MAX.TgtTS to terminate
   this procedure.  When (SND.NXT < SND.FACK) is true, the decision is
   applied to all data segments containing data between SND.FACK and
   SND.MAX.

   Note: (TS.MAX.TgtTS < TS.MAX.PrbTS) is true only when data at
   SND.MAX-1 has been inferred lost and has been probed.  After a
   decision is made, (TS.MAX.TgtTS == TS.MAX.PrbTS) becomes true.








































Demizu                   Expires September 2006                [Page 38]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


10. Security Considerations

   PASA/TS2 is a lightweight protection mechanism against spoofing
   attacks injecting faked SYN, data, FIN, and RST segments.

   The vulnerability described in [CVE05] and [CERT05] is mitigated in
   PAWS/TS1 and PAWS/TS2 because of inequalities (1) and (2).  When TS2
   is enabled, it is also mitigated by PASA/TS2.


11. IANA Considerations

   The option-kind value of the TCP Old Timestamps option needs to be
   assigned.


12. Acknowledgements

   The TCP Timestamps option was originally specified in [RFC1323] by
   Van Jacobson, Bob Braden, and Dave Borman.  Many ideas in this memo
   are thus inherited from it.

   The TS.Recent update rule of TS2 was inspired by Reiner Ludwig
   [Lud03a][Lud03b].  The idea of detecting spurious loss inference by
   making use of the TSecr field was inspired by the Eifel Detection
   Algorithm [RFC3522] proposed by Reiner Ludwig.

   The idea of detecting spoofed segments by making use of the TSecr
   field was proposed by Kacheong Poon.  He has given the author
   invaluable insights, ideas, and comments on timestamp handling
   through discussions on [PD04], etc.


13. References

13.1 Normative References

   [RFC793]   J. Postel, "Transmission Control Protocol", STD7, RFC793,
              September 1981.

   [RFC1323]  V. Jacobson, R. Braden, and D. Borman, "TCP Extensions for
              High Performance", RFC1323, May 1992.

   [RFC2119]  S. Bradner, "Key words for use in RFCs to Indicate
              Requirement Levels", BCP14, RFC2119, March 1997.

   [RFC2581]  M. Allman, V. Paxson, and W. Stevens, "TCP Congestion
              Control", RFC2581, April 1999.



Demizu                   Expires September 2006                [Page 39]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   [RFC2988]  V. Paxson and M. Allman, "Computing TCP's Retransmission
              Timer", RFC2988, November 2000.

   [RFC3522]  R. Ludwig and M. Meyer, "The Eifel Detection Algorithm for
              TCP", RFC3522, April 2003.

13.2 Informative References

   [RFC1122]  R. Braden, Editor, "Requirements for Internet Hosts -
              Communication Layers", RFC1122, October 1989.

   [RFC2018]  M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow, "TCP
              Selective Acknowledgement Options", RFC2018, October
              1996.

   [RFC2385]  A. Heffernan, "Protection of BGP Sessions via the TCP MD5
              Signature Option", RFC2385, August 1998.

   [RFC2883]  S. Floyd, J. Mahdavi, M. Mathis, and M. Podolsky, "An
              Extension to the Selective Acknowledgement (SACK) Option
              for TCP", RFC2883, July 2000.

   [RFC3042]  M. Allman, H. Balakrishnan, and S. Floyd, "Enhancing TCP's
              Loss Recovery Using Limited Transmit", RFC3042, January
              2001.

   [RFC3465]  M. Allman, "TCP Congestion Control with Appropriate Byte
              Counting (ABC)", RFC3465, February 2003.

   [RFC3517]  E. Blanton, M. Allman, K. Fall, and L. Wang, "A
              Conservative Selective Acknowledgment (SACK)-based Loss
              Recovery Algorithm for TCP", RFC3517, April 2003.

   [RFC3708]  E. Blanton and M. Allman, "Using TCP Duplicate Selective
              Acknowledgement (DSACKs) and Stream Control Transmission
              Protocol (SCTP) Duplicate Transmission Sequence Numbers
              (TSNs) to Detect Spurious Retransmissions", RFC3708,
              February 2004.

   [RFC3782]  S. Floyd, T. Henderson, and A. Gurtov, "The NewReno
              Modification to TCP's Fast Recovery Algorithm", RFC3582,
              April 2004.

   [RFC4015]  R. Ludwig and A. Gurtov, "The Eifel Response Algorithm for
              TCP", RFC4015, February 2005.

   [All04]    M. Allman, "Re: [tcpm] long options draft revision",
              the IETF TCPM WG mailing list, September 2004.  URL
              "http://www1.ietf.org/mail-archive/web/tcpm/current/


Demizu                   Expires September 2006                [Page 40]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


              msg00748.html"

   [Bra93]    R. Braden, "TCP Extensions for High Performance: An
              Update", (work in progress), Internet-Draft, June 1993.
              URL "http://www.kohala.com/start/tcplw-extensions.txt"

   [CERT05]   US-CERT, "Vulnerability Note VU#637934", May 2005.
              URL "http://www.kb.cert.org/vuls/id/637934"

   [CVE05]    CVE-2005-0356, May 2005.  URL "http://www.cve.mitre.org/
              cgi-bin/cvename.cgi?name=2005-0356"

   [Duk03a]   M. Duke, "[Tsvwg] Updating timestamps (ts_recent) in
              Linux", the IETF TSVWG WG mailing list, August 2003.  URL
              "http://www1.ietf.org/mail-archive/web/tsvwg/current/
              msg04379.html"

   [Duk03b]   M. Duke, "RE: [Tsvwg] Updating timestamps (ts_recent) in
              Linux", the IETF TSVWG WG mailing list, August 2003.  URL
              "http://www1.ietf.org/mail-archive/web/tsvwg/current/
              msg04391.html"

   [JBB97]    V. Jacobson, R. Braden, and D. Borman, "TCP Extensions for
              High Performance", (work in progress), Internet-Draft
              <draft-ietf-tcplw-high-performance-00.txt>, February 1997.

   [JBB03]    V. Jacobson, R. Braden, and D. Borman, "TCP Extensions for
              High Performance", (work in progress), Internet-Draft
              <draft-jacobson-tsvwg-1323bis-00.txt>, August 2003.

   [KP87]     P. Karn and C. Partridge, "Estimating Round-Trip Times in
              Reliable Transport Protocols", Proceedings of SIGCOMM'87,
              August 1987.

   [Lud03a]   R. Ludwig, "RE: [Tsvwg] Updating timestamps (ts_recent) in
              Linux", the IETF TSVWG WG mailing list, August 2003.  URL
              "http://www1.ietf.org/mail-archive/web/tsvwg/current/
              msg04389.html"

   [Lud03b]   R. Ludwig, "[Tsvwg] RFC1323.bis [was: Updating timestamps
              (ts_recent)]", the IETF TSVWG WG mailing list, August
              2003.  URL "http://www1.ietf.org/mail-archive/web/tsvwg/
              current/msg04397.html"

   [Mil98]    D. Miller, "possible bug in PAWS", the IETF TCP-IMPL WG
              mailing list, March 1998.  URL "http://tcp-impl.grc.nasa.
              gov/tcp-impl/list/archive/1035.html"

   [MM96]     M. Mathis and J. Mahdavi, "Forward Acknowledgment:


Demizu                   Expires September 2006                [Page 41]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


              Refining TCP Congestion Control", Proceedings of
              SIGCOMM'96, August 1996.

   [MSM99]    M. Mathis, J. Semke, and J. Mahdavi, "The Rate-Halving
              Algorithm for TCP Congestion Control", (work in progress),
              Internet-Draft <draft-mathis-tcp-ratehalving-00.txt>,
              August 1999.

   [PD04]     K. Poon and N. Demizu, "Use of TCP timestamp option to
              defend against blind spoofing attack", (work in progress),
              Internet-Draft <draft-poon-tcp-tstamp-mod-01.txt>, October
              2004.


Author's Address

   Noritoshi Demizu
   National Institute of Information and Communications Technology
   4-2-1 Nukui-Kitamachi, Koganei, Tokyo 184-8795, Japan
   Phone: +81-42-327-7432 (Ex.5813)
   E-mail: demizu@nict.go.jp






























Demizu                   Expires September 2006                [Page 42]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


Appendix A: TS2 Reference

   This appendix gives the formal description of TS1 and TS2 by using
   C-like pseudocode as a reference.

   An implementation MAY support TS2.  If an implementation supports
   TS2, it MAY implement one or more of RTTM, PAWS, PASA, DLI and SLID.

A.1 TCP Options

   TS2 uses "the TCP Timestamps option" (see section 3.1), and "the TCP
   Old Timestamps option" (see section 3.2).

   For simplicity, the TCP Timestamps option is called "the TS option".
   The TCP Old Timestamps option with option-length=10 is called "the
   OTS option", and that with option-length=2 is called "the OTS_OK
   option".

A.2 Types

   The following types are used in this appendix: boolean, integer,
   32bit-sequence-number, 32bit-timestamp, and internal-time.

   All arithmetic dealing with the 32bit-sequence-number and
   32bit-timestamp types must be performed modulo 2^32.

   The format of internal-time depends on the implementation:
   for example, it may be an integer with unit = 1 second, or
   an OS-dependent structure.

   The following type conversion function is used:
   time2ts() converts an internal-time value to a 32bit-timestamp value.

A.3 Functions

   The following boolean functions are defined.

   To compare sequence numbers:

      SEQ_GT(a, b)   True if (a >  b) in modulo 2^32.  False, otherwise.
      SEQ_GE(a, b)   True if (a >= b) in modulo 2^32.  False, otherwise.
      SEQ_LT(a, b)   True if (a <  b) in modulo 2^32.  False, otherwise.
      SEQ_LE(a, b)   True if (a <= b) in modulo 2^32.  False, otherwise.

   To compare timestamps:

      TS_GT(a, b)    True if (a >  b) in modulo 2^32.  False, otherwise.
      TS_GE(a, b)    True if (a >= b) in modulo 2^32.  False, otherwise.
      TS_LT(a, b)    True if (a <  b) in modulo 2^32.  False, otherwise.


Demizu                   Expires September 2006                [Page 43]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


      TS_LE(a, b)    True if (a <= b) in modulo 2^32.  False, otherwise.

   To compare times:

      TIME_GE(a, b)  True if a is no earlier than b.   False, otherwise.
      TIME_LT(a, b)  True if a is earlier than b.      False, otherwise.

   To check octets:

      IsSACKed(seq)   True if the octet at sequence number "seq" has
                      been ACKed or SACKed.  False, otherwise.
      IsSACKed(range) True if part or all of the octets in "range" have
                      been ACKed or SACKed.  False, otherwise.

      IsInferredLost(seq)   True if the octet at sequence number "seq"
                            has been inferred lost.  False, otherwise.
      IsInferredLost(range) True if part or all of the octets in "range"
                            have been inferred lost.  False, otherwise.

   The following function which returns a timestamp is defined.

   To calculate border timestamp:

      TS2_SLID_BTS(TgtTS, PrbTS) = (k * TgtTS + (1 - k) * PrbTS),
      where k = 1/2.

A.4 Inequalities

   The following inequalities are defined for testing received segments.

      - Inequality (1) --- for data, SYN and FIN segments:

        (RSEG.LEN > 0 &&
         SEQ_LT(Last.Ack.Sent - max(RCV.WND), RSEG.SEQ + RSEG.LEN) &&
         SEQ_LT(RSEG.SEQ, RCV.NXT + RCV.WND))

      - Inequality (2) --- for ACK segments:

        (RSEG.LEN == 0 &&
         SEQ_LE(Last.Ack.Sent - max(RCV.WND), RSEG.SEQ) &&
         SEQ_LT(RSEG.SEQ, RCV.NXT + RCV.WND))

   The following boolean function is defined for evaluating these
   inequalities with respect to a received segment.

      TS_ISLEG()     True if (1) or (2) is satisfied.  False, otherwise.





Demizu                   Expires September 2006                [Page 44]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


A.5 Variables

   If a received segment does not satisfy inequality (1) nor (2),
   variables below MUST NOT be updated.

A.5.1 Variables for Base Mechanism

   The following variables are defined for the base mechanism.
   (See section 4)

      TS.Req (integer)
         - This variable represents a user's request:
             0 if neither TS1 nor TS2 is requested,
             1 if TS1 is requested, or
             2 if TS2 is requested.
         - The initial value is given by the user.

      TS.Mode (integer)
         - This variable represents the result of the mode negotiation:
             negative if negotiation has not been completed,
             0 if both TS1 and TS2 are disabled,
             1 if TS1 is enabled, or
             2 if TS2 is enabled.
         - The initial value is negative.

      TS.Recent (32bit-timestamp)
         - This variable holds the value to be echoed in SSEG.TSecr.
           The initial value is zero.
            - If TS1 is enabled, this variable holds the maximum
              RSEG.TSval value received on segments satisfying
              SEQ_LE(RSEG.SEQ, Last.ACK.sent).
            - If TS2 is enabled, this variable holds the minimum
              RSEG.TSval value on segments satisfying (RSEG.LEN > 0)
              received after a segment has last been sent.
         - It is similar as TS.Recent as defined in [RFC1323].

      TS.RecentIsOld (boolean)
         - This variable is true when TS.Mode == 2 and no segment is
           received after the last segment has been sent.  Therefore,
           this variable indicates whether the value in TS.Recent has
           been echoed to a remote node when TS.Mode == 2.
         - Its value is valid only when TS.Mode is 2.  Its value is
           undefined in other modes.

      Last.Ack.Sent (32bit-sequence-number)
         - This variable holds the last SSEG.ACK value sent.
         - It is the same as Last.Ack.Sent as defined in [RFC1323].

      TS.SndOff (32bit-timestamp)


Demizu                   Expires September 2006                [Page 45]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


         - This variable holds an offset to convert an internal
           timestamp value to an external timestamp value.

      TS.SndAdj (32bit-timestamp)
         - This variable holds the difference between the return values
           of GetIntTS1() and GetIntTS2() when the first SYN was sent.
         - It adjusts TS.SndOff in the TCP three-way handshake phase
           when a TCP connection is established with TS2 by a local
           node.

A.5.2 Variables for PAWS

   The following variables are defined for PAWS.  (See section 6)

      TS.RcvMin (32bit-timestamp)
         - This variable holds the maximum received RSEG.TSval value in
           both the TS option and the OTS option.

      TS.RcvMin_time (internal-time)
         - This variable holds the time when TS.RcvMin was last updated.

A.5.3 Variables for PASA

A.5.3.1 Variables for PASA-DF/TS2

   The following variables are defined for PASA-DF/TS2.
   (See section 7.1)

      TS.SndMin (32bit-timestamp)
         - This variable holds the maximum received RSEG.TSecr value in
           both the TS option and the OTS option.

      TS.SndMax (32bit-timestamp)
         - This variable holds the maximum SSEG.TSval value on sent
           segments satisfying (SSEG.LEN > 0).

      TS.SndMax_time (internal-time)
         - This variable holds the time when TS.SndMax was last updated.
         - This variable is referred to in order to determine whether
           TS.SndOff MUST be tweaked.  If there is another, simpler way
           to determine it, this variable can be omitted.

      TS.PASADF_On (boolean)
         - This variable indicates whether the PASA-DF test can be
           performed.
         - The initial value is true.





Demizu                   Expires September 2006                [Page 46]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


A.5.3.2 Variables for PASA-SR/TS2

   The following variables are defined for PASA-SR/TS2.
   (See section 7.2)

      TS.PASASR_On (boolean)
         - This variable indicates whether PASA-SR/TS2 should be
           performed.
         - The initial value is false.  It is set to true when a SYN
           segment is received against an established TCP connection.

      TS.PASASR_time (internal-time)
         - This variable holds the time when the last SYN segment was
           received.
         - Its value is valid only when TS.PASASR_On is true.

A.5.4 Variables for DLI

A.5.4.1 Variables for DLI-SEG/TS2

   The following variables are defined for DLI-SEG/TS2.  They are
   associated with each data segment.  (See section 8.2.1)

      DS.SndTS (32bit-timestamp) for each data segment
         - This variable holds the SSEG.TSval value on the latest sent
           data segment.

      DS.SndRO (integer) for each data segment
         - This variable counts the number of received segments with the
           TS option satisfying TS_GT(RSEG.TSecr, DS.SndTS), which means
           the number of observed possible reorders.

   The following variables are associated with each data segment.  They
   might have been implemented.

      DS.Start (32bit-sequence-number) for each data segment
         - This variable holds the lowest sequence number of the data
           segment.

      DS.End (32bit-sequence-number) for each data segment
         - This variable holds the highest sequence number in the data
           segment, plus one.

   Note: Since DLI-SEG/TS2 is very powerful, if it is implemented, other
   algorithms need not be implemented.

A.5.4.2 Variables for DLI-SACK/TS2

   The following variables are defined for DLI-SACK/TS2.  They are


Demizu                   Expires September 2006                [Page 47]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   associated with each SACK hole.  (See section 8.2.2)

      SH.SndTS (32bit-timestamp) for each SACK hole
         - This variable holds the SSEG.TSval value on the latest sent
           data segment containing data in this SACK hole.
         - When a SACK hole is allocated, it holds the RSEG.TSecr value.

      SH.SndRO (integer) for each SACK hole
         - This variable counts the number of received segments with the
           TS option satisfying TS_GT(RSEG.TSecr, SH.SndTS), which means
           the number of observed possible reorders.

   The following variables are associated with each SACK hole.  They
   would have been implemented in typical SACK implementations.

      SH.Start (32bit-sequence-number) for each SACK hole
         - This variable holds the lowest sequence number in the SACK
           hole.

      SH.End (32bit-sequence-number) for each SACK hole
         - This variable holds the highest sequence number in the SACK
           hole, plus one.

A.5.4.3 Variables for DLI-UNA/TS2

   The following variables are defined for DLI-UNA/TS2.
   (See section 8.2.3)

      TS.UNA.SndTS (32bit-timestamp)
         - This variable holds the SSEG.TSval value on the latest sent
           data segment containing data at SND.UNA.
         - Its value is valid only when SEQ_LT(SND.UNA, SND.MAX).

      TS.UNA.SndRO (integer)
         - This variable counts the number of received segments with the
           TS option satisfying TS_GT(RSEG.TSecr, TS.UNA.SndTS), which
           means the number of observed possible reorders.  It is
           cleared when TS.UNA.SndTS is updated.
         - Its value is valid only when SEQ_LT(SND.UNA, SND.MAX).

A.5.4.4 Variables for DLI-NXT/TS2

   The following variables are defined for DLI-NXT/TS2.
   (See section 8.2.4)

      TS.NXT.SndTS (32bit-timestamp)
         - This variable holds the SSEG.TSval value on the latest sent
           data segment containing data at SND.NXT-1.
         - Its value is valid only when data at SND.NXT-1 have not been


Demizu                   Expires September 2006                [Page 48]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


           acknowledged nor SACKed.

      TS.NXT.SndRO (integer)
         - This variable counts the number of received segments with the
           TS option satisfying TS_GT(RSEG.TSecr, TS.NXT.SndTS), which
           means the number of observed possible reorders.  It is
           cleared when TS.NXT.SndTS is updated.
         - Its value is valid only when data at SND.NXT-1 have not been
           acknowledged nor SACKed.

A.5.4.5 Variables for DLI-MAX/TS2

   The following variables are defined for DLI-MAX/TS2.
   (See section 8.2.5)

      TS.MAX.SndTS (32bit-timestamp)
         - This variable holds the SSEG.TSval value on the latest sent
           data segment containing data at SND.MAX-1.
         - Its value is valid only when data at SND.MAX-1 have not been
           acknowledged nor SACKed.

      TS.MAX.SndRO (integer)
         - This variable counts the number of received segments with the
           TS option satisfying TS_GT(RSEG.TSecr, TS.MAX.SndTS), which
           means the number of observed possible reorders.  It is
           cleared when TS.MAX.SndTS is updated.
         - Its value is valid only when data at SND.MAX-1 have not been
           acknowledged nor SACKed.

A.5.5 Variables for SLID

   The following variables are defined for SLID.  (See section 9)

A.5.5.1 Variables for SLID-SEG/TS2

   The following variables are defined for SLID-SEG/TS2.  They are
   associated with each data segment.  (See section 9.2.1)

      DS.TgtTS (32bit-timestamp) for each data segment
         - This variable holds the SSEG.TSval value on the original or
           retransmitted decidedly-lost data segments.

      DS.PrbTS (32bit-timestamp) for each data segment
         - This variable holds the SSEG.TSval value on the first data
           segment that is sent since the data segment has been inferred
           lost.





Demizu                   Expires September 2006                [Page 49]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


A.5.5.2 Variables for SLID-SACK/TS2

   The following variables are defined for SLID-SACK/TS2.  They are
   associated with each SACK hole.  (See section 9.2.2)

      SH.TgtTS (32bit-timestamp) for each SACK hole
         - This variable holds the RSEG.TSecr value on the segment that
           created the SACK hole.

      SH.PrbTS (32bit-timestamp) for each SACK hole
         - This variable holds the SSEG.TSval value on the first data
           segment that is sent since the data in the SACK hole has been
           inferred lost.

A.5.5.3 Variables for SLID-UNA/TS2

   The following variables are defined for SLID-UNA/TS2.
   (See section 9.2.3)

      TS.UNA.TgtTS (32bit-timestamp)
         - This variable holds the closest subsequent timestamp value to
           the SSEG.TSval value on original or retransmitted
           decidedly-lost data segments containing data at SND.UNA.

      TS.UNA.PrbTS (32bit-timestamp)
         - This variable holds the SSEG.TSval value on the first data
           segment that is sent since the data at SND.UNA has been
           inferred lost.

A.5.5.4 Variables for SLID-NXT/TS2

   The following variables are defined for SLID-NXT/TS2.
   (See section 9.2.4)

      TS.NXT.TgtTS (32bit-timestamp)
         - This variable holds the SSEG.TSval value on the original or
           retransmitted decidedly-lost data segments containing data at
           SND.NXT-1.

      TS.NXT.PrbTS (32bit-timestamp)
         - This variable holds the SSEG.TSval value on the first data
           segment that is sent since the data at SND.NXT-1 has been
           inferred lost.

A.5.5.5 Variables for SLID-MAX/TS2

   The following variables are defined for SLID-MAX/TS2.
   (See section 9.2.5)



Demizu                   Expires September 2006                [Page 50]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


      TS.MAX.TgtTS (32bit-timestamp)
         - This variable holds the SSEG.TSval value on the original or
           retransmitted decidedly-lost data segments containing data at
           SND.MAX-1.

      TS.MAX.PrbTS (32bit-timestamp)
         - This variable holds the SSEG.TSval value on the first data
           segment that is sent since the data at SND.MAX-1 has been
           inferred lost.

A.6 Current Time

   The following pseudocode functions are defined for getting the
   current time or current timestamp.

      GetTime() --- Get Current Time (internal-time)
         - Get the current time in an internal time format.
         - The time returned by this function MUST NOT be wrapped in the
           lifetime of any TCP connections.

      GetIntTS1() --- Get Current Internal TS for TS1 (32bit-timestamp)
         - Get the current time in a 32-bit unsigned integer in the unit
           of the TS1 timestamp.  Note that the actual SSEG.TSval value
           is calculated as GetIntTS1() + TS.SndOff.
         - The timestamp unit for TS1 is in the range of 1 sec to 1 ms.

      GetIntTS2() --- Get Current Internal TS for TS2 (32bit-timestamp)
         - Get the current time in a 32-bit unsigned integer in the unit
           of the TS2 timestamp.  Note that the actual SSEG.TSval value
           is calculated as GetIntTS2() + TS.SndOff.
         - The timestamp unit for TS2 is fixed at 1 usec.

      GetRTO2() --- Get Current RTO value for TS2 (32bit-timestamp)
         - Get the current RTO value in a 32-bit unsigned integer in the
           unit of the TS2 timestamp so that it can be used in PAWS/TS2
           and PASA-DF/TS2.

A.7 Random Number Generator

   The following pseudocode function is defined for getting random
   numbers.

      RandomNumber(max)
         - It returns a random number between 0 and max.

A.8 Constants

   The following constants are defined.



Demizu                   Expires September 2006                [Page 51]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


      TS1_GRANULARITY (32bit-timestamp) for TS1
         - This constant represents the granularity of GetIntTS1() in
           the unit of the TS1 timestamp.

      TS2_GRANULARITY (32bit-timestamp) for TS2
         - This constant represents the granularity of GetIntTS2() in
           the unit of the TS2 timestamp.

      TS1_PAWS_MARGIN (32bit-timestamp) for PAWS/TS1
         - When PAWS/TS1 is enabled, the minimum acceptable RSEG.TSval
           is calculated as (TS.RcvMin - TS1_PAWS_MARGIN).
         - The default value is 1.

      TS1_PAWS_IDLE (internal-time) for PAWS/TS1
         - The value of TS.RcvMin is valid for this amount of time if
           TS1 is enabled.
         - The default value is 24 days.

      TS2_PAWS_IDLE (internal-time) for PAWS/TS2
         - The value of TS.RcvMin is valid for this amount of time if
           TS2 is enabled.
         - The default value is 20 minutes.  (This value should be
           longer than the longest timeout.)

      TS2_PAWS_DEV (32bit-timestamp) for PAWS/TS2
         - This constant represents the acceptable deviation of received
           RSEG.TSval.
         - The default value is 1 minute.

      TS2_PASADF_RNDMAX_REUSE (32bit-timestamp) for PASA-DF/TS2
         - When a TCP control block is reused by a new TCP connection,
           TS.SndOff is increased by a random number in the range from 0
           to this value.
         - The default value is 2^29 - 1 usec (about 9 minutes).

      TS2_PASADF_RNDMAX_IDLE (32bit-timestamp) for PASA-DF/TS2
         - After an idle of TS2_PASADF_MAXADV, TS.SndOff is increased by
           TS2_PASADF_MAXADV plus a random number in the range from 0 to
           this value.
         - The default value is 2^26 - 1 usec (about 67 seconds).

      TS2_PASADF_MAXADV (internal-time) for PASA-DF/TS2
         - This constant represents the maximum increase in SSEG.TSval.
         - The default value is 64 seconds.  (This value SHOULD be
           greater than the maximum RTO value.)

      TS2_PASASR_WIN (integer) for PASA-SR/TS2
         - This constant represents the window size of the ACK segments
           sent in reply to SYN segments.


Demizu                   Expires September 2006                [Page 52]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


         - The default value is the maximum default SMSS value on the
           node.  The value MAY be RMSS on the TCP connection (In this
           case, it is not constant, though).

      TS2_PASASR_TIME (internal-time) for PASA-SR/TS2
         - This constant represents the time in which received RST
           segments should be specially handled.
         - The default value is 10 seconds.

      TS2_DLI_THRESH (integer) for DLI/TS2
         - If the number of observed possible reorders from the point of
           the view of a target octet is greater than or equal to this
           value, the target segment is inferred lost.
         - The default value is 3 (The same value as the so-called
           duplicate acknowledgement threshold specified in [RFC2581]).
         - It might be implemented as an adaptive variable in the
           future.

A.9 Attributes

A.9.1 Attributes of Received Segments

   The following flags represent attributes of the received segment.

      isSYN          True only if it is a SYN segment.
      isSYNACK       True only if it is a SYN+ACK segment.
      isRST          True only if it is a RST segment.
      isFirstSYN     True only if it is the first SYN segment.
      isFirstSYNACK  True only if it is the first SYN+ACK segment.
      withTS         True only if it carries the TS option.
      withOTS        True only if it carries the OTS option.
      withOTS_OK     True only if it carries the OTS_OK option.

A.9.2 Attributes of TCP Connection

   The following variable indicates an attribute of TCP connection.

      TCP.State      The current TCP State (See section 3.2 of [RFC793])

A.10 Procedures

A.10.1 Initialization

   When a TCP Control Block is created or reused, the procedure below is
   followed to initialize the variables.

        /* Step 1: Base(TS1&TS2) */
        TS.Req = 0, 1 or 2;     /* Requested by user */
        TS.Mode = -1;           /* Not negotiated yet */


Demizu                   Expires September 2006                [Page 53]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


        TS.SndAdj = 0;
        if (<TCP Control Block is reused>) {
                if (TS.Req == 2) {
                        /*
                         * To avoid reusing the same range of SSEG.TSval
                         */
                        TS.SndOff += (GetIntTS2() - GetIntTS1());
                } else {
                        /* No need to change TS.SndOff */
                }
        } else {
                TS.SndOff = 0;
        }

        if (TS.Req == 2) {
                /* Step 2-1: PASA-DF/TS2 */
                TS.PASADF_On = true;
                if (<TCP Control Block is reused>) {
                        TS.SndOff +=
                                RandomNumber(TS2_PASADF_RNDMAX_REUSE);
                } else {
                        /* Randomize the initial timestamp */
                        TS.SndOff = RandomNumber(2^32);
                }
                /* Step 2-2: PASA-SR/TS2 */
                TS.PASASR_On = false;
        }

   When a SACK hole is allocated, the procedure below is followed to
   initialize the variables.

        /* Step 1: DLI-SACK/TS2 */
        if (TS.Mode == 2) {
                SH.SndTS = RSEG.TSecr;
                SH.SndRO = 0;
        }

        /* Step 2: SLID-SACK/TS2 */
        if (TS.Mode == 2) {
                SH.TgtTS = RSEG.TSecr;
                SH.PrbTS = RSEG.TSecr;
        }

A.10.2 Input Processing

A.10.2.1 Input Processing in LISTEN and SYN-SENT States

   When a SYN segment is received in the LISTEN state, or a SYN or
   SYN+ACK segment is received in the SYN-SENT state, the procedure


Demizu                   Expires September 2006                [Page 54]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   below is followed.

        /* Step 1: Base(TS1&TS2) */
        TS.Mode = (!withTS ? 0 : !withOTS_OK ? 1 : 2);
        if (TS.Mode > TS.Req) {
                TS.Mode = TS.Req;
        }
        if (TS.Mode > 0) {
                TS.Recent = RSEG.TSval;
                if (TS.Mode == 2) {
                        TS.RecentIsOld = false;
                        if (TCP.State == "SYN-SENT") {
                                TS.SndOff += TS.SndAdj;
                        }
                }
        }

        /* Step 2: RTTM(TS1&TS2) */
        if (TCP.State == "SYN-SENT" && TS.Mode > 0) {
                if (TS.Mode == 1 && withTS && RSEG.TSecr != 0) {
                        Measured_RTT = ((GetIntTS1() + TS.SndOff)
                                        - RSEG.TSecr + TS1_GRANULARITY);
                } else if (TS.Mode == 2 && withTS) {
                        Measured_RTT = ((GetIntTS2() + TS.SndOff)
                                        - RSEG.TSecr + TS2_GRANULARITY);
                }
        }

        /* Step 3: PAWS(TS1&TS2) */
        if (TS.Mode > 0) {
                TS.RcvMin = RSEG.TSval;
                TS.RcvMin_time = GetTime();
        }

A.10.2.2 Input Processing in Other States

   When a segment is received in the SYN-RECEIVED state, the ESTABLISHED
   state, or a later state, the procedure below is followed.  Note that
   TS.Mode is determined when the first SYN or SYN+ACK segment is
   received, as described in appendix A.10.2.1.

        /*
         * Step 1: Check received segment.
         */
        if (TS.Mode == 1) {
                /* Step 1-1-1: PAWS/TS1 */
                idle_time = GetTime() - TS.RcvMin_time;
                if (!isSYN && && !isRST &&
                    TIME_LT(idle_time, TS1_PAWS_IDLE) &&


Demizu                   Expires September 2006                [Page 55]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


                    TS_LT(RSEG.TSval, TS.RcvMin - TS1_PAWS_MARGIN)) {
                        /* This segment MUST be dropped. */
                        /* An ACK SHOULD be sent in reply. */
                }
        } else if (TS.Mode == 2) {
                /* Step 1-2-1: Base(TS2) */
                if ((!isSYN && !isRST && (!withTS && !withOTS)) ||
                    (withTS && withOTS)) {
                        /* This segment MUST be dropped. */
                        /* An ACK SHOULD be sent in reply. */
                }
                /* Step 1-2-2: PAWS/TS2 */
                idle_time = GetTime() - TS.RcvMin_time;
                if (!isSYN && (!isRST || withOTS) &&
                    TIME_LT(idle_time, TS2_PAWS_IDLE)) {
                        if (TS_LT(RSEG.TSval, TS.RcvMin - GetRTO2())) {
                                /* This segment MUST be dropped. */
                                /* An ACK SHOULD be sent in reply. */
                        }
                        mean_ts = (TS.RcvMin + time2ts(idle_time));
                        if (TS_GT(RSEG.TSval, mean_ts + TS2_PAWS_DEV)) {
                                /*
                                 * This segment MAY be dropped.
                                 * If PASA-DF/TS2 is enabled,
                                 * it SHOULD be dropped.
                                 * If it is dropped,
                                 * an ACK SHOULD be sent in reply.
                                 */
                        }
                }
                /* Step 1-2-3-1: PASA-DF/TS2 */
                if (!isSYN &&  (!isRST || withOTS) && TS.PASADF_On) {
                        if (TS_LT(RSEG.TSecr, TS.SndMin - GetRTO2()) ||
                            TS_GT(RSEG.TSecr, TS.SndMax)) {
                                /* This segment MUST be dropped. */
                                /* An ACK SHOULD be sent in reply. */
                        }
                }
                /* Step 1-2-3-2: PASA-SR/TS2 */
                if (isSYN) {
                        TS.PASASR_On = true;
                        TS.PASASR_time = GetTime();
                        /*
                         * This segment MUST be dropped.
                         * An ACK with win=TS2_PASASR_WIN
                         * without TS nor OTS SHOULD be sent in reply.
                         */
                }
                if (isRST && !withOTS && TS.PASASR_On) {


Demizu                   Expires September 2006                [Page 56]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


                        if (TIME_GE(GetTime() - TS.PASASR_time,
                                    TS2_PASASR_TIME)) {
                                TS.PASASR_On = false;
                        }
                        if (TS.PASASR_On &&
                            !(SEQ_LE(RCV.NXT, RSEG.SEQ) &&
                              SEQ_LT(RSEG.SEQ,
                                     RCV.NXT + TS2_PASASR_WIN))) {
                                /* This segment MUST be dropped. */
                                /* ACK MUST NOT be sent. */
                        }
                }
        }

        /*
         * Step 2: Check acceptability by [RFC793].
         */

        /*
         * Step 3: Process received segment.
         *
         * Note: (RSEG.LEN > 0 && TS_ISLEG()) is equal to inequality (1)
         */
        if (TS.Mode == 1 && TS_ISLEG()) {
                /* Step 3-1-1: Base(TS1) */
                if (RSEG.LEN > 0 && SEQ_LE(RSEG.SEQ, Last.ACK.sent) &&
                    TS_LT(TS.Recent, RSEG.TSval)) {
                        TS.Recent = RSEG.TSval;
                }
                /* Step 3-1-2: RTTM/TS1 */
                if (withTS && RSEG.TSecr != 0) {
                        Measured_RTT = ((GetIntTS1() + TS.SndOff)
                                        - RSEG.TSecr + TS1_GRANULARITY);
                }
                /* Step 3-1-3: PAWS/TS1 */
                if (TIME_GE(GetTime() - TS.RcvMin_time, TS1_PAWS_IDLE)||
                    TS_LT(TS.RcvMin, RSEG.TSval)) {
                        TS.RcvMin = RSEG.TSval;
                        TS.RcvMin_time = GetTime();
                }
        } else if (TS.Mode == 2 && TS_ISLEG()) {
                /* Step 3-2-1: Base(TS2) */
                if (RSEG.LEN > 0 &&
                    (TS.RecentIsOld || TS_GT(TS.Recent, RSEG.TSval))) {
                        TS.Recent = RSEG.TSval;
                        TS.RecentIsOld = false;
                }
                /* Step 3-2-2: RTTM/TS2 */
                if (withTS) {


Demizu                   Expires September 2006                [Page 57]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


                        Measured_RTT = ((GetIntTS2() + TS.SndOff)
                                        - RSEG.TSecr + TS2_GRANULARITY);
                }
                /* Step 3-2-3: PAWS/TS2 */
                if (TIME_GE(GetTime() - TS.RcvMin_time, TS2_PAWS_IDLE)||
                    TS_LT(TS.RcvMin, RSEG.TSval)) {
                        TS.RcvMin = RSEG.TSval;
                        TS.RcvMin_time = GetTime();
                }
                /* Step 3-2-4-1: PASA-DF/TS2 */
                if (TS.PASADF_On) {
                        if (TS_LT(TS.SndMin, RSEG.TSecr)) {
                                TS.SndMin = RSEG.TSecr;
                        }
                } else {
                        if (TS_GE(TS.SndMax, RSEG.TSecr)) {
                                /* Restart the PASA-DF test. */
                                TS.SndMin = RSEG.TSecr;
                                TS.PASADF_On = true;
                        }
                }
                /* Step 3-2-4-2: PASA-SR/TS2 */
                if (TS.PASASR_On && (!isRST || withOTS)) {
                        TS.PASASR_On = false;
                }
                /* Step 3-2-5-1: DLI-SEG/TS2 */
                foreach data segment {
                        if (withTS && TS_GT(RSEG.TSecr, DS.SndTS) &&
                            ++DS.SndRO >= TS2_DLI_THRESH) {
                                /*
                                 * This data segment is inferred lost.
                                 */
                        }
                }
                /* Step 3-2-5-2: DLI-SACK/TS2 */
                foreach SACK hole {
                        if (withTS && TS_GT(RSEG.TSecr, SH.SndTS) &&
                            ++SH.SndRO >= TS2_DLI_THRESH) {
                                /*
                                 * Whole data in this SACK hole
                                 * is inferred lost.
                                 */
                        }
                        /* To help DLI-UNA/TS2 */
                        if (SEQ_LE(SH.Start, new SND.UNA) &&
                            SEQ_LT(new SND.UNA, SH.End)) {
                                /* New SND.UNA is in this SACK hole. */
                                TS.UNA.SndTS = SH.SndTS;
                                TS.UNA.SndRO = SH.SndRO;


Demizu                   Expires September 2006                [Page 58]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


                        }
                }
                /* Step 3-2-5-3: DLI-UNA/TS2 */
                if (SEQ_LT(old SND.UNA, SND.MAX)) {
                        if (SEQ_LE(RSEG.ACK, old SND.UNA)) {
                                if (withTS &&
                                    TS_GT(RSEG.TSecr, TS.UNA.SndTS) &&
                                    ++TS.UNA.SndRO >= TS2_DLI_THRESH) {
                                        /*
                                         * Data at SND.UNA
                                         * is inferred lost.
                                         */
                                }
                        } else {
                                TS.UNA.SndTS = GetIntTS2() + TS.SndOff;
                                TS.UNA.SndRO = 0;
                        }
                }
                /* Step 3-2-5-4: DLI-NXT/TS2 */
                if (!IsSACKed(SND.NXT-1)) {
                        if (withTS && TS_GT(RSEG.TSecr, TS.NXT.SndTS) &&
                            ++TS.NXT.SndRO >= TS2_DLI_THRESH) {
                                if (SEQ_LT(new SND.FACK, SND.NXT)) {
                                        /*
                                         * Data between SND.FACK and
                                         * SND.NXT is inferred lost.
                                         */
                                } else {
                                        /*
                                         * Data at SND.NXT-1
                                         * is inferred lost.
                                         */
                                }
                        }
                }
                /* Step 3-2-5-5: DLI-MAX/TS2 */
                if (!IsSACKed(SND.MAX-1)) {
                        if (withTS && TS_GT(RSEG.TSecr, TS.MAX.SndTS) &&
                            ++TS.MAX.SndRO >= TS2_DLI_THRESH) {
                                if (SEQ_LT(SND.NXT, new SND.FACK)) {
                                        /*
                                         * Data between SND.FACK and
                                         * SND.MAX is inferred lost.
                                         */
                                } else {
                                        /*
                                         * Data at SND.MAX-1
                                         * is inferred lost.
                                         */


Demizu                   Expires September 2006                [Page 59]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


                                }
                        }
                }
                /* Step 3-2-6-1: SLID-SEG/TS2 */
                foreach data segment {
                        if (TS_LT(DS.TgtTS, DS.PrbTS)) {
                                b_ts = TS2_SLID_BTS(DS.TgtTS, DS.PrbTS);
                                if (TS_GT(RSEG.TSecr, b_ts)) {
                                        /*
                                         * The loss inference is
                                         * GENUINE.
                                         */
                                        DS.TgtTS = DS.PrbTS;
                                } else if (IsSACKed(DS)) {
                                        /*
                                         * The loss inference is
                                         * SPURIOUS.  Execute a response
                                         * algorithm if necessary.
                                         */
                                        DS.TgtTS = DS.PrbTS;
                                }
                        }
                }
                /* Step 3-2-6-2: SLID-SACK/TS2 */
                foreach SACK hole {
                        if (TS_LT(SH.TgtTS, SH.PrbTS)) {
                                b_ts = TS2_SLID_BTS(SH.TgtTS, SH.PrbTS);
                                if (TS_GT(RSEG.TSecr, b_ts)) {
                                        /*
                                         * The loss inference is
                                         * GENUINE.
                                         */
                                        SH.TgtTS = SH.PrbTS;
                                } else if (IsSACKed(SH)) {
                                        /*
                                         * The loss inference is
                                         * SPURIOUS.  Execute a response
                                         * algorithm if necessary.
                                         */
                                        SH.TgtTS = SH.PrbTS;
                                }
                        }
                        /* To help SLID-UNA/TS2 */
                        if (SEQ_LE(SH.Start, new SND.UNA) &&
                            SEQ_LT(new SND.UNA, SH.End)) {
                                /* New SND.UNA is in this SACK hole. */
                                TS.UNA.TgtTS = SH.TgtTS;
                                TS.UNA.PrbTS = SH.PrbTS;
                        }


Demizu                   Expires September 2006                [Page 60]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


                }
                /* Step 3-2-6-3: SLID-UNA/TS2 */
                if (TS_LT(TS.UNA.TgtTS, TS.UNA.PrbTS)) {
                        b_ts = TS2_SLID_BTS(TS.UNA.TgtTS, TS.UNA.PrbTS);
                        if (TS_GT(RSEG.TSecr, b_ts)) {
                                /*
                                 * The loss inference is GENUINE.
                                 */
                                TS.UNA.TgtTS = TS.UNA.PrbTS;
                        } else if (SEQ_LT(old SND.UNA, new SND.UNA)) {
                                /*
                                 * The loss inference is SPURIOUS.
                                 * Execute a response algorithm
                                 * if necessary.
                                 */
                                if (TS_GT(RSEG.TSecr, TS.UNA.PrbTS)) {
                                        TS.UNA.PrbTS = RSEG.TSecr;
                                }
                                TS.UNA.TgtTS = TS.UNA.PrbTS;
                        }
                } else {
                        if (SEQ_LT(old SND.UNA, new SND.UNA)) {
                                if (TS_GT(RSEG.TSecr, TS.UNA.PrbTS)) {
                                        TS.UNA.PrbTS = RSEG.TSecr;
                                }
                                TS.UNA.TgtTS = TS.UNA.PrbTS;
                        }
                }
                /* Step 3-2-6-4: SLID-NXT/TS2 */
                if (TS_LT(TS.NXT.TgtTS, TS.NXT.PrbTS)) {
                        b_ts = TS2_SLID_BTS(TS.NXT.TgtTS, TS.NXT.PrbTS);
                        if (TS_GT(RSEG.TSecr, b_ts)) {
                                /*
                                 * The loss inference is GENUINE.
                                 */
                                TS.NXT.TgtTS = TS.NXT.PrbTS;
                        } else if (IsSACKed(SND.NXT-1)) {
                                /*
                                 * The loss inference is SPURIOUS.
                                 * Execute a response algorithm
                                 * if necessary.
                                 */
                                TS.NXT.TgtTS = TS.NXT.PrbTS;
                        }
                }
                /* Step 3-2-6-5: SLID-MAX/TS2 */
                if (TS_LT(TS.MAX.TgtTS, TS.MAX.PrbTS)) {
                        b_ts = TS2_SLID_BTS(TS.MAX.TgtTS, TS.MAX.PrbTS);
                        if (TS_GT(RSEG.TSecr, b_ts)) {


Demizu                   Expires September 2006                [Page 61]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


                                /*
                                 * The loss inference is GENUINE.
                                 */
                                TS.MAX.TgtTS = TS.MAX.PrbTS;
                        } else if (IsSACKed(SND.MAX-1)) {
                                /*
                                 * The loss inference is SPURIOUS.
                                 * Execute a response algorithm
                                 * if necessary.
                                 */
                                TS.MAX.TgtTS = TS.MAX.PrbTS;
                        }
                }
        }

A.10.3 Output Processing

   When a RST segment is sent in reply to a received segment because of
   [RFC793], the following processing is utilized.

      - If the received segment carries the TS option or the OTS option,
        the RST segment MUST carry the OTS option with
        <SSEG.TSval=RSEG.TSecr><SSEG.TSecr=RSEG.TSval>.  Otherwise, the
        RST segment SHOULD NOT carry the TS option nor the OTS option.

   When an ACK segment is sent in reply to a received SYN or SYN+ACK
   segment because of [RFC793], the following procedure is performed:

        /* Step 1: PASA-SR/TS2 */
        if (TS.Mode == 2) {
                TS.PASASR_On = true;
                TS.PASASR_time = GetTime();
                /*
                 * This segment MUST be dropped.
                 * An ACK with win=TS2_PASASR_WIN
                 * without TS nor OTS SHOULD be sent in reply.
                 */
        }

   In other cases, when a segment is sent, the procedure below is
   followed:

        /* Step 1: PASA-DF/TS2 */
        /* To avoid advancing SSEG.TSval too much after an idle. */
        if (TS.Mode == 2) {
                over_time = ((GetTime() - TS.SndMax_time)
                             - TS2_PASADF_MAXADV);
                if (over_time > 0) {
                        TS.SndOff -= time2ts(over_time);


Demizu                   Expires September 2006                [Page 62]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


                        TS.SndOff +=
                                RandomNumber(TS2_PASADF_RNDMAX_IDLE);
                        TS.SndMax_time = GetTime();
                }
        }

        /* Step 2: Base(TS1&TS2) */
        if (isSYN ? (TS.Req > 0) : (TS.Mode > 0)) {
                if (isSYN && TS.Req == 2) {
                        /* Put the OTS_OK option. */
                }
                if (TS.Mode == 2) {
                        if (isRST || TS.RecentIsOld) {
                                SSEG.TSkind = OTS;
                        } else {
                                SSEG.TSkind = TS;
                                TS.RecentIsOld = true;
                        }
                        SSEG.TSval = GetIntTS2() + TS.SndOff;
                        SSEG.TSecr = TS.Recent;
                } else {
                        SSEG.TSkind = TS;
                        SSEG.TSval = GetIntTS1() + TS.SndOff;
                        SSEG.TSecr = TS.Recent;
                }
        }
        if (isFirstSYN && TS.Req == 2 && TS.Mode < 0) {
                TS.SndAdj = GetIntTS1() - GetIntTS2();
        }
        LAST.Ack.Sent = SSEG.ACK;

        /* Step 3: PASA-DF/TS2 */
        if ((isSYN ? (TS.Req == 2) : (TS.Mode == 2)) && SSEG.LEN > 0) {
                TS.SndMax = SSEG.TSval;
                TS.SndMax_time = GetTime();
                if (TS_GT(TS.SndMin, TS.SndMax)) {
                        /* Stop the PASA-DF test */
                        TS.PASADF_On = false;
                }
        }
        if ((isFirstSYN || isFirstSYNACK) && TS.Req == 2) {
                TS.SndMin = SSEG.TSval;
        }

        /* Step 4-1: DLI-SEG/TS2 */
        if (TS.Mode == 2 && SSEG.LEN > 0) {
                /* A data segment is sent. */
                DS.SndTS = SSEG.TSval;
                DS.SndRO = 0;


Demizu                   Expires September 2006                [Page 63]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


        }

        /* Step 4-2: DLI-SACK/TS2 */
        if (TS.Mode == 2 && SSEG.LEN > 0) {
                /* Data in a SACK hole is retransmitted. */
                if (SEQ_LT(SSEG.SEQ, SH.End) &&
                    SEQ_LT(SH.Start, SSEG.SEQ + SSEG.LEN)) {
                        SH.SndTS = SSEG.TSval;
                        SH.SndRO = 0;
                }
        }

        /* Step 4-3: DLI-UNA/TS2 */
        if ((isSYN ? (TS.Req == 2) : (TS.Mode == 2)) && SSEG.LEN > 0) {
                /* Data at SND.UNA is sent. */
                if (SEQ_LE(SSEG.SEQ, SND.UNA) &&
                    SEQ_LT(SND.UNA, SSEG.SEQ + SSEG.LEN)) {
                        TS.UNA.SndTS = SSEG.TSval;
                        TS.UNA.SndRO = 0;
                }
        }

        /* Step 4-4: DLI-NXT/TS2 */
        if ((isSYN ? (TS.Req == 2) : (TS.Mode == 2)) && SSEG.LEN > 0) {
                /* Data at SND.NXT or between SND.FACK and SND.NXT */
                if ((SSEQ.SEQ == old SND.NXT && SND.WND > 0) ||
                    (SEQ_LT(SND.FACK, old SND.NXT) &&
                     SEQ_LT(SSEG.SEQ, old SND.NXT) &&
                     SEQ_LT(SND.FACK, SSEG.SEQ + SSEG.LEN))) {
                        TS.NXT.SndTS = SSEG.TSval;
                        TS.NXT.SndRO = 0;
                }
        }

        /* Step 4-5: DLI-MAX/TS2 */
        if ((isSYN ? (TS.Req == 2) : (TS.Mode == 2)) && SSEG.LEN > 0) {
                /* Data at SND.MAX or between SND.FACK and SND.MAX */
                if ((SSEQ.SEQ == old SND.MAX && SND.WND > 0) ||
                    (SEQ_LT(old SND.NXT, SND.FACK) &&
                     SEQ_LT(SSEG.SEQ, old SND.MAX) &&
                     SEQ_LT(SND.FACK, SSEG.SEQ + SSEG.LEN))) {
                        TS.MAX.SndTS = SSEG.TSval;
                        TS.MAX.SndRO = 0;
                }
        }

        /* Step 5-1: SLID-SEG/TS2 */
        if (TS.Mode == 2) {
                foreach data segment {


Demizu                   Expires September 2006                [Page 64]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


                        if (IsInferredLost(DS)) {
                                if (TS_GT(DS.TgtTS, SSEG.TSval)) {
                                        /*
                                         * The loss inference is assumed
                                         * SPURIOUS.  Execute a response
                                         * algorithm if necessary.
                                         */
                                } else {
                                        /* An probe segment is sent. */
                                        DS.PrbTS = SSEG.TSval;
                                }
                        } else {
                                /* Assert(DS.TgtTS == DS.PrbTS) */
                                if (SEQ_LT(SSEG.SEQ, DS.End) &&
                                           SEQ_LT(DS.Start,
                                                  SSEG.SEQ + SSEG.LEN)){
                                        /*
                                         * The sent segment is original
                                         * or decidedly lost.
                                         */
                                        DS.TgtTS = SSEG.TSval;
                                        DS.PrbTS = SSEG.TSval;
                                }
                        }
                }
        }

        /* Step 5-2: SLID-SACK/TS2 */
        if (TS.Mode == 2) {
                foreach SACK hole {
                        if (IsInferredLost(SH)) {
                                if (TS_GT(SH.TgtTS, SSEG.TSval)) {
                                        /*
                                         * The loss inference is assumed
                                         * SPURIOUS.  Execute a response
                                         * algorithm if necessary.
                                         */
                                } else {
                                        /* An probe segment is sent. */
                                        SH.PrbTS = SSEG.TSval;
                                }
                        } else {
                                /* Assert(SH.TgtTS == SH.PrbTS) */
                                if (SEQ_LT(SSEG.SEQ, SH.End) &&
                                           SEQ_LT(SH.Start,
                                                  SSEG.SEQ + SSEG.LEN)){
                                        /*
                                         * The sent segment is original
                                         * or decidedly lost.


Demizu                   Expires September 2006                [Page 65]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


                                         */
                                        SH.TgtTS = SSEG.TSval;
                                        SH.PrbTS = SSEG.TSval;
                                }
                        }
                }
        }

        /* Step 5-3: SLID-UNA/TS2 */
        if ((isFirstSYN || isFirstSYNACK) && TS.Req == 2) {
                TS.UNA.TgtTS = SSEG.TSval;
                TS.UNA.PrbTS = SSEG.TSval;
        }
        if (TS.Mode == 2) {
                if (IsInferredLost(SND.UNA)) {
                        if (TS_GT(TS.UNA.TgtTS, SSEG.TSval)) {
                                /*
                                 * The loss inference is assumed
                                 * SPURIOUS.  Execute a response
                                 * algorithm if necessary.
                                 */
                        } else {
                                /* An probe segment is sent. */
                                TS.UNA.PrbTS = SSEG.TSval;
                        }
                } else {
                        /* Assert(TS.UNA.TgtTS == TS.UNA.PrbTS) */
                        if (SEQ_LE(SSEG.SEQ, SND.UNA) &&
                                   SEQ_LT(SND.UNA,
                                          SSEG.SEQ + SSEG.LEN)) {
                                /*
                                 * The sent segment is original
                                 * or decidedly lost.
                                 */
                                TS.UNA.TgtTS = SSEG.TSval;
                                TS.UNA.PrbTS = SSEG.TSval;
                        }
                }
        }

        /* Step 5-4: SLID-NXT/TS2 */
        if ((isFirstSYN || isFirstSYNACK) && TS.Req == 2) {
                /* Initialize TS.NXT.TgtTS and TS.NXT.PrbTS */
                TS.NXT.TgtTS = SSEG.TSval;
                TS.NXT.PrbTS = SSEG.TSval;
        }
        if (TS.Mode == 2) {
                if (IsInferredLost(new SND.NXT-1)) {
                        if (TS_GT(TS.NXT.TgtTS, SSEG.TSval)) {


Demizu                   Expires September 2006                [Page 66]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


                                /*
                                 * The loss inference is assumed
                                 * SPURIOUS.  Execute a response
                                 * algorithm if necessary.
                                 */
                        } else {
                                /* An probe segment is sent. */
                                TS.NXT.PrbTS = SSEG.TSval;
                        }
                } else {
                        /* Assert(TS.NXT.TgtTS == TS.NXT.PrbTS) */
                        if (SEQ_LE(SEG.SEQ, new SND.NXT-1) &&
                                   SEQ_LT(new SND.NXT-1,
                                          SSEQ.SEQ + SSEG.LEN)) {
                                /*
                                 * The sent segment is original
                                 * or decidedly lost.
                                 */
                                TS.NXT.TgtTS = SSEG.TSval;
                                TS.NXT.PrbTS = SSEG.TSval;
                        }
                }
        }

        /* Step 5-5: SLID-MAX/TS2 */
        if ((isFirstSYN || isFirstSYNACK) && TS.Req == 2) {
                /* Initialize TS.MAX.TgtTS and TS.MAX.PrbTS */
                TS.MAX.TgtTS = SSEG.TSval;
                TS.MAX.PrbTS = SSEG.TSval;
        }
        if (TS.Mode == 2) {
                if (IsInferredLost(new SND.MAX-1)) {
                        if (TS_GT(TS.MAX.TgtTS, SSEG.TSval)) {
                                /*
                                 * The loss inference is assumed
                                 * SPURIOUS.  Execute a response
                                 * algorithm if necessary.
                                 */
                        } else {
                                /* An probe segment is sent. */
                                TS.MAX.PrbTS = SSEG.TSval;
                        }
                } else {
                        /* Assert(TS.MAX.TgtTS == TS.MAX.PrbTS) */
                        if (SEQ_LE(SEG.SEQ, new SND.MAX-1) &&
                                   SEQ_LT(new SND.MAX-1,
                                          SSEQ.SEQ + SSEG.LEN)) {
                                /*
                                 * The sent segment is original


Demizu                   Expires September 2006                [Page 67]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


                                 * or decidedly lost.
                                 */
                                TS.MAX.TgtTS = SSEG.TSval;
                                TS.MAX.PrbTS = SSEG.TSval;
                        }
                }
        }

A.11 Layouts of TCP Options

   When both the OTS_OK option and the TS option are sent on SYN or
   SYN+ACK segments, the following format is recommended.

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | Kind = <TBD>  |  Length = 2   |   Kind = 8    |  Length = 10  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                      TSval (TS Value)                         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                      TSecr (TS Echo Reply)                    |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure A-1: The OTS_OK option and the TS option

   When either the TS option or the OTS option is sent, the following
   format is recommended.  Two NOPs may be replaced with another
   2-octet option.

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |      NOP      |      NOP      |   Kind = 8    |  Length = 10  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                      TSval (TS Value)                         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                      TSecr (TS Echo Reply)                    |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure A-2: The TS option and NOPs











Demizu                   Expires September 2006                [Page 68]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


Appendix B: Granularity of Timestamps

   The granularity of a timestamp is a different concept from the unit
   of a timestamp.  Timestamps are internally generated from an internal
   tick count or a real time clock.  The unit of a timestamp means the
   time length of 1 in the timestamp, while the granularity of a
   timestamp means the interval time for updating a timestamp source so
   that the resulting timestamp is changed.

      Note: In many cases, the granularity of a timestamp would not be
      shorter than the unit of the timestamp.  With TS1, since the
      timestamp unit can be chosen between 1 second and 1 ms, it would
      be simplest to make it the same as the granularity of a timestamp.
      In contrast, with TS2, since the timestamp unit is fixed at 1
      usec, the granularity will be much longer than the unit in most
      cases.

   If the granularity of timestamps is coarser than the mean time
   between each data transmission, multiple data segments may carry the
   same SSEG.TSval value, and DLI/TS2 and SLID/TS2 would be less
   effective.  If the granularity of timestamp is not fine enough, the
   following idea might improve it.

   The first idea uses segment counter.  It is associated with each TCP
   connection.  It is increased by one when a segment is sent.  Its
   maximum value is TS2_GRANULARITY - 1 to keep timestamps monotonically
   nondecreasing.  It is cleared when timestamp source is changed, or
   TS.SndOff is changed.  Whenever an internal timestamp is generated,
   it is added to the timestamp.

   The second idea uses a CPU's embedded cycle counter.  Each time when
   an internal tick count is increased, or a real time clock is read,
   the value of CPU's embedded cycle counter is also recorded.  Then,
   when an internal timestamp is generated, add the difference between
   the recorded cycle counter value and the current cycle counter value
   to the generated timestamp.

   In any case, since the timestamp unit for TS2 is fine (i.e., 1 usec)
   compared to today's possible RTTs, some lower bits of internal
   timestamps might be usable as nonce to obfuscate timestamps.











Demizu                   Expires September 2006                [Page 69]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


Appendix C: Loss Inference With SACK and DLI/TS2

   This appendix discusses a loss inference procedure making use of SACK
   [RFC2018][RFC3517] and DLI/TS2.

   It does not discuss which data should be transmitted when more than
   one data segment can be sent or retransmitted.  This topic is outside
   the scope of this appendix.

C.1 Highest Sequence Number of Retransmitted Data

   This appendix uses one variable: SND.RTX (32bit-sequence-number).
   RTX stands for retransmission.  It holds the maximum sequence number
   of acknowledged or retransmitted data, plus one octet.

   The initial value of SND.RTX is SND.UNA.  When an acceptable segment
   is received, if SND.UNA is advanced and (SND.RTX < new SND.UNA) is
   true, then the new SND.UNA is copied to SND.RTX.  When a segment is
   sent, if (SSEG.SEQ < SND.MAX && SND.RTX < SSEG.SEQ + SSEG.LEN) is
   satisfied, then SSEG.SEQ + SSEG.LEN is copied to SND.RTX.  Note that
   SND.RTX is not rewound upon a retransmission timeout.

   As a result, all retransmitted but unacknowledged data satisfies
   (SND.UNA <= data < SND.RTX), while all transmitted but unacknowledged
   data satisfies (SND.UNA <= data < SND.MAX).  By the definition given
   in the terminology section, at least one octet in an "original data
   segment" satisfies (SND.RTX <= octet < SND.MAX), and all octets in a
   "retransmitted data segment" satisfy (SND.UNA <= octet < SND.RTX).

C.2 Loss Inference

   This subsection describes how losses are inferred with or without
   SACK and DLI/TS2.

   When DLI/TS2 is enabled, it infers losses of both original and
   retransmitted data segments in the range from SND.UNA to SND.MAX.
   Since DLI/TS2 counts the number of received segments with the TS
   option for inferring losses, it is not very robust against losses of
   segments sent by a remote node.

   When SACK is enabled, IsLost(), as specified in [RFC3517], infers
   losses of original data segments.  In other words, it can infer
   losses of data satisfying (SND.RTX <= data < SND.MAX), but it cannot
   infer losses of data satisfying (SND.UNA <= data < SND.RTX).
   Nevertheless, it is more robust against losses of ACK segments than
   DLI/TS2, because multiple SACK blocks can be sent on each segment.
   Therefore, in spite of its limitations, IsLost() is helpful even when
   DLI/TS2 is enabled.



Demizu                   Expires September 2006                [Page 70]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   When DLI/TS2 is disabled, if SACK is disabled or (SND.UNA < SND.RTX)
   is true, duplicate ACK segments need to be counted to trigger a Fast
   Retransmit (i.e., to infer the loss of data at SND.UNA), as follows:

      - After a retransmission timeout, duplicate ACK segments SHOULD
        NOT be counted until the retransmitted data is acknowledged.

        The purpose is to avoid counting duplicate ACK segments sent in
        reply to data segments that were sent before the timeout.  Such
        duplicate ACK segments are often observed when a retransmission
        timeout is triggered because of the loss of the data segment
        sent by a Fast Retransmit.

      - Duplicate ACK segments for data below SND.UNA SHOULD NOT be
        counted.  That is, if SACK is enabled, ACK segments with D-SACK
        [RFC2883] below RSEG.ACK and ACK segments without SACK blocks
        SHOULD NOT be counted.

      - Duplicate ACK segments for data above SND.UNA SHOULD be counted.
        That is, if SACK is enabled, ACK segments with D-SACK above
        RSEG.ACK and ACK segments with SACK blocks but without D-SACK
        SHOULD be counted.  If TS2 is enabled, however, segments without
        the TS option SHOULD NOT be counted for accuracy.

      - When SND.UNA is advanced in the loss recovery phase, regardless
        of the number of received duplicate ACK segments, data starting
        at the new SND.UNA SHOULD be inferred lost as with NewReno
        [RFC3782].  This is especially helpful when SACK is enabled and
        (new SND.UNA < SND.RTX) is true.

C.3 SACK Scoreboard

   A data sender SHOULD maintain a SACK scoreboard carefully so that it
   can effectively recover losses and transmit new data.

   According to section 5.1 of [RFC2018], "When a retransmit timeout
   occurs the data sender MUST ignore prior SACK information in
   determining which data to retransmit".  When TS2 is enabled, however,
   this appendix recommends that the SACK scoreboard not be discarded
   upon a retransmission timeout.  Instead, it recommends that existing
   SACK blocks in the SACK scoreboard be updated by newly received SACK
   blocks if there are conflicts, as follows.

      - One variable is associated with each SACK block: SB.RcvTS
        (32bit-timestamp).  It holds the RSEG.TSval value on the segment
        that last updated this SACK block.

      - When a received SACK block other than a D-SACK block satisfies
        (RSEG.TSval > SB.RcvTS), where RSEG.TSval represents the


Demizu                   Expires September 2006                [Page 71]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


        RSEG.TSval value on the segment that carried the received SACK
        block, the corresponding existing SACK block SHOULD be
        overwritten by the received SACK block in order to avoid
        possible conflicts.  Otherwise, if (RSEG.TSval == SB.RcvTS) is
        true, the corresponding existing SACK block MAY be expanded by
        the received SACK block.

      - If RSEG.ACK points to the middle of an existing SACK block, the
        start sequence number of the existing SACK block is changed to
        new SND.UNA + 1 SMSS without updating SB.RcvTS.  Then, the data
        at SND.UNA are inferred lost.

   In the Slow Start and Congestion Avoidance phase [RFC2581], when
   (SND.NXT < SND.MAX) is true (i.e., when SND.NXT has been rewound
   because of a retransmission timeout), SND.NXT SHOULD skip the SACKed
   data so as not to retransmit it.  In addition, skipped SACKed data
   SHOULD NOT be calculated as part of the flight size.

   If ABC [RFC3465] is enabled, then when an ACK segment is received,
   the number of octets acknowledged by the ACK segment needs to be
   calculated.  In this calculation, already SACKed data SHOULD be
   omitted.  Since the SACK information may not be fully synchronized
   with the data receiver, the number of octets acknowledged by each ACK
   segment SHOULD NOT exceed some upper bound (e.g., 2 SMSS).

   Note: According to the fourth paragraph of section 2.3 in [RFC3465],
   TCP stacks need to determine whether a TCP connection is "during a
   slow start phase that follows a retransmission timeout".  This
   appendix recommends that (SND.NXT < SND.MAX) be used to determine
   this.

C.4 SACK-LF (SACK Lowest First)

   A data receiver SHOULD inform its data sender of appropriate SACK
   information so that the sender can recover lost data effectively.

   A data receiver maintains a queue of SACK blocks to be sent in the
   TCP SACK option to the data sender.  To comply with section 4 of
   [RFC2018], when a SACK block is updated, it is typically moved to the
   head of the queue.  As a result, the most recently updated SACK
   blocks are informed to the data sender using the TCP SACK option.

   Suppose that some data segments are lost within an RTT.  In this
   case, a data receiver typically receives the out-of-order data
   segments in ascending order.  Therefore, SACK blocks sent in reply
   within the same RTT (or the first RTT) are typically sorted in
   descending order.  In contrast, within the next RTT (or the second
   RTT), if the data receiver receives all the lost data, the same SACK
   blocks (which would be the highest SACK blocks) on the last ACK


Demizu                   Expires September 2006                [Page 72]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   segment within the first RTT, excluding cumulatively acknowledged
   SACK blocks, are sent in reply, while RSEG.ACK is gradually
   advancing.  In general, by considering the possibility that some
   retransmitted data segments are lost, the most recently updated SACK
   blocks (which would be located far from SND.NXT) will be sent in
   reply within the second or later RTTs, while the data sender would
   want to confirm the SACK blocks just above SND.NXT.

   In the current standard TCP, whenever a retransmitted data segment is
   lost, a retransmission timeout is triggered in order to re-retransmit
   the lost data.  According to section 5.1 of [RFC2018], "When a
   retransmit timeout occurs the data sender MUST ignore prior SACK
   information in determining which data to retransmit".  Thus, for the
   same reason discussed in the previous paragraph, the data receiver
   keeps sending the same SACK blocks, which likely would be the highest
   SACK blocks.  As a result, the data sender will retransmit all data
   between SND.UNA and the lowest reported SACK block.  This
   retransmitted data will include data that was SACKed before the
   retransmission timeout.  That is, bandwidth might be wasted if the
   data sender complies with section 5.1 of [RFC2018].

   To mitigate this problem, this subsection proposes SACK-LF, as
   follows:

   When RCV.NXT is advanced at a data receiver, a certain number of the
   lowest SACK blocks are moved to the head of the queue.  The number of
   SACK blocks to be moved is chosen so that all SACK blocks are sent
   the same number of times, so as to make the SACK information robust
   against losses of ACK segments.

   This memo proposes that the number of SACK blocks to be moved to the
   head of the queue be the sum of the following two numbers plus one
   (i.e., num_removed + num_lowest + 1).

      - num_removed:

        The number of SACK blocks that were sent in the previous TCP
        SACK option and are removed by the received RSEG.ACK.

      - num_lowest:

        The number of SACK blocks that were sent in the previous TCP
        SACK option and are in the current lowest N SACK blocks, where N
        is the number of SACK blocks sent in the previous TCP SACK
        option.

   If a data sender discards the SACK scoreboard upon a retransmission
   timeout, SACK-LF that is performed at a data receiver will mitigate
   the number of unnecessary retransmissions.  If D-SACK is not


Demizu                   Expires September 2006                [Page 73]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   supported by the data sender, SACK-LF will also mitigate the number
   of spurious Fast Retransmits.  If the SACK information has not been
   fully synchronized with the data receiver, SACK-LF will suppress
   unnecessary retransmissions.

   In addition to SACK-LF, this subsection proposes the following:

      - If a data receiver discards part of an out-of-order consecutive
        data block that has been informed to the data sender by using
        the TCP SACK option, the shrunken SACK block SHOULD be moved to
        the head of the queue in order to inform of the change.

      - When a data receiver receives a data segment, if it discards
        part or all of the data, the SACK blocks on the segment sent in
        reply SHOULD NOT include the discarded part of the data.  Note
        that section 8 of [RFC2018] says "MUST" instead of "SHOULD NOT".



































Demizu                   Expires September 2006                [Page 74]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


Appendix D: Summary of TCP Timestamps Option in RFC1323

   The TCP Timestamps option [RFC1323] is currently deployed widely.
   There is also a variant of the TCP Timestamps option, which probably
   is more prevalent than the option described in [RFC1323].  The
   variant is called "rfc1323bis" [JBB03] (see also [Bra93] and [JBB97])
   in this appendix.  For simplicity, the TCP Timestamps option is
   called "the TS option" here.

   This appendix describes the behaviors of the TCP Timestamps option
   specified in RFC1323 and rfc1323bis, by using C-like pseudocode.
   Some definitions are borrowed from the TS2 reference given in
   appendix A.

D.1 Types

   The following types are borrowed from the TS2 reference: boolean,
   integer, 32bit-sequence-number, 32bit-timestamp, and internal-time.

D.2 Functions

   The following functions are borrowed from the TS2 reference:
   SEQ_LE(), SEQ_LT(), TS_LT(), TS_LE(), and TIME_LT().

D.3 Inequalities

   The following inequalities are defined.

      - Inequality (A) ... RFC1323

        (SEQ_LE(RSEG.SEQ, Last.ACK.sent) &&
         SEQ_LT(Last.ACK.sent, RSEG.SEQ + RSEG.LEN))

      - Inequality (B) ... rfc1323bis

        SEQ_LE(RSEG.SEQ, Last.ACK.sent)

        Note: (RSEG.TSval >= TS.Recent) is omitted in this inequality
        because it is part of the PAWS test.

   Only one of (A) or (B) SHOULD be implemented.  A boolean function
   called TS_ISLEG() returns true if the selected inequality is
   satisfied.  Otherwise, it returns false.

   Note: In addition to the inequalities given above, this memo
   recommends that (Last.Ack.Sent - max(RCV.WND) <= RSEG.SEQ) also be
   checked, in addition to (A) or (B).




Demizu                   Expires September 2006                [Page 75]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


D.4 Variables

   The following variables are defined in [RFC1323].

      TS.Recent (32bit-timestamp)
         - This variable records the maximum RSEG.TSval value on the
           received segments satisfying TS_ISLEG().  It is echoed in
           SSEG.TSecr.

      Last.Ack.Sent (32bit-sequence-number)
         - This variable holds the last SSEG.ACK value sent.

   The following variables are defined here to describe the behaviors.

      TS.Req (boolean)
         - This variable represents a user's request:
             True if the TCP Timestamps option is requested.
             False, otherwise.
         - The initial value is given by the user.

      TS.OK (boolean)
         - This variable is true if the TS option is enabled.  The
           initial value is false.  It is set to true if the TS option
           is exchanged on SYN and SYN+ACK segments in the TCP three-way
           handshake phase.

      TS.Recent_time (internal-time)
         - This variable holds the time when TS.Recent was last updated.

D.5 Current Time

   The following pseudocode functions are defined here for getting the
   current time or current timestamp.

      GetTime() --- Get Current Time (internal-time)
         - Get the current time in an internal time format.
         - The time returned by this function MUST NOT be wrapped in the
           lifetime of any TCP connections.

      GetTS() --- Get Current Timestamp (32bit-timestamp)
         - Get the current time in a 32bit unsigned integer so that it
           can be sent in the TS option.
         - The timestamp unit MUST be in the range of 1 sec to 1 ms.

D.6 Constants

   The following constants are defined here to describe the behaviors.

      TS_GRANULARITY (32bit-timestamp)


Demizu                   Expires September 2006                [Page 76]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


         - This constant represents the granularity of GetTS() in the
           unit of the return value of GetTS().

      TS_PAWS_IDLE (internal-time) for PAWS
         - The value of TS.Recent is valid for TS_PAWS_IDLE if TS.OK is
           true.
         - The default value is 24 days.

D.7 Attributes of Received Segments

   The following flags are borrowed from the TS2 reference:
   isSYN, isRST, isFirstSYN, isFirstSYNACK, and withTS.

D.8 Procedures

D.8.1 Initialization

   When a TCP Control Block is created or reused, the procedure below is
   followed.

        TS.Req = true or false;         /* Requested by user */
        TS.OK = false;

D.8.2 Input Processing

   When a segment is received, the procedure below is followed.

        if (isFirstSYN || isFirstSYNACK) {
                if (TS.Req && withTS) {
                        TS.OK = true;
                        TS.Recent = RSEG.TSval;
                        TS.Recent_time = GetTime();
                }
        } else if (TS.OK) {
                /* (R1) PAWS */
                if (!isSYN && !isRST &&
                    TIME_LT(GetTime() - TS.Recent_time, TS_PAWS_IDLE) &&
                    TS_LT(RSEG.TSval, TS.Recent)) {
                        /* This segment MUST be dropped. */
                        /* An ACK with TS SHOULD be sent. */
                }
                /* (R2) If it is outside the window, reject it. */
                /* (R3) Update TS.Recent */
                if (TS_ISLEG() && TS_LE(TS.Recent, RSEG.TSval)) {
                        TS.Recent = RSEG.TSVal;
                        TS.Recent_time = GetTime();
                }
                /* RTTM: If it advances SND.UNA, do RTTM. */
                Measured_RTT = GetTS() - RSEG.TSecr + TS_GRANULARITY;


Demizu                   Expires September 2006                [Page 77]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


        }

D.8.3 Output Processing

   When a segment is sent, the procedure below is followed.

        if (TS.OK) {
                /* Put the TCP Timestamps option */
                SSEG.TSval = GetTS();
                SSEG.TSecr = TS.Recent;
        }
        LAST.Ack.Sent = SSEG.ACK;







































Demizu                   Expires September 2006                [Page 78]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


Appendix E: Issues with TCP Timestamps Option in RFC1323

   This appendix discusses the issues with both the TCP Timestamps
   option in [RFC1323] and rfc1323bis [JBB03].  It also discusses how
   these issues are handled in TS1 and TS2.

E.1 RTTM

   This subsection discusses the issues of RTT measurements.

   Since the RTTMs in RFC1323, rfc1323bis, and TS1 take RTT measurements
   only when SND.UNA is advanced, they cannot take RTT measurements
   during the loss recovery phase, except when partial or full
   acknowledgement is received.  In contrast, RTTM/TS2 can take RTT
   measurements whenever it receives the TS option, even when SND.UNA is
   not advanced.

   When a remote node is compliant with RFC1323, RTTM overestimates RTTs
   in the following scenario.

      Assume that all data segments sent within an RTT arrive at the
      remote node but all ACK segments sent in reply are lost.  Upon a
      retransmission timeout, the lowest lost data is retransmitted, and
      an ACK segment sent in reply is received.

      In this case, the TSecr field on the received ACK segment has the
      TSval value on the last original data segment that arrived at the
      remote node.  Therefore, RTTM at the local node measures the time
      from when the last original data segment was sent until when an
      ACK segment sent in reply to the retransmitted data segment is
      received.  Thus, the measured RTT is much longer than the real RTT
      and nearly equal to the RTO value [Duk03a].

      In contrast, if the remote node complies with RTTM in rfc1323bis,
      RTTM/TS1, or RTTM/TS2, then the received ACK segment carries the
      TSval value on the retransmitted data segment.  Therefore, RTTM at
      the local node takes a correct RTT measurement, because it
      measures the time from when the lowest lost data is retransmitted
      until when its ACK segment is received.

   When a remote node is compliant with RFC1323, rfc1323bis or TS1, RTTM
   overestimates RTTs in the following scenario.

      Assume that the local node sends a data segment but an ACK segment
      sent in reply is lost.  Before the retransmission timeout at the
      local node, the remote node sends a data segment, which
      acknowledges the data sent by the local node.

      In this case, the TSecr field on the received data segment has the


Demizu                   Expires September 2006                [Page 79]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


      TSval value on the data segment sent by the local node.  Since
      SND.UNA at the local node is advanced by the received data
      segment, RTTM at the local node measures the time from when the
      data segment was sent by the local node until when the data
      segment is received.  Thus, the measured RTT is longer than the
      real RTT [Duk03b].

      In contrast, RTTM/TS2 will not take an RTT measurement because the
      data segment sent by the remote node carries the OTS option.

E.2 PAWS and Reordering

   As described in appendix F, there is a possibility that a legitimate
   data segment could be discarded by PAWSs in RFC1323 and rfc1323bis
   when it is delayed because of reordering.

   In addition, there is a possibility that a legitimate ACK segment in
   a unidirectional data flow could be discarded by PAWS in rfc1323bis
   when it is delayed because of reordering [Mil98].

   In contrast, PAWS/TS1 is slightly more robust against reordering than
   PAWS in RFC1323 and rfc1323bis, because of TS1_PAWS_MARGIN.  PAWS/TS2
   is robust against reordering, and legitimate segments are unlikely to
   be discarded even when they are delayed because of reordering.

   Note: Linux seems to comply with RFC1323, instead of rfc1323bis, and
   it appears to have implemented measures including the same idea as
   TS1_PAWS_MARGIN.

E.3 Spoofed Segment Detection

   [PD04] proposes to detect spoofed segments by making use of the TSecr
   field.  To achieve this goal, when an ACK segment is sent, its TSval
   value is the same value as the TSval value on the last data segment.
   Unfortunately, this mechanism makes it impossible to apply PAWS for
   ACK segments.  In addition, there could be other unknown problems.

   In contrast, PASA/TS2 detects spoofed segments without tweaking the
   TSval values.  Thus, it does not have such problems.

E.4 Retransmitted Data Loss Inference

   It has been said that if the TSval values on out-of-order data
   segments were echoed by a data receiver, the data sender would be
   able to infer losses of retransmitted data segments.  The TCP
   Timestamps options in RFC1323, rfc1323bis, and TS1 cannot infer such
   losses.

   In contrast, TS2 enables DLI/TS2 to infer losses of both


Demizu                   Expires September 2006                [Page 80]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   retransmitted data segments and original data segments.

E.5 Corner Case of Eifel

   According to section 3.3 of [RFC3522], if a remote node supports the
   TCP Timestamps option in RFC1323 and does not support D-SACK
   [RFC2883], then when all ACK segments within an RTT are lost, the
   Eifel Detection Algorithm [RFC3522] will misinterpret the consequent
   retransmission timeout as a spurious timeout.

   In contrast, if a remote node supports the TCP Timestamps option in
   rfc1323bis or TS1, there is no such problem.

E.6 Vulnerability

   If an implementation that complies with rfc1323bis overwrites
   TS.Recent with RSEG.TSval whenever it receives a segment satisfying
   (RSEG.TSval >= TS.Recent && RSEG.SEQ <= Last.ACK.sent), it has a
   vulnerability [CVE05][CERT05].

   In contrast, implementations complying with RFC1323, TS1, and TS2 do
   not have such a vulnerability when the window size is not very large.
   If TS2 is enabled, PASA/TS2 combined with PAWS/TS2 will detect
   spoofed segments even when the window size is very large.

E.7 Summary

   The table below summarizes the issues discussed in this appendix.

    +----------------------------+---------+------------+------+-----+
    |                            | RFC1323 | rfc1323bis | TS1  | TS2 |
    +----------------------------+---------+------------+------+-----+
    | RTTM: Dup-ACKs             | NG      | NG         | NG   | OK  |
    | RTTM: Overestimation       | NG      | Fair       | Fair | OK  |
    | PAWS: Reordering           | NG      | NG         | Fair | OK  |
    | PASA: PAWS for ACKs        | NG      | NG         | NG   | OK  |
    | DLI: Retransmitted Data    | NG      | NG         | NG   | OK  |
    | Eifel: A Corner Case       | NG      | OK         | OK   | OK  |
    | Vulnerability              | Fair    | NG         | Fair | OK  |
    +----------------------------+---------+------------+------+-----+

          Table E-1: Summary of Issues with TCP Timestamps Option









Demizu                   Expires September 2006                [Page 81]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


Appendix F: Problem of PAWS in RFC1323 and Reordering

   There is a possibility that legitimate data segments could be
   discarded by PAWS in [RFC1323] when those segments are delayed
   because of reordering.  This appendix shows some examples of this
   problem, and describes a generic scenario and possible negative
   effects, then proposes a possible solution.

F.1 Example 1: Reordering and Fast Retransmit with Limited Transmit

   In this example, suppose that TCP A is sending data to TCP B.  Assume
   that TCP A supports the TCP Timestamps option in [RFC1323], TCP
   Congestion Control [RFC2581], and Limited Transmit [RFC3042], and
   that TCP B supports the TCP Timestamps option with PAWS in [RFC1323].

   Suppose that the data segment sequence W.1, X.2, Y.3, Z.4, S.5 is
   sent by TCP A, where the letter indicates the sequence number and the
   digit represents the timestamp in the TSval field.  In this data
   segment sequence, suppose that W.1 and X.2 are sent in the Congestion
   Avoidance phase, Y.3 and Z.4 are sent by Limited Transmit, and S.5 is
   sent by Fast Retransmit.

   Figure F-1 illustrates the data segment sequence observed at TCP A.
   The x-axis represents time, and the y-axis represents the sequence
   number.  W.1 through Z.4 and S.5 indicate the data segments sent.
   Each 'o' mark indicates a received ACK segment.  Lines are drawn to
   connect the symbols between data segments and between ACK segments.

            Sequence number
                  A                      Z.4
                  |                 Y.3~~  \
                  |            X.2~~        \
                  |       W.1~~              \
                  |     ~~                    \
                  |                           S.5
                  |             o____o____o____o
                  |        o~~~~     1    2    3!!   <-- dup-ACK count
                  |   o~~~~
                  +--------------------------------> Time

               Figure F-1: Time vs. sequence number at TCP A

   Now, suppose that the data segment sequence W.1, X.2, Y.3, Z.4, S.5
   sent by TCP A is reordered as W.1, X.2, Y.3, S.5, Z.4 (i.e., Z.4 and
   S.5 are exchanged) on the path to TCP B.  Figure F-2 illustrates the
   resulting data segment sequence observed at TCP B.

   What happens at TCP B is described below.



Demizu                   Expires September 2006                [Page 82]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


      0. Assume TS.Recent is valid and TS.Recent == 0.
         Assume RCV.NXT == S.

      1.  W.1 is received.  PAWS accepts it because TS.Recent < 1.
          TS.Recent is not updated because RCV.NXT < W.

      2.  X.2 is received.  PAWS accepts it because TS.Recent < 2.
          TS.Recent is not updated because RCV.NXT < X.

      3.  Y.3 is received.  PAWS accepts it because TS.Recent < 3.
          TS.Recent is not updated because RCV.NXT < Y.

      4.  S.5 is received.  PAWS accepts it because TS.Recent < 5.
          TS.Recent is updated because RCV.NXT == S and S.5 has data.

          Now, TS.Recent == 5 and RCV.NXT >= S + the data length of S.5.
          (The actual new value of RCV.NXT depends on the out-of-order
          data queue in TCP B.)

      5.  Z.4 is received.  PAWS discards it because TS.Recent > 4.

   In this example, the legitimate segment Z.4 is discarded by PAWS in
   step 5.  Figure F-2 illustrates this scenario.

            Sequence number
                  A                           Z.4
                  |                 Y.3       /
                  |            X.2~~   \     /
                  |       W.1~~         \   /
                  |     ~~               \ /
                  |                      S.5
                  |
                  +--------------------------------> Time

        +---------+-------------------------------+
        |Segment  |(prev) W.1  X.2  Y.3  S.5  Z.4 |
        +---------+-------------------------------+
        |PAWS     |   -   Pass Pass Pass Pass Fail|
        |TS.Recent|   0    0    0    0    5    5  |
        |RCV.NXT  |   S    S    S    S    >S   >S |
        +---------+-------------------------------+

               Figure F-2: Time vs. sequence number at TCP B

   Even in the case where TCP A does not support Limited Transmit (i.e.,
   the case where Y.3 and Z.4 are not sent in the example above), if the
   data segment sequence W.1, X.2, S.5 sent by TCP A is reordered as
   W.1, S.5, X.2 (i.e., X.2 and S.5 are exchanged) on the path to TCP B,
   X.2 could be discarded by PAWS.  Since there would be a small gap


Demizu                   Expires September 2006                [Page 83]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   between the time when X.2 is sent and the time when S.5 is sent, the
   possibility of this problem occurring would be less than in the
   example above.

F.2 Example 2: Reordering and NewReno

   In this example, suppose that TCP A is sending data to TCP B.  Assume
   that TCP A supports the TCP Timestamps option in [RFC1323], TCP
   Congestion Control [RFC2581], and NewReno [RFC3782], and that TCP B
   supports the TCP Timestamps option with PAWS in [RFC1323].

   Suppose that the data segment sequence W.1, X.2, Y.3, Z.4, S.5 is
   sent by TCP A, where the letter indicates the sequence number and the
   digit represents the timestamp in the TSval field.  In the data
   segment sequence, suppose that W.1 through Z.4 are sent by Fast
   Recovery at each time when a duplicate ACK segment is received, and
   that S.5 is sent by NewReno.

   Figure F-3 illustrates the data segment sequence observed at TCP A.
   This figure uses the same notation that in Figure F-1.

            Sequence number
                  A                    Z.4
                  |               Y.3~~  \
                  |          X.2~~        \
                  |     W.1~~              \
                  |   ~~                    \
                  |                         S.5
                  |                          o
                  |                         /
                  |                        /
                  |                       /
                  |                      /
                  |    ..o____o____o____o
                  |
                  +--------------------------------> Time

               Figure F-3: Time vs. sequence number at TCP A

   Now, suppose that the data segment sequence W.1, X.2, Y.3, Z.4, S.5
   sent by TCP A is reordered as W.1, X.2, Y.3, S.5, Z.4 (i.e., Z.4 and
   S.5 are exchanged) on the path to TCP B.

   The resulting data segment sequence observed at TCP B is the same as
   that shown in Figure F-2.  What happens at TCP B is also the same as
   in Example 1 above.  Consequently, the legitimate segment Z.4 is
   discarded by PAWS.




Demizu                   Expires September 2006                [Page 84]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


F.3 Generic Scenario

   In general, this problem occurs in the following scenario.

   Suppose that TCP A is sending data to TCP B, and data segments Z.4
   and S.5 are sent by TCP A, where the letter indicates the sequence
   number and the digit represents the timestamp in the TSval field.

      1.  Data segment Z.4 is sent by the sender (TCP A).

      2.  Data segment S.5 is sent by the sender (TCP A).

          Note: Segment S.5 would be a retransmitted segment sent by
                Fast Retransmit, NewReno, SACK [RFC2018][RFC3517], or
                another mechanism that infers a segment loss and
                retransmits the lost data quickly.  The sequence number
                of segment S.5 would be less than SND.NXT.

      3.  Segment S.5 arrives at the receiver earlier than segment Z.4.

          Suppose that segment S.5 satisfies (RSEG.SEQ <= RCV.NXT <
          RSEG.SEQ + RSEG.LEN), and that the TSval value on segment S.5
          is not older than the TS.Recent value at the receiver (TCB B).

          Segment S.5 is accepted by PAWS at the receiver.  TS.Recent at
          the receiver is updated with the TSval value on segment S.5
          (i.e., TS.Recent = 5).  RCV.NXT is also updated.

      4.  Segment Z.4 arrives at the receiver (TCP B).

          Segment Z.4 is discarded by PAWS because the TSval value (= 4)
          on segment Z.4 is older than the TS.Recent value (= 5) at the
          receiver.

   In this scenario, the gap between the time when segment Z.4 is sent
   and the time when segment S.5 is sent should be small, so that
   reordering could exchange segments Z.4 and S.5.

F.4 Negative effects

   This problem would cause some negative effects on TCP performance.

   A data sender would spend additional time detecting a loss and
   recovering from it.  Moreover, the sender would consider the loss to
   be a congestion indication, and the congestion window would
   needlessly be further reduced.

   In addition, discarding legitimate segments at a data receiver is a
   waste of bandwidth.


Demizu                   Expires September 2006                [Page 85]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


F.5 Possible Solution

   A straightforward way to solve this problem would be to modify the
   rules of PAWS so that valid delayed segments are accepted.

   The new rule would be as follows:

   - Change the inequality in R1) in section 4.2.1 of [RFC1323] as shown
     below:

         Current:  RSEG.TSval < TS.Recent
         Proposal: RSEG.TSval < TS.Recent - T1, where T1 = RTO value.

   - In addition, to keep TS.Recent be monotonically nondecreasing, in
     R3) in section 4.2.1 of [RFC1323], TS.Recent should be updated only
     when RSEG.TSval >= TS.Recent.

   With this new rule, it would be very important to choose the value of
   T1 appropriately.  This would be difficult for a data receiver,
   however, because it does not know the unit of the TSval values on the
   received segments.






























Demizu                   Expires September 2006                [Page 86]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


Appendix G: Alternative Ideas

G.1 TCP Feature Array Option

   Since the purpose of the OTS_OK option (i.e., the OTS option with
   option-length=2) is to negotiate the enabling of a feature, it could
   be replaced with a bit in something like a "binary option negotiation
   option" [All04].  The format would be like the following:

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | Kind = <TBD>  |  Length = 3   |     flags     |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure B-1: The TCP Feature Array option

   This idea is not employed because it requires additional TCP option
   space (i.e., at least 3 octets) and a new option-kind value.

   Nevertheless, this new 3-octet option can be carried on a SYN segment
   even in the following combination (40 octets total).

      -  4 octets: TCP MSS option [RFC793]
      -  3 octets: TCP Feature Array option
      -  3 octets: TCP Window Scale option [RFC1323]
      - 10 octets: TCP Timestamps option [RFC1323]
      -  2 octets: TCP SACK-PERMITTED option [RFC2018]
      - 18 octets: TCP MD5 Signature option [RFC2385]

   Normally, for alignment at a 32-bit boundary, one NOP is put after
   the TCP Window Scale option, and two NOPs are put before the TCP
   Timestamps option, as described in appendix A of [RFC1323].  If these
   three NOPs are removed, the TCP Feature Array option can be inserted
   as above without breaking the 32-bit timestamps alignment.

G.2 Timestamp Unit

   In this memo, the timestamp unit for TS2 is fixed at 1 usec (10^-6).
   This value is advantageous for inferring losses of data and detecting
   spurious loss inference quickly, especially in highspeed networks,
   and taking finer RTT measurements in LAN environments.  In addition,
   some lower bits of timestamps can be used as nonce to obfuscate
   timestamps.

   An alternative idea would be to fix the unit at 1 ms (10^-3).  Since
   [RFC1323] specifies that the unit is in the range of 1 second to 1
   ms, the unit of 1 ms is interoperable with [RFC1323].  In addition,
   if the timestamp unit for TS2 is changed to 1 ms, PASA-DF/TS2 would


Demizu                   Expires September 2006                [Page 87]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


   become more powerful, because the difference between TS.SndMax and
   TS.SndMin becomes thousand times smaller.  This idea is not employed
   here, however, in order to obtain the advantages given in the
   previous paragraph.

      Note: If the timestamp unit is changed to 1 ms, the variable
      TS.SndAdj can be removed.  The default value of TS2_PAWS_IDLE is
      changed from 20 minutes to 24 days.  TS.PASADF_On of PASA-DF/TS2
      can be removed.

   Another alternative idea would be to negotiate the timestamp unit by
   using SYN segments within the range between, e.g., 1 sec and 1 nsec.
   In this case, TS2_PAWS_IDLE should be replaced with a variable.  This
   idea is not employed here because such negotiation would not be
   simple and would require additional TCP option space.

G.3 TS option without TSecr field

    Since the value in the TSecr field in the OTS option may be very old
    and useless, an alternative idea would be to replace the OTS option
    (of option-length=10) with the TS option without the TSecr field
    (i.e., option-length=6) [Duk03b].

    This idea is not employed here because the TSecr field in the OTS
    option is referred to by PASA-DF/TS2 and SLID/TS2.  In addition,
    some new mechanisms might use this field in the future.

G.4 Eifel Detection Algorithm and TS2

   This subsection shows the reason why this memo proposes not to apply
   the Eifel Detection Algorithm [RFC3522], which detects a posteriori
   spurious retransmissions by making use of the TCP Timestamps option
   [RFC1322], to TS2 by illustrating a case where the Eifel Detection
   Algorithm is not robust against reordering with TS2.

   Suppose that TCP A is sending certain amount of data to TCP B, where
   TS2 is enabled on the TCP connection between them, and TCP A supports
   the Eifel Detection Algorithm.  Assume that TCP A now sends two data
   segments: an original data segment S.1 and a retransmitted data
   segment R.2, where the letter indicates the sequence number and the
   digit represents the timestamp in the TSval field.  The following
   scenario shows the problem.

      1. First, TCP A sends an original data segment S.1 to TCP B.

      2. Then, TCP A sends a retransmitted data segment R.2 to TCP B.
         The TSval value on data segment R.2 is recorded in RetransmitTS
         in [RFC3522].  Assume that this retransmission is genuine.
         Note that the sequence number R.2 is less than S.1.


Demizu                   Expires September 2006                [Page 88]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


      3. Assume that the data segments sequence S.1, R.2 sent by TCP A
         is reordered as R.2, S.1 (i.e., they are exchanged) on the path
         to TCP B.

      4. First, TCP B receives retransmitted data segment R.2.  TCP B
         sends an ACK segment with TSecr=2 sent in reply to R.2.  Assume
         that this ACK segment is lost.

      5. Then, TCP B receives original data segment S.1.  TCP B sends an
         ACK segment with TSecr=1 sent in reply to S.1.

      6. TCP A receives an ACK segment with TSecr=1 only.  Since the
         received TSecr value (=1) is less than RetransmitTS (=2) in
         [RFC3522], TCP A falsely infers the retransmission of data
         segment R.2 spurious.

   As illustrated in this scenario, when TS2 is enabled, the Eifel
   Detection Algorithm is not robust against the combination of the
   reordering of data segments and the losses of ACK segments.  The
   cause of this problem is that the Eifel Detection Algorithm assumes
   that TS.Recent is monotonically nondecreasing, while TS.Recent with
   TS2 is not monotonically nondecreasing (See section 4.4).

   To solve this problem, SLID/TS2 is proposed.  It compares RSEG.TSecr
   with the border timestamp calculated by TS2_SLID_BTS() specified in
   section 9.1, instead of comparing with either target timestamp or
   probe timestamp.  With the current definition of TS2_SLID_BTS(),
   SLID/TS2 is robust against reordering if delays are less than RTT/2.
   Unfortunately, the permissible delays of SLID-SACK/TS2 may be less
   than RTT/TS2 depending on SACK hole sizes.





















Demizu                   Expires September 2006                [Page 89]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


Appendix H: Changes from -00 version.

   - Base Mechanism
      - Former sections 4.3 and 4.4 are renumbered to 4.4 and 4.5 in
        order to introduce new section 4.3 "Internal Timestamp and
        External Timestamp".

   - RTTM
      - TS1_RTTM_G is renamed to TS1_GRANULARITY.
      - TS2_RTTM_G is renamed to TS2_GRANULARITY.
      - TS_RTTM_G is renamed to TS_GRANULARITY.

   - PAWS
      - Section 6.2 is split into 6.2.1 and 6.2.2.

   - PASA
      - In section 7.1.3 and appendix A.10.2.1: When a SYN+ACK segment
        is received in the SYN-SENT state, RSEG.TSecr is not tested now.
      - TS2_PASADF_RNDMAX_REUSE is introduced.
      - TS2_PASADF_RNDMAX_IDLE is introduced.

   - DLI
      - DLD (Data Loss Detection) is renamed to DLI (Data Loss
        Inference).
      - Sections 8.1 to 8.3 are renumbered to 8.2.1 to 8.2.3.
        Sections 8.1 and 8.2 are added.
        Sections for DLI-UNA/TS2 and DLI-SACK/TS2 are exchanged.
        (Sections 8.2.2 and 8.2.3 are exchanged, and
         appendices A.5.4.2 and A.5.4.3 are exchanged.)
      - DS.Start and DS.End are added to appendix A.5.4.1.
      - TS.SndUnaTS is renamed to TS.UNA.SndTS.
      - TS.SndUnaRO is renamed to TS.UNA.SndRO.
      - Section 8.2.4 "DLI-NXT/TS2" is added.
        Appendix A.5.4.4 is also added.
      - Section 8.2.5 "DLI-MAX/TS2" is added.
        Appendix A.5.4.5 is also added.

   - SLID
      - SRD (Spurious Retransmission Detection) is replaced with SLID
        (Spurious Loss Inference Detection).

   - Appendices
      - isSYNACK is introduced in appendix A.
      - Former Appendices A.1 to A.10 are renumbered to A.2 to A.11 in
        order to introduce new appendix A.1 "TCP Options".
      - Former appendix B is moved to appendix G in order to introduce
        new appendix B "Granularity of Timestamps".
      - Appendix H "Changes from -00 version" is added.



Demizu                   Expires September 2006                [Page 90]


Internet-Draft        <draft-demizu-tcp-ts2-01.txt>           March 2006


Copyright Statement

   Copyright (C) The Internet Society (2006).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology
   described in this document or the extent to which any license
   under such rights might or might not be available; nor does it
   represent that it has made any independent effort to identify any
   such rights.  Information on the procedures with respect to
   rights in RFC documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention
   any copyrights, patents or patent applications, or other
   proprietary rights that may cover technology that may be required
   to implement this standard.  Please address the information to the
   IETF at ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.









Demizu                   Expires September 2006                [Page 91]