TCP Maintenance and Minor                          R. Scheffenegger, Ed.
Extensions (tcpm)                                           NetApp, Inc.
Internet-Draft                                             M. Kuehlewind
Intended status: Experimental                    University of Stuttgart
Expires: September 15, 2011                               March 14, 2011


        Additional negotiation in the TCP Timestamp Option field
                        during the TCP handshake
           draft-scheffenegger-tcpm-timestamp-negotiation-01

Abstract

   RFC 1323 defines the TSecr field of a SYN packet to be not valid and
   thus this field will always be zero.  This documents specifies the
   use of this field to signal and negotiate additional information
   about the content of the TSopt field as well as the behavior of the
   receiver.  If the receiver understands this extension, it will use
   the TSecr field of the SYN/ACK to reply.  Otherwise the receiver will
   ignore the TSecr field and set a timestamp in the TSecr field as
   specified in RFC 1323.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 15, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents



Scheffenegger & Kuehlewind  Expires September 15, 2011          [Page 1]


Internet-Draft            Timestamp Negotiation               March 2011


   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1.  Requirements Language  . . . . . . . . . . . . . . . . . .  4
   2.  Overview . . . . . . . . . . . . . . . . . . . . . . . . . . .  5
   3.  Definitions  . . . . . . . . . . . . . . . . . . . . . . . . .  5
   4.  Signaling  . . . . . . . . . . . . . . . . . . . . . . . . . .  5
     4.1.  Capability Flags . . . . . . . . . . . . . . . . . . . . .  5
     4.2.  Implicit extended negotiation  . . . . . . . . . . . . . .  7
   5.  Discussion . . . . . . . . . . . . . . . . . . . . . . . . . .  8
   6.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . .  9
   7.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . .  9
   8.  Security Considerations  . . . . . . . . . . . . . . . . . . .  9
   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . .  9
     9.1.  Normative References . . . . . . . . . . . . . . . . . . .  9
     9.2.  Informative References . . . . . . . . . . . . . . . . . .  9
   Appendix A.  Optional Capability Flags . . . . . . . . . . . . . . 10
     A.1.  Range Negotiation  . . . . . . . . . . . . . . . . . . . . 11
   Appendix B.  Revision history  . . . . . . . . . . . . . . . . . . 12
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 12
























Scheffenegger & Kuehlewind  Expires September 15, 2011          [Page 2]


Internet-Draft            Timestamp Negotiation               March 2011


1.  Introduction

   The TCP Timestamps Option (TSopt) provides timestamp echoing for
   Round-trip Time (RTT) measurments.  TSopt is widely deployed and
   activated by default in many systems.  RFC 1323 [RFC1323] specifies
   TSopt the following way:

         Kind: 8

         Length: 10 bytes

         +-------+-------+---------------------+---------------------+
         |Kind=8 |  10   |   TS Value (TSval)  |TS Echo Reply (TSecr)|
         +-------+-------+---------------------+---------------------+
             1       1              4                     4

                               RFC1323 TSopt

      "The Timestamps option carries two four-byte timestamp fields.
      The Timestamp Value field (TSval) contains the current value of
      the timestamp clock of the TCP sending the option.

      The Timestamp Echo Reply field (TSecr) is only valid if the ACK
      bit is set in the TCP header; if it is valid, it echos a timestamp
      value that was sent by the remote TCP in the TSval field of a
      Timestamps option.  When TSecr is not valid, its value must be
      zero.  The TSecr value will generally be from the most recent
      Timestamp option that was received; however, there are exceptions
      that are explained below.

      A TCP may send the Timestamps option (TSopt) in an initial SYN
      segment (i.e., segment containing a SYN bit and no ACK bit), and
      may send a TSopt in other segments only if it received a TSopt in
      the initial SYN segment for the connection."

   The comparison of the timestamp in the TSecr field to the current
   time gives an estimation of the RTT.  RFC 1323 [RFC1323] specifies
   various cases when more than one timestamp is available to echo.  The
   proposed solution might not always be the best choice, e.g. when the
   TCP Selective Acknowledgment Option (SACK) is used.  Moreover, more
   and more use cases arise where one-way delay (OWD) measurements are
   needed.  These mechanism misuse usually the TSopt to estimated the
   variation in OWD.  To enable such mechanisms the TSecr field in the
   TCP SYN packet could be used for additional negotiation.







Scheffenegger & Kuehlewind  Expires September 15, 2011          [Page 3]


Internet-Draft            Timestamp Negotiation               March 2011


1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].














































Scheffenegger & Kuehlewind  Expires September 15, 2011          [Page 4]


Internet-Draft            Timestamp Negotiation               March 2011


2.  Overview

   Enhancements in the area of TCP congestion control can use the
   measurement of the one-way delay variance as one input.  However,
   without explicit knowledge of the partner's timestamp clock, arriving
   at a good estimate requires multiple segent exchanges over a few
   round trip times.  Nevertheless, any such calculation has to make
   assumptions about the network state at the time of the measurement.
   In order to assist such algorithms, explicit knowledge at a early
   phase of the session can be negotiated.

   In addition, by using synergistic signalling between Timestamps and
   other options such as selective acknowledgment, enhancments in loss
   recovery are possible by removing any retransmission ambiguity .
   However, currently receivers are required to only reflect the
   timestamp of the last segment that was received in order.  Therefore,
   a backwards compatible way of changing this behavior is required.

   Furthermore, as the importance of the timestamp option increases by
   using it in more aspects of a TCP senders algorithm, so increases the
   importance of maintaining the integrity of the reflected timestamps,
   while allowing the receiver to make use of a senders timestamp.

   As an optional extension, a timestamp clock rate range negotiation is
   also introduced.  However, this is only included as example of
   further possible enhancements.


3.  Definitions

   The reader is expected to be familiar with the definitions given in
   [RFC1323].


4.  Signaling

   During the initial TCP three-way handshake, timestamp options are
   negotiated using the TSecr field.  A compliant TCP receiver will XOR
   the flags with the received TSval, when responding with the SYN+ACK.
   Timestamp Options MAY only be present when the SYN bit is set.

4.1.  Capability Flags

   In order to signal the supported capabilities, the TSecr is
   overloaded with the following flags and fields during the three-way
   handshake.  If optional capabilities such as tcp clock range are
   presented, minimal state will be required in the host to decode the
   returned Flags xor'ed with the TSval.



Scheffenegger & Kuehlewind  Expires September 15, 2011          [Page 5]


Internet-Draft            Timestamp Negotiation               March 2011


        Kind: 8

        Length: 10 bytes

        +-------+-------+---------------------+---------------------+
        |Kind=8 |  10   |   TS Value (TSval)  |TS Echo Reply (TSecr)|
        +-------+-------+---------------------+---------------------+
            1       1              4          |           4         |
                                             /                      |
        .-----------------------------------'                       |
       /                                                             \
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |E|R|R|         |               |S|         |                   |
      |X|E|E|  MASK   |      RES      |G|  EXP16  |      FRAC16       |
      |O|S|S|         |               |N|         |                   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                          timestamp option flags

   EXO - Extended Options
       Indicated that the sender supports extended timestamp options as
       defined by this document, and MUST be set ("1") by compliant
       implementations.

   RES - Reserved
       Reserved for future use.  MUST not be set ("0").  If a timestamp
       option is received with this bit set, the receiver MUST ignore
       the extended options field and react as if the Flags were not set
       (compatibility mode).

   MASK - Mask Timestamps
       If the timestamp is used for congestion control purposes, an
       incentive exists for malicious receivers to reflect tampered
       timestamps.  A sender MAY choose to protect timestamps from such
       modifications by including a fingerprint (secure hash of some
       kind) in some of the least significant bits.  However, doing so
       would prevent a receiver from using the timestamp for other
       purposes.  The MASK field indicates how many least significant
       bits should be excluded by the receiver, when processing a
       timestamp for timing purposes.  Note that this does not impact
       the reflected timestamp in any way - TSecr will always be equal
       to an appropriate TSval.  Another use case would be when the
       sender does not support a timestamp clock which can guarantee
       unique timestamps for retransmitted segments.  For unambigously
       identifying regular from retransmitted segments, the timestamp
       must be unique for otherwise identical segments.  Reserving the
       least significant bits for this purpose allows senders with slow



Scheffenegger & Kuehlewind  Expires September 15, 2011          [Page 6]


Internet-Draft            Timestamp Negotiation               March 2011


       running timestamp clocks to make use of this feature.  Note that
       the use of this option as implications in the protection against
       wraped sequence numbers (PAWS - [RFC1323]), as the more bits are
       set aside for tamper prevention, the faster the timestamp number
       space cycles itself.

   SGN - binary16 Sign
       This is the sign bit of the IEEE 754-2008 binary16 floating point
       representation of the timestamp clock.  Timestamp clocks MUST be
       positive, thus this bit MUST be zero.

   EXP16 - binary16 Exponent
       The exponent component of a binary16 floating point number
       indicating the timestamp clock.  The exponent bias is 28, which
       is not identical to the binary16 definition in IEEE 754-2008.
       Subnormal numbers (lower precision), where the exponent is set to
       zero, extend the lowest possible value representation to 2^-38
       (or 7.276 ps) at reduced precision.  Infinity and NaN (all
       exponent bits set) are not suppored, an exponent value of 31 is
       to be treated as normal exponent.  This allows timestamp clock
       rates of up to 15.999 sec.

   FRAC16 - binary16 Fraction
       The fraction component of a binary16 floating point number
       indicating the timestamp clock.  The clock rate is measured in
       seconds between ticks.  The range with the highest resolution,
       excluding subnormal numbers, covers clock ranges between 7.45 ns
       and 15.99 sec.  It is expected, that timestamp clock rates in
       excess of 0.1 ms are implemented by inserting the timestamp
       "late" before transmitting a segment.

   Example for an extended timestamp option, to indicate that the
   senders timestamp clock (tcp clock) is running with 1 ms per tick:

   SYN, TSopt=<X>, TSecr=EXO|MASK|EXP16=18|FRAC16=0x018

   The clock rate calculates as 2^(18-28)*1.0000011b, thus indicates an
   actual clock rate of 999.45 us

4.2.  Implicit extended negotiation

   If both timestamp extended options and selective acknowledgement
   options ([RFC2018]) are negotiated, both hosts MUST mirror the
   timestamp option immediately after receiving it.  Note that this is
   in conflict with [RFC1323], where only the timestamp of the last
   segment received in sequence is mirrored.  As SACK allows
   discrimination of reordered or lost segments, the reflected
   timestamps are not required to convey the most conservative



Scheffenegger & Kuehlewind  Expires September 15, 2011          [Page 7]


Internet-Draft            Timestamp Negotiation               March 2011


   information.  If SACK indicates lost or reordered packets at the
   receiver, the sender MUST take appropriate action such as ignoring
   the received timestamps for calculating the round trip time, or
   assuming a delayed packet (with appropriate handling).  The exact
   implications are beyond the scope of this note.

   This allows the synergistic use of the timestamp option with the SACK
   option to improve loss recovery, round trip time and one way delay
   variance measurements even during loss or reordering episodes.  This
   is enabled by removing any retransmission ambiguity using unique
   timestamps for every retransmission, while simultaneously the SACK
   option indicates the ordering of received segments even in the
   presence of ACK loss or reordering.


5.  Discussion

   One-way delay (variation) based congestion controls would benefit
   from knowing the clock resolution on both sides.

   RTT variance during loss episodes is not deeply researched.  Current
   heuristics (RFC1122, RFC1323, Karn's algorithm, RFC2988) explicitly
   exclude (and prevent) the use of RTT samples when loss occurs.
   However, solving the retransmission ambiguity problem - and the
   related reliable ACK delivery problem - may allow the refinement of
   these algorithms further, as well as enabling new research to
   distinguish between corruption loss (without RTT / one-way delay
   impact) and congestion loss (with RTT / one-way delay impact).
   Research into this field appears to be a rather neglected, especially
   when it comes to large scale, public internet investigations.  Due to
   the very nature of this, passive investigations without signals
   contained within the headers are only of limited use in empirical
   research.

   Retransmission ambiguity detection during loss recovery would allow
   an additional level of loss recovery control without reverting to
   timer-based methods.  As with the deployment of SACK, separating
   "what" to send from "when" to send it could be driven one step
   further.  In particular, less conservative loss recovery schemes
   which do not trade principles of packet conservation against
   timeliness, require a reliable way of prompt and best possible
   feedback from the receiver about any delivered segment and their
   ordering.  SACK alone goes quite a long way, but using Timestamp
   information in addition could remove any ambiguity.  However, the
   current specs in RFC1323 make that use impossible, thus a modified
   signaling (receiver behavior) is a necessity.





Scheffenegger & Kuehlewind  Expires September 15, 2011          [Page 8]


Internet-Draft            Timestamp Negotiation               March 2011


6.  Acknowledgements

   The authors would like to thank Dragana Damjanovic for some initial
   thoughts around Timestamps and their extended potential use.


7.  IANA Considerations

   This memo includes no request to IANA.


8.  Security Considerations

   The algorithm presented in this paper shares security considerations
   with [RFC1323].

   Some implementations address the vulerabilities of [RFC1323], by
   dedicating a few low-order bits of the timestamp fields for use with
   a (secure) hash, that protects against malicious tweaking of TSecr
   values.  A Flag-field has been provided to transparently notify the
   receiver about that use of low-order bits, so that they can be
   excluded in one-way delay calculations.


9.  References

9.1.  Normative References

   [RFC1323]  Jacobson, V., Braden, B., and D. Borman, "TCP Extensions
              for High Performance", RFC 1323, May 1992.

   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
              Selective Acknowledgment Options", RFC 2018, October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2.  Informative References

   [Chirp]    Kuehlewind, M. and B. Briscoe, "Chirping for Congestion
              Control -  Implementation Feasibility", Nov 2010, <http://
              bobbriscoe.net/projects/netsvc_i-f/chirp_pfldnet10.pdf>.

   [I-D.ietf-tcpm-tcp-security]
              Gont, F., "Security Assessment of the Transmission Control
              Protocol (TCP)", draft-ietf-tcpm-tcp-security-02 (work in
              progress), January 2011.




Scheffenegger & Kuehlewind  Expires September 15, 2011          [Page 9]


Internet-Draft            Timestamp Negotiation               March 2011


Appendix A.  Optional Capability Flags

   Certain hosts may want to negotiate a certain optimal Timestamp Clock
   Rate for various purposes.  For example, the balance between PAWS
   ([RFC1323]) and the timestamp clock resolution should be more towards
   one or the other.  Also, if certain algorithms want to have identical
   timestamp clock rates both at the sender and receiver, negotiating
   the clock rate would be preferrable.  However, without a full three
   way handshake, full negotiation of the timestamp clock rate is not
   possible.

   For this purpose, the following extension of this proposal is
   suggested.

        Kind: 8

        Length: 10 bytes

        +-------+-------+---------------------+---------------------+
        |Kind=8 |  10   |   TS Value (TSval)  |TS Echo Reply (TSecr)|
        +-------+-------+---------------------+---------------------+
            1       1              4          |           4         |
                                             /                      |
        .-----------------------------------'                       |
       /                                                             \
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |E|R|R|         |         |             |         |             |
      |X|E|N|  MASK   | EXP12lo |  FRAC12lo   | EXP12hi |  FRAC12hi   |
      |O|S|G|         |         |             |         |             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                          timestamp option flags

   The following additonal fields are defined:

   RNG - Range negotiation
       Indicated that the sender is capable of adjusting the timestamp
       clock rate within the bounds of the two 12 bit fields (see
       Appendix A.1).  This flag is only valid in the initial SYN
       segment, and invalid when the ACK bit is set.

   EXP12lo  and

   EXP12hi - binary12 Exponent
       The exponent component of a truncated, 12 bit floating point
       number indicating the possible timestamp clock ranges.  The
       exponent bias is also 28, and no special numbers (infinity, NaN)



Scheffenegger & Kuehlewind  Expires September 15, 2011         [Page 10]


Internet-Draft            Timestamp Negotiation               March 2011


       are allowed.  The exponent value 31 is treated like any other
       exponent value.  Only the host initiating a TCP session MAY offer
       a timestamp clock range, while the receiver SHOULD select a
       timestamp clock within these bounds.  If the receiver can not
       adjust it's timestamp clock to match the range, it MAY use a
       timestamp clock rate outside these bounds.  If the receiver
       indicated a timestamp clock rate within the indicated bounds, the
       sender MUST set it's timestamp clock rate to the negotiated rate.
       If the receiver uses a timestamp clock rate outside the indicated
       bounds, the sender MUST set the local timestamp clock rate to the
       value indicated at the closer bound.

   FRAC12lo  and

   FRAC12hi - binary12 Fraction
       The fraction component of a 12 bit floating point number.
       Subnormal numbers are allowed (Exponent value 0).  This allows a
       range between 7.45 ns and 15.99 s with full resolution (lower
       bound is 0.06 ns using subnormal values).

A.1.  Range Negotiation

   The following sequence would negotiate the timestamp clock rate for
   both sender and receiver, where both finally know the clock rate of
   the respective partner.

   SYN, TSopt=<X>, TSecr=EXO|RNG|MASK|12bit-lo=1ms|12bit-hi=100ms

   SYN,ACK, TSopt=<Y>, TSecr=<X>^EXO|MASK|16bit=10ms

   In this example, both hosts would run their respective timestamp
   clocks with a resolution of 10 ms.

   SYN, TSopt=<X>, TSecr=EXO|RNG|MASK|12bit-lo=1ms|12bit-hi=100ms

   SYN,ACK, TSopt=<Y>, TSecr=<X>^EXO|MASK|16bit=1000ms

   In this example, the sender would run the timestamp clocks with a
   resolution of 100 ms (closer to the receivers clock rate of 1 sec),
   while the receiver will have a timestamp clock rate running at 1 sec.

   SYN, TSopt=<X>, TSecr=EXO|RNG|MASK|12bit-lo=1ms|12bit-hi=100ms

   SYN,ACK, TSopt=<Y>, TSecr=<X>^EXO|MASK|16bit=100us

   In this example, the sender would run the timestamp clocks with a
   resolution of 10 ms (closer to the receivers clock rate of 0.1 ms),
   while the receiver will have a timestamp clock rate running at 0.1ms.



Scheffenegger & Kuehlewind  Expires September 15, 2011         [Page 11]


Internet-Draft            Timestamp Negotiation               March 2011


Appendix B.  Revision history

   00 ... initial draft, early submission to meet deadline

   01 ... refined draft, focusing only on those options that have an
   immediate use case.  Also excluding flags that can be substituted by
   other means (MIR - synergistic with SACK options only, RNG moved to
   appendix, BIA removed while the exponent bias is at a fixed value.
   Also extended other paragraphs.


Authors' Addresses

   Richard Scheffenegger (editor)
   NetApp, Inc.
   Am Euro Platz 2
   Vienna,   1120
   Austria

   Phone: +43 1 3676811 3146
   Email: rs@netapp.com


   Mirja Kuehlewind
   University of Stuttgart
   Pfaffenwaldring 47
   Stuttgart  70569
   Germany

   Email: mirja.kuehlewind@ikr.uni-stuttgart.de





















Scheffenegger & Kuehlewind  Expires September 15, 2011         [Page 12]