TCP Maintenance & Minor Extensions (tcpm)                     B. Briscoe
Internet-Draft                                Simula Research Laboratory
Intended status: Experimental                              M. Kuehlewind
Expires: December 1, 2017                                     ETH Zurich
                                                        R. Scheffenegger
                                                            May 30, 2017


                   More Accurate ECN Feedback in TCP
                    draft-ietf-tcpm-accurate-ecn-03

Abstract

   Explicit Congestion Notification (ECN) is a mechanism where network
   nodes can mark IP packets instead of dropping them to indicate
   incipient congestion to the end-points.  Receivers with an ECN-
   capable transport protocol feed back this information to the sender.
   ECN is specified for TCP in such a way that only one feedback signal
   can be transmitted per Round-Trip Time (RTT).  Recently, new TCP
   mechanisms like Congestion Exposure (ConEx) or Data Center TCP
   (DCTCP) need more accurate ECN feedback information whenever more
   than one marking is received in one RTT.  This document specifies an
   experimental scheme to provide more than one feedback signal per RTT
   in the TCP header.  Given TCP header space is scarce, it overloads
   the three existing ECN-related flags in the TCP header and provides
   additional information in a new TCP option.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 1, 2017.








Briscoe, et al.         Expires December 1, 2017                [Page 1]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
     1.1.  Document Roadmap  . . . . . . . . . . . . . . . . . . . .   4
     1.2.  Goals . . . . . . . . . . . . . . . . . . . . . . . . . .   5
     1.3.  Experiment Goals  . . . . . . . . . . . . . . . . . . . .   5
     1.4.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   6
     1.5.  Recap of Existing ECN feedback in IP/TCP  . . . . . . . .   6
   2.  AccECN Protocol Overview and Rationale  . . . . . . . . . . .   7
     2.1.  Capability Negotiation  . . . . . . . . . . . . . . . . .   8
     2.2.  Feedback Mechanism  . . . . . . . . . . . . . . . . . . .   9
     2.3.  Delayed ACKs and Resilience Against ACK Loss  . . . . . .   9
     2.4.  Feedback Metrics  . . . . . . . . . . . . . . . . . . . .  10
     2.5.  Generic (Dumb) Reflector  . . . . . . . . . . . . . . . .  10
   3.  AccECN Protocol Specification . . . . . . . . . . . . . . . .  11
     3.1.  Negotiating to use AccECN . . . . . . . . . . . . . . . .  11
       3.1.1.  Negotiation during the TCP handshake  . . . . . . . .  11
       3.1.2.  Retransmission of the SYN . . . . . . . . . . . . . .  14
     3.2.  AccECN Feedback . . . . . . . . . . . . . . . . . . . . .  15
       3.2.1.  The ACE Field . . . . . . . . . . . . . . . . . . . .  15
       3.2.2.  Testing for Zeroing of the ACE Field  . . . . . . . .  16
       3.2.3.  Safety against Ambiguity of the ACE Field . . . . . .  17
       3.2.4.  The AccECN Option . . . . . . . . . . . . . . . . . .  17
       3.2.5.  Path Traversal of the AccECN Option . . . . . . . . .  19
       3.2.6.  Usage of the AccECN TCP Option  . . . . . . . . . . .  22
     3.3.  AccECN Compliance by TCP Proxies, Offload Engines and
           other Middleboxes . . . . . . . . . . . . . . . . . . . .  23
   4.  Interaction with Other TCP Variants . . . . . . . . . . . . .  24
     4.1.  Compatibility with SYN Cookies  . . . . . . . . . . . . .  24
     4.2.  Compatibility with Other TCP Options and Experiments  . .  25
     4.3.  Compatibility with Feedback Integrity Mechanisms  . . . .  25
   5.  Protocol Properties . . . . . . . . . . . . . . . . . . . . .  26
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  28



Briscoe, et al.         Expires December 1, 2017                [Page 2]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   7.  Security Considerations . . . . . . . . . . . . . . . . . . .  29
   8.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  29
   9.  Comments Solicited  . . . . . . . . . . . . . . . . . . . . .  30
   10. References  . . . . . . . . . . . . . . . . . . . . . . . . .  30
     10.1.  Normative References . . . . . . . . . . . . . . . . . .  30
     10.2.  Informative References . . . . . . . . . . . . . . . . .  30
   Appendix A.  Example Algorithms . . . . . . . . . . . . . . . . .  33
     A.1.  Example Algorithm to Encode/Decode the AccECN Option  . .  33
     A.2.  Example Algorithm for Safety Against Long Sequences of
           ACK Loss  . . . . . . . . . . . . . . . . . . . . . . . .  34
       A.2.1.  Safety Algorithm without the AccECN Option  . . . . .  34
       A.2.2.  Safety Algorithm with the AccECN Option . . . . . . .  36
     A.3.  Example Algorithm to Estimate Marked Bytes from Marked
           Packets . . . . . . . . . . . . . . . . . . . . . . . . .  37
     A.4.  Example Algorithm to Beacon AccECN Options  . . . . . . .  38
     A.5.  Example Algorithm to Count Not-ECT Bytes  . . . . . . . .  39
   Appendix B.  Alternative Design Choices (To Be Removed Before
                Publication) . . . . . . . . . . . . . . . . . . . .  39
   Appendix C.  Open Protocol Design Issues (To Be Removed Before
                Publication) . . . . . . . . . . . . . . . . . . . .  40
   Appendix D.  Changes in This Version (To Be Removed Before
                Publication) . . . . . . . . . . . . . . . . . . . .  40
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  40

1.  Introduction

   Explicit Congestion Notification (ECN) [RFC3168] is a mechanism where
   network nodes can mark IP packets instead of dropping them to
   indicate incipient congestion to the end-points.  Receivers with an
   ECN-capable transport protocol feed back this information to the
   sender.  ECN is specified for TCP in such a way that only one
   feedback signal can be transmitted per Round-Trip Time (RTT).
   Recently, proposed mechanisms like Congestion Exposure (ConEx
   [RFC7713]), DCTCP [I-D.ietf-tcpm-dctcp] or L4S
   [I-D.ietf-tsvwg-l4s-arch] need more accurate ECN feedback information
   whenever more than one marking is received in one RTT.  A fuller
   treatment of the motivation for this specification is given in the
   associated requirements document [RFC7560].

   This documents specifies an experimental scheme for ECN feedback in
   the TCP header to provide more than one feedback signal per RTT.  It
   will be called the more accurate ECN feedback scheme, or AccECN for
   short.  If AccECN progresses from experimental to the standards
   track, it is intended to be a complete replacement for classic ECN
   feedback, not a fork in the design of TCP.  Thus, the applicability
   of AccECN is intended to include all public and private IP networks
   (and even any non-IP networks over which TCP is used today).  Until
   the AccECN experiment succeeds, [RFC3168] will remain as the



Briscoe, et al.         Expires December 1, 2017                [Page 3]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   standards track specification for adding ECN to TCP.  To avoid
   confusion, in this document we use the term 'classic ECN' for the
   pre-existing ECN specification [RFC3168].

   AccECN feedback overloads flags and fields in the main TCP header
   with new definitions, so both ends have to support the new wire
   protocol before it can be used.  Therefore during the TCP handshake
   the two ends use the three ECN-related flags in the TCP header to
   negotiate the most advanced feedback protocol that they can both
   support.

   AccECN is solely an (experimental) change to the TCP wire protocol;
   it only specifies the negotiation and signaling of more accurate ECN
   feedback from a TCP Data Receiver to a Data Sender.  It is completely
   independent of how TCP might respond to congestion feedback, which is
   out of scope.  For that we refer to [RFC3168] or any RFC that
   specifies a different response to TCP ECN feedback, for example:
   [I-D.ietf-tcpm-dctcp]; or the ECN experiments referred to in
   [I-D.ietf-tsvwg-ecn-experimentation], namely: a TCP-based Low Latency
   Low Loss Scalable (L4S) congestion control [I-D.ietf-tsvwg-l4s-arch];
   ECN-capable TCP control packets [I-D.bagnulo-tcpm-generalized-ecn],
   or Alternative Backoff with ECN (ABE)
   [I-D.ietf-tcpm-alternativebackoff-ecn].

   It is likely (but not required) that the AccECN protocol will be
   implemented along with the following experimental additions to the
   TCP-ECN protocol: ECN-capable TCP control packets and retransmissions
   [I-D.bagnulo-tcpm-generalized-ecn], which includes the ECN-capable
   SYN-ACK experiment [RFC5562]; and testing receiver non-compliance
   [I-D.moncaster-tcpm-rcv-cheat].

1.1.  Document Roadmap

   The following introductory sections outline the goals of AccECN
   (Section 1.2) and the goal of experiments with ECN (Section 1.3) so
   that it is clear what success would look like.  Then terminology is
   defined (Section 1.4) and a recap of existing prerequisite technology
   is given (Section 1.5).

   Section 2 gives an informative overview of the AccECN protocol.  Then
   Section 3 gives the normative protocol specification.  Section 4
   assesses the interaction of AccECN with commonly used variants of
   TCP, whether standardised or not.  Section 5 summarises the features
   and properties of AccECN.

   Section 6 summarises the protocol fields and numbers that IANA will
   need to assign and Section 7 points to the aspects of the protocol
   that will be of interest to the security community.



Briscoe, et al.         Expires December 1, 2017                [Page 4]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   Appendix A gives pseudocode examples for the various algorithms that
   AccECN uses.

1.2.  Goals

   [RFC7560] enumerates requirements that a candidate feedback scheme
   will need to satisfy, under the headings: resilience, timeliness,
   integrity, accuracy (including ordering and lack of bias),
   complexity, overhead and compatibility (both backward and forward).
   It recognises that a perfect scheme that fully satisfies all the
   requirements is unlikely and trade-offs between requirements are
   likely.  Section 5 presents the properties of AccECN against these
   requirements and discusses the trade-offs made.

   The requirements document recognises that a protocol as ubiquitous as
   TCP needs to be able to serve as-yet-unspecified requirements.
   Therefore an AccECN receiver aims to act as a generic (dumb)
   reflector of congestion information so that in future new sender
   behaviours can be deployed unilaterally.

1.3.  Experiment Goals

   TCP is critical to the robust functioning of the Internet, therefore
   any proposed modifications to TCP need to be thoroughly tested.  The
   present specification describes an experimental protocol that adds
   more accurate ECN feedback to the TCP protocol.  The intention is to
   specify the protocol sufficiently so that more than one
   implementation can be built in order to test its function, robustness
   and interoperability (with itself and with previous version of ECN
   and TCP).

   The experimental protocol will be considered successful if it
   satisfies the requirements of [RFC7560] in the consensus opinion of
   the IETF tcpm working group.  In short, this requires that it
   improves the accuracy and timeliness of TCP's ECN feedback, as
   claimed in Section 5, while striking a balance between the
   conflicting requirements of resilience, integrity and minimisation of
   overhead.  It also requires that it is not unduly complex, and that
   it is compatible with prevalent equipment behaviours in the current
   Internet, whether or not they comply with standards.

   Testing will mostly focus on fall-back strategies in case of
   middlebox interference.  Current recommended strategies are specified
   in Sections 3.1.2, 3.2.2 and 3.2.5.  The effectiveness of these
   strategies depends on the actual deployment situation of middleboxes.
   Therefore experimental verification to confirm large-scale path
   traversal in the Internet is needed to finalize this specification on
   Standards Track.



Briscoe, et al.         Expires December 1, 2017                [Page 5]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


1.4.  Terminology

   AccECN:  The more accurate ECN feedback scheme will be called AccECN
      for short.

   Classic ECN:  the ECN protocol specified in [RFC3168].

   Classic ECN feedback:  the feedback aspect of the ECN protocol
      specified in [RFC3168], including generation, encoding,
      transmission and decoding of feedback, but not the Data Sender's
      subsequent response to that feedback.

   ACK:  A TCP acknowledgement, with or without a data payload.

   Pure ACK:  A TCP acknowledgement without a data payload.

   TCP client:  The TCP stack that originates a connection.

   TCP server:  The TCP stack that responds to a connection request.

   Data Receiver:  The endpoint of a TCP half-connection that receives
      data and sends AccECN feedback.

   Data Sender:  The endpoint of a TCP half-connection that sends data
      and receives AccECN feedback.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.5.  Recap of Existing ECN feedback in IP/TCP

   ECN [RFC3168] uses two bits in the IP header.  Once ECN has been
   negotiated with the receiver at the transport layer, an ECN sender
   can set two possible codepoints (ECT(0) or ECT(1)) in the IP header
   to indicate an ECN-capable transport (ECT).  If both ECN bits are
   zero, the packet is considered to have been sent by a Not-ECN-capable
   Transport (Not-ECT).  When a network node experiences congestion, it
   will occasionally either drop or mark a packet, with the choice
   depending on the packet's ECN codepoint.  If the codepoint is Not-
   ECT, only drop is appropriate.  If the codepoint is ECT(0) or ECT(1),
   the node can mark the packet by setting both ECN bits, which is
   termed 'Congestion Experienced' (CE), or loosely a 'congestion mark'.
   Table 1 summarises these codepoints.







Briscoe, et al.         Expires December 1, 2017                [Page 6]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   +-----------------------+---------------+---------------------------+
   | IP-ECN codepoint      | Codepoint     | Description               |
   | (binary)              | name          |                           |
   +-----------------------+---------------+---------------------------+
   | 00                    | Not-ECT       | Not ECN-Capable Transport |
   | 01                    | ECT(1)        | ECN-Capable Transport (1) |
   | 10                    | ECT(0)        | ECN-Capable Transport (0) |
   | 11                    | CE            | Congestion Experienced    |
   +-----------------------+---------------+---------------------------+

                  Table 1: The ECN Field in the IP Header

   In the TCP header the first two bits in byte 14 are defined as flags
   for the use of ECN (CWR and ECE in Figure 1 [RFC3168]).  A TCP client
   indicates it supports ECN by setting ECE=CWR=1 in the SYN, and an
   ECN-enabled server confirms ECN support by setting ECE=1 and CWR=0 in
   the SYN/ACK.  On reception of a CE-marked packet at the IP layer, the
   Data Receiver starts to set the Echo Congestion Experienced (ECE)
   flag continuously in the TCP header of ACKs, which ensures the signal
   is received reliably even if ACKs are lost.  The TCP sender confirms
   that it has received at least one ECE signal by responding with the
   congestion window reduced (CWR) flag, which allows the TCP receiver
   to stop repeating the ECN-Echo flag.  This always leads to a full RTT
   of ACKs with ECE set.  Thus any additional CE markings arriving
   within this RTT cannot be fed back.

   The last bit in byte 13 of the TCP header was defined as the Nonce
   Sum (NS) for the ECN Nonce [RFC3540].  RFC 3540 was never deployed so
   it is being reclassified as historic, making this TCP flag available
   for use by the AccECN experiment instead.

       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
     +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
     |               |           | N | C | E | U | A | P | R | S | F |
     | Header Length | Reserved  | S | W | C | R | C | S | S | Y | I |
     |               |           |   | R | E | G | K | H | T | N | N |
     +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

     Figure 1: The (post-ECN Nonce) definition of the TCP header flags

2.  AccECN Protocol Overview and Rationale

   This section provides an informative overview of the AccECN protocol
   that will be normatively specified in Section 3

   Like the original TCP approach, the Data Receiver of each TCP half-
   connection sends AccECN feedback to the Data Sender on TCP




Briscoe, et al.         Expires December 1, 2017                [Page 7]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   acknowledgements, reusing data packets of the other half-connection
   whenever possible.

   The AccECN protocol has had to be designed in two parts:

   o  an essential part that re-uses ECN TCP header bits to feed back
      the number of arriving CE marked packets.  This provides more
      accuracy than classic ECN feedback, but limited resilience against
      ACK loss;

   o  a supplementary part using a new AccECN TCP Option that provides
      additional feedback on the number of bytes that arrive marked with
      each of the three ECN codepoints (not just CE marks).  This
      provides greater resilience against ACK loss than the essential
      feedback, but it is more likely to suffer from middlebox
      interference.

   The two part design was necessary, given limitations on the space
   available for TCP options and given the possibility that certain
   incorrectly designed middleboxes prevent TCP using any new options.

   The essential part overloads the previous definition of the three
   flags in the TCP header that had been assigned for use by ECN.  This
   design choice deliberately replaces the classic ECN feedback
   protocol, rather than leaving classic ECN feedback intact and adding
   more accurate feedback separately because:

   o  this efficiently reuses scarce TCP header space, given TCP option
      space is approaching saturation;

   o  a single upgrade path for the TCP protocol is preferable to a fork
      in the design;

   o  otherwise classic and accurate ECN feedback could give conflicting
      feedback on the same segment, which could open up new security
      concerns and make implementations unnecessarily complex;

   o  middleboxes are more likely to faithfully forward the TCP ECN
      flags than newly defined areas of the TCP header.

   AccECN is designed to work even if the supplementary part is removed
   or zeroed out, as long as the essential part gets through.

2.1.  Capability Negotiation

   AccECN is a change to the wire protocol of the main TCP header,
   therefore it can only be used if both endpoints have been upgraded to
   understand it.  The TCP client signals support for AccECN on the



Briscoe, et al.         Expires December 1, 2017                [Page 8]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   initial SYN of a connection and the TCP server signals whether it
   supports AccECN on the SYN/ACK.  The TCP flags on the SYN that the
   client uses to signal AccECN support have been carefully chosen so
   that a TCP server will interpret them as a request to support the
   most recent variant of ECN feedback that it supports.  Then the
   client falls back to the same variant of ECN feedback.

   An AccECN TCP client does not send the new AccECN Option on the SYN
   as SYN option space is limited and successful negotiation using the
   flags in the main header is taken as sufficient evidence that both
   ends also support the AccECN Option.  The TCP server sends the AccECN
   Option on the SYN/ACK and the client sends it on the first ACK to
   test whether the network path forwards the option correctly.

2.2.  Feedback Mechanism

   A Data Receiver maintains four counters initialised at the start of
   the half-connection.  Three count the number of arriving payload
   bytes marked CE, ECT(1) and ECT(0) respectively.  The fourth counts
   the number of packets arriving marked with a CE codepoint (including
   control packets without payload if they are CE-marked).

   The Data Sender maintains four equivalent counters for the half
   connection, and the AccECN protocol is designed to ensure they will
   match the values in the Data Receiver's counters, albeit after a
   little delay.

   Each ACK carries the three least significant bits (LSBs) of the
   packet-based CE counter using the ECN bits in the TCP header, now
   renamed the Accurate ECN (ACE) field (see Figure 2 later).  The LSBs
   of each of the three byte counters are carried in the AccECN Option.

2.3.  Delayed ACKs and Resilience Against ACK Loss

   With both the ACE and the AccECN Option mechanisms, the Data Receiver
   continually repeats the current LSBs of each of its respective
   counters.  Then, even if some ACKs are lost, the Data Sender should
   be able to infer how much to increment its own counters, even if the
   protocol field has wrapped.

   The 3-bit ACE field can wrap fairly frequently.  Therefore, even if
   it appears to have incremented by one (say), the field might have
   actually cycled completely then incremented by one.  The Data
   Receiver is required not to delay sending an ACK to such an extent
   that the ACE field would cycle.  However cyling is still a
   possibility at the Data Sender because a whole sequence of ACKs
   carrying intervening values of the field might all be lost or delayed
   in transit.



Briscoe, et al.         Expires December 1, 2017                [Page 9]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   The fields in the AccECN Option are larger, but they will increment
   in larger steps because they count bytes not packets.  Nonetheless,
   their size has been chosen such that a whole cycle of the field would
   never occur between ACKs unless there had been an infeasibly long
   sequence of ACK losses.  Therefore, as long as the AccECN Option is
   available, it can be treated as a dependable feedback channel.

   If the AccECN Option is not available, e.g. it is being stripped by a
   middlebox, the AccECN protocol will only feed back information on CE
   markings (using the ACE field).  Although not ideal, this will be
   sufficient, because it is envisaged that neither ECT(0) nor ECT(1)
   will ever indicate more severe congestion than CE, even though future
   uses for ECT(0) or ECT(1) are still unclear
   [I-D.ietf-tsvwg-ecn-experimentation].  Because the 3-bit ACE field is
   so small, when it is the only field available the Data Sender has to
   interpret it conservatively assuming the worst possible wrap.

   Certain specified events trigger the Data Receiver to include an
   AccECN Option on an ACK.  The rules are designed to ensure that the
   order in which different markings arrive at the receiver is
   communicated to the sender (as long as there is no ACK loss).
   Implementations are encouraged to send an AccECN Option more
   frequently, but this is left up to the implementer.

2.4.  Feedback Metrics

   The CE packet counter in the ACE field and the CE byte counter in the
   AccECN Option both provide feedback on received CE-marks.  The CE
   packet counter includes control packets that do not have payload
   data, while the CE byte counter solely includes marked payload bytes.
   If both are present, the byte counter in the option will provide the
   more accurate information needed for modern congestion control and
   policing schemes, such as DCTCP or ConEx.  If the option is stripped,
   a simple algorithm to estimate the number of marked bytes from the
   ACE field is given in Appendix A.3.

   Feedback in bytes is recommended in order to protect against the
   receiver using attacks similar to 'ACK-Division' to artificially
   inflate the congestion window, which is why [RFC5681] now recommends
   that TCP counts acknowledged bytes not packets.

2.5.  Generic (Dumb) Reflector

   The ACE field provides information about CE markings on both data and
   control packets.  According to [RFC3168] the Data Sender is meant to
   set control packets to Not-ECT.  However, mechanisms in certain
   private networks (e.g. data centres) set control packets to be ECN




Briscoe, et al.         Expires December 1, 2017               [Page 10]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   capable because they are precisely the packets that performance
   depends on most.

   For this reason, AccECN is designed to be a generic reflector of
   whatever ECN markings it sees, whether or not they are compliant with
   a current standard.  Then as standards evolve, Data Senders can
   upgrade unilaterally without any need for receivers to upgrade too.
   It is also useful to be able to rely on generic reflection behaviour
   when senders need to test for unexpected interference with markings
   (for instance [I-D.kuehlewind-tcpm-ecn-fallback] and
   [I-D.moncaster-tcpm-rcv-cheat]).

   The initial SYN is the most critical control packet, so AccECN
   provides feedback on whether it is CE marked.  Although RFC 3168
   prohibits an ECN-capable SYN, providing feedback of CE marking on the
   SYN supports future scenarios in which SYNs might be ECN-enabled
   (without prejudging whether they ought to be).  For instance,
   [I-D.ietf-tsvwg-ecn-experimentation] updates this aspect of RFC 3168
   to allow experimentation with ECN-capable TCP control packets.

   Even if the TCP client has set the SYN to not-ECT in compliance with
   RFC 3168, feedback on whether it has been CE-marked could still be
   useful, because middleboxes have been known to overwrite the ECN IP
   field as if it is still part of the old Type of Service (ToS) field.
   If a TCP client has set the SYN to Not-ECT, but receives CE feedback,
   it can detect such middlebox interference and send Not-ECT for the
   rest of the connection (see [I-D.kuehlewind-tcpm-ecn-fallback]).
   Today, if a TCP server receives CE on a SYN, it cannot know whether
   it is invalid (or valid) because only the TCP client knows whether it
   originally marked the SYN as Not-ECT (or ECT).  Therefore, prior to
   AccECN, the server's only safe course of action was to disable ECN
   for the connection.  Instead, the AccECN protocol allows the server
   to feed back the CE marking to the client, which then has all the
   information to decide whether the connection has to fall-back from
   supporting ECN (or not).

3.  AccECN Protocol Specification

3.1.  Negotiating to use AccECN

3.1.1.  Negotiation during the TCP handshake

   Given the ECN Nonce [RFC3540] is being reclassified as historic, the
   present specification renames the TCP flag at bit 7 of the TCP header
   flags from NS (Nonce Sum) to AE (Accurate ECN) (see IANA
   Considerations in Section 6).





Briscoe, et al.         Expires December 1, 2017               [Page 11]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   During the TCP handshake at the start of a connection, to request
   more accurate ECN feedback the TCP client (host A) MUST set the TCP
   flags AE=1, CWR=1 and ECE=1 in the initial SYN segment.

   If a TCP server (B) that is AccECN-enabled receives a SYN with the
   above three flags set, it MUST set both its half connections into
   AccECN mode.  Then it MUST set the TCP flags CWR=1 and ECE=0 on its
   response in the SYN/ACK segment to confirm that it supports AccECN.
   The TCP server MUST NOT set this combination of flags unless the
   preceding SYN requested support for AccECN as above.

   A TCP server in AccECN mode MUST additionally set the TCP flag AE=1
   on the SYN/ACK if the IP/ECN field of the SYN was CE-marked (see
   Section 2.5 for rationale).  If the IP/ECN field of the received SYN
   was Not-ECT, ECT(0) or ECT(1), it MUST clear the TCP AE flag (AE=0)
   on the SYN/ACK.

   Once a TCP client (A) has sent the above SYN to declare that it
   supports AccECN, and once it has received the above SYN/ACK segment
   that confirms that the TCP server supports AccECN, the TCP client
   MUST set both its half connections into AccECN mode.

   The procedure for the client to follow if a SYN/ACK does not arrive
   before its retransmission timer expires is given in Section 3.1.2.

   The three flags set to 1 to indicate AccECN support on the SYN have
   been carefully chosen to enable natural fall-back to prior stages in
   the evolution of ECN.  Table 2 tabulates all the negotiation
   possibilities for ECN-related capabilities that involve at least one
   AccECN-capable host.  The entries in the first two columns have been
   abbreviated, as follows:

   AccECN:  More Accurate ECN Feedback (the present specification)

   Nonce:  ECN Nonce feedback [RFC3540]

   ECN:  'Classic' ECN feedback [RFC3168]

   No ECN:  Not-ECN-capable.  Implicit congestion notification using
       packet drop.











Briscoe, et al.         Expires December 1, 2017               [Page 12]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   +--------+---------+------------+--------------+--------------------+
   | A      | B       |  SYN A->B  | SYN/ACK B->A | Feedback Mode      |
   +--------+---------+------------+--------------+--------------------+
   |        |         | AE CWR ECE |  AE CWR ECE  |                    |
   | AccECN | AccECN  | 1   1   1  |  0   1   0   | AccECN             |
   | AccECN | AccECN  | 1   1   1  |  1   1   0   | AccECN (CE on SYN) |
   |        |         |            |              |                    |
   | AccECN | Nonce   | 1   1   1  |  1   0   1   | classic ECN        |
   | AccECN | ECN     | 1   1   1  |  0   0   1   | classic ECN        |
   | AccECN | No ECN  | 1   1   1  |  0   0   0   | Not ECN            |
   |        |         |            |              |                    |
   | Nonce  | AccECN  | 0   1   1  |  0   0   1   | classic ECN        |
   | ECN    | AccECN  | 0   1   1  |  0   0   1   | classic ECN        |
   | No ECN | AccECN  | 0   0   0  |  0   0   0   | Not ECN            |
   |        |         |            |              |                    |
   | AccECN | Broken  | 1   1   1  |  1   1   1   | Not ECN            |
   | AccECN | AccECN+ | 1   1   1  |  0   1   1   | AccECN (CU)        |
   | AccECN | AccECN+ | 1   1   1  |  1   0   0   | AccECN (CU)        |
   +--------+---------+------------+--------------+--------------------+

   Table 2: ECN capability negotiation between Client (A) and Server (B)

   Table 2 is divided into blocks each separated by an empty row.

   1.  The top block shows the case already described where both
       endpoints support AccECN and how the TCP server (B) indicates
       congestion feedback.

   2.  The second block shows the cases where the TCP client (A)
       supports AccECN but the TCP server (B) supports some earlier
       variant of TCP feedback, indicated in its SYN/ACK.  Therefore, as
       soon as an AccECN-capable TCP client (A) receives the SYN/ACK
       shown it MUST set both its half connections into the feedback
       mode shown in the rightmost column.

   3.  The third block shows the cases where the TCP server (B) supports
       AccECN but the TCP client (A) supports some earlier variant of
       TCP feedback, indicated in its SYN.  Therefore, as soon as an
       AccECN-enabled TCP server (B) receives the SYN shown, it MUST set
       both its half connections into the feedback mode shown in the
       rightmost column.

   4.  The fourth block displays combinations that are not valid or
       currently unused.  The first case (labelled `Broken' is where all
       bits set in the SYN are reflected by the receiver in the SYN/ACK,
       which happens quite often if the TCP connection is proxied.  In
       this case, both ends MUST fall-back to Not ECN for both half
       connections.  The other two cases (labelled 'AccECN (CU)') are



Briscoe, et al.         Expires December 1, 2017               [Page 13]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


       currently unassigned and available for an RFC to extend TCP in
       future, tagged as 'AccECN+' (see Appendix B for possible uses).
       For forward compatibility, as soon as an AccECN-capable TCP
       client (A) receives either of these SYN/ACKs it MUST set both its
       half connections into AccECN mode, as if the SYN/ACK had been
       AE=0, CWR=1, ECE=0.

   The following exceptional cases need some explanation:

   ECN Nonce:  An AccECN implementation, whether client or server,
      sender or receiver, does not need to implement the ECN Nonce
      feedback mode [RFC3540], which is being reclassified as historic
      [I-D.ietf-tsvwg-ecn-experimentation].  AccECN is compatible with
      an alternative ECN feedback integrity approach that does not use
      up the ECT(1) codepoint and can be implemented solely at the
      sender (see Section 4.3).

   Simultaneous Open:  An originating AccECN Host (A), having sent a SYN
      with AE=1, CWR=1 and ECE=1, might receive another SYN from host B.
      Host A MUST then enter the same feedback mode as it would have
      entered had it been a responding host and received the same SYN.
      Then host A MUST send the same SYN/ACK as it would have sent had
      it been a responding host (see the third block above).

3.1.2.  Retransmission of the SYN

   If the sender of an AccECN SYN times out before receiving the SYN/
   ACK, the sender SHOULD attempt to negotiate the use of AccECN at
   least one more time by continuing to set all three TCP ECN flags on
   the first retransmitted SYN (using the usual retransmission time-
   outs).  If this first retransmission also fails to be acknowledged,
   the sender SHOULD send subsequent retransmissions of the SYN without
   any ECN flags set.  This adds delay, in the case where a middlebox
   drops an AccECN (or ECN) SYN deliberately.  However, current
   measurements imply that a drop is less likely to be due to middlebox
   interference than other intermittent causes of loss, e.g. congestion,
   wireless interference, etc.

   Implementers MAY use other fall-back strategies if they are found to
   be more effective (e.g. attempting to retransmit an AccECN SYN only
   once or more than twice (most appropriate during high levels of
   congestion); or falling back to classic ECN feedback rather than non-
   ECN).  Further it may make sense to also remove any other
   experimental fields or options on the SYN in case a middlebox might
   be blocking them, although the required behaviour will depend on the
   specification of the other option(s) and any attempt to co-ordinate
   fall-back between different modules of the stack.  In any case, the
   TCP initiator SHOULD cache failed connection attempts.  If it does,



Briscoe, et al.         Expires December 1, 2017               [Page 14]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   it SHOULD NOT give up attempting to negotiate AccECN on the SYN of
   subsequent connection attempts until it is clear that the blockage is
   persistently and specifically due to AccECN.  The cache should be
   arranged to expire so that the initiator will infrequently attempt to
   check whether the problem has been resolved.

   The fall-back procedure if the TCP server receives no ACK to
   acknowledge a SYN/ACK that tried to negotiate AccECN is specified in
   Section 3.2.5.

3.2.  AccECN Feedback

   Each Data Receiver maintains four counters, r.cep, r.ceb, r.e0b and
   r.e1b.  The CE packet counter (r.cep), counts the number of packets
   the host receives with the CE code point in the IP ECN field,
   including CE marks on control packets without data. r.ceb, r.e0b and
   r.e1b count the number of TCP payload bytes in packets marked
   respectively with the CE, ECT(0) and ECT(1) codepoint in their IP-ECN
   field.  When a host first enters AccECN mode, it initialises its
   counters to r.cep = 6, r.e0b = 1 and r.ceb = r.e1b.= 0 (see
   Appendix A.5).  Non-zero initial values are used to support a
   stateless handshake (see Section 4.1) and to be distinct from cases
   where the fields are incorrectly zeroed (e.g. by middleboxes - see
   Section 3.2.5.4).

   A host feeds back the CE packet counter using the Accurate ECN (ACE)
   field, as explained in the next section.  And it feeds back all the
   byte counters using the AccECN TCP Option, as specified in
   Section 3.2.4.  Whenever a host feeds back the value of any counter,
   it MUST report the most recent value, no matter whether it is in a
   pure ACK, an ACK with new payload data or a retransmission.

3.2.1.  The ACE Field

   After AccECN has been negotiated on the SYN and SYN/ACK, both hosts
   overload the three TCP flags (AE, CWR and ECE) in the main TCP header
   as one 3-bit field.  Then the field is given a new name, ACE, as
   shown in Figure 2.

       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
     +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
     |               |           |           | U | A | P | R | S | F |
     | Header Length | Reserved  |    ACE    | R | C | S | S | Y | I |
     |               |           |           | G | K | H | T | N | N |
     +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

    Figure 2: Definition of the ACE field within bytes 13 and 14 of the
          TCP Header (when AccECN has been negotiated and SYN=0).



Briscoe, et al.         Expires December 1, 2017               [Page 15]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   The original definition of these three flags in the TCP header,
   including the addition of support for the ECN Nonce, is shown for
   comparison in Figure 1.  This specification does not rename these
   three TCP flags to ACE for always; it merely overloads them with
   another name and definition once an AccECN connection has been
   established.

   A host MUST interpret the AE, CWR and ECE flags as the 3-bit ACE
   counter on a segment with the SYN flag cleared (SYN=0) that it sends
   or receives if both of its half-connections are set into AccECN mode
   having successfully negotiated AccECN (see Section 3.1).  A host MUST
   NOT interpret the 3 flags as a 3-bit ACE field on any segment with
   SYN=1 (whether ACK is 0 or 1), or if AccECN negotiation is incomplete
   or has not succeeded.

   Both parts of each of these conditions are equally important.  For
   instance, even if AccECN negotiation has been successful, the ACE
   field is not defined on any segments with SYN=1 (e.g. a
   retransmission of an unacknowledged SYN/ACK, or when both ends send
   SYN/ACKs after AccECN support has been successfully negotiated during
   a simultaneous open).

   The ACE field encodes the three least significant bits of the r.cep
   counter, therefore its initial value will be 0b110 (decimal 6).  If
   the SYN/ACK was CE marked, the client MUST increase its r.cep counter
   before it sends its first ACK, therefore the initial value of the ACE
   field will be 0b111 (decimal 7).  To support a stateless handshake
   (see Section 4.1), these values have been chosen deliberately so that
   they are distinct from [RFC5562] behaviour, where the TCP client
   would set ECE on the first ACK as feedback for a CE mark on the SYN/
   ACK.

3.2.2.  Testing for Zeroing of the ACE Field

   Section 3.2.1 required the Data Receiver to initialize the r.cep
   counter to a non-zero value.  Therefore, in either direction the
   initial value of the ACE field ought to be non-zero.

   If AccECN has been successfully negotiated, the Data Sender SHOULD
   check the initial value of the ACE field in the first arriving
   segment with SYN=0.  If the initial value of the ACE field is zero
   (0b000), the Data Sender MUST disable sending ECN-capable packets for
   the remainder of the half-connection by setting the IP/ECN field in
   all subsequent packets to Not-ECT.

   For example, the server checks the ACK of the SYN/ACK or the first
   data segment from the client, while the client checks the first data
   segment from the server.  More precisely, the "first segment with



Briscoe, et al.         Expires December 1, 2017               [Page 16]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   SYN=0" is defined as: the segment with SYN=0 that i) acknowledges
   sequence space at least covering the initial sequence number (ISN)
   plus 1; and ii) arrives before any other segments with SYN=0 so it is
   unlikely to be a retransmission.  If no such segment arrives (e.g.
   because it is lost and the ISN is first acknowledged by a subsequent
   segment), no test for invalid initialization can be conducted, and
   the half-connection will continue in AccECN mode.

   Note that the Data Sender MUST NOT test whether the arriving counter
   in the initial ACE field has been initialized to a specific valid
   value - the above check solely tests whether the ACE fields have been
   incorrectly zeroed.  This allows hosts to use different initial
   values as an additional signalling channel in future.

3.2.3.  Safety against Ambiguity of the ACE Field

   If too many CE-marked segments are acknowledged at once, or if a long
   run of ACKs is lost, the 3-bit counter in the ACE field might have
   cycled between two ACKs arriving at the Data Sender.

   Therefore an AccECN Data Receiver SHOULD immediately send an ACK once
   'n' CE marks have arrived since the previous ACK, where 'n' SHOULD be
   2 and MUST be no greater than 6.

   If the Data Sender has not received AccECN TCP Options to give it
   more dependable information, and it detects that the ACE field could
   have cycled under the prevailing conditions, it SHOULD conservatively
   assume that the counter did cycle.  It can detect if the counter
   could have cycled by using the jump in the acknowledgement number
   since the last ACK to calculate or estimate how many segments could
   have been acknowledged.  An example algorithm to implement this
   policy is given in Appendix A.2.  An implementer MAY develop an
   alternative algorithm as long as it satisfies these requirements.

   If missing acknowledgement numbers arrive later (reordering) and
   prove that the counter did not cycle, the Data Sender MAY attempt to
   neutralise the effect of any action it took based on a conservative
   assumption that it later found to be incorrect.

3.2.4.  The AccECN Option

   The AccECN Option is defined as shown below in Figure 3.  It consists
   of three 24-bit fields that provide the 24 least significant bits of
   the r.e0b, r.ceb and r.e1b counters, respectively.  The initial 'E'
   of each field name stands for 'Echo'.






Briscoe, et al.         Expires December 1, 2017               [Page 17]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Kind = TBD1  |  Length = 11  |          EE0B field           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | EE0B (cont'd) |           ECEB field                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  EE1B field                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                        Figure 3: The AccECN Option

   The Data Receiver MUST set the Kind field to TBD1, which is
   registered in Section 6 as a new TCP option Kind called AccECN.  An
   experimental TCP option with Kind=254 MAY be used for initial
   experiments, with magic number 0xACCE.

   Appendix A.1 gives an example algorithm for the Data Receiver to
   encode its byte counters into the AccECN Option, and for the Data
   Sender to decode the AccECN Option fields into its byte counters.

   Note that there is no field to feedback Not-ECT bytes.  Nonetheless
   an algorithm for the Data Sender to calculate the number of payload
   bytes received as Not-ECT is given in Appendix A.5.

   Whenever a Data Receiver sends an AccECN Option, the rules in
   Section 3.2.6 expect it to always send a full-length option.  To cope
   with option space limitations, it can omit unchanged fields from the
   tail of the option, as long as it preserves the order of the
   remaining fields and includes any field that has changed.  The length
   field MUST indicate which fields are present as follows:

   Length=11:  EE0B, ECEB, EE1B

   Length=8:  EE0B, ECEB

   Length=5:  EE0B

   Length=2:  (empty)

   The empty option of Length=2 is provided to allow for a case where an
   AccECN Option has to be sent (e.g. on the SYN/ACK to test the path),
   but there is very limited space for the option.  For initial
   experiments, the Length field MUST be 2 greater to accommodate the
   16-bit magic number.






Briscoe, et al.         Expires December 1, 2017               [Page 18]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   All implementations of a Data Sender MUST be able to read in AccECN
   Options of any of the above lengths.  They MUST ignore an AccECN
   Option of any other length.

3.2.5.  Path Traversal of the AccECN Option

3.2.5.1.  Testing the AccECN Option during the Handshake

   The TCP client MUST NOT include the AccECN TCP Option on the SYN.
   Nonetheless, if the AccECN negotiation using the ECN flags in the
   main TCP header (Section 3.1) is successful, it implicitly declares
   that the endpoints also support the AccECN TCP Option.  A fall-back
   strategy for the loss of the SYN (possibly due to middlebox
   interference) is specified in Section 3.1.2.

   A TCP server that confirms its support for AccECN (in response to an
   AccECN SYN from the client as described in Section 3.1) SHOULD also
   include an AccECN TCP Option in the SYN/ACK.

   A TCP client that has successfully negotiated AccECN SHOULD include
   an AccECN Option in the first ACK at the end of the 3WHS.  However,
   this first ACK is not delivered reliably, so the TCP client SHOULD
   also include an AccECN Option on the first data segment it sends (if
   it ever sends one).

   A host MAY NOT include an AccECN Option in any of these three cases
   if it has cached knowledge that the packet would be likely to be
   blocked on the path to the other host if it included an AccECN
   Option.

3.2.5.2.  Testing for Loss of Packets Carrying the AccECN Option

   If after the normal TCP timeout the TCP server has not received an
   ACK to acknowledge its SYN/ACK, the SYN/ACK might just have been
   lost, e.g. due to congestion, or a middlebox might be blocking the
   AccECN Option.  To expedite connection setup, the TCP server SHOULD
   retransmit the SYN/ACK with the same TCP flags (AE, CWR and ECE) but
   with no AccECN Option.  If this retransmission times out, to expedite
   connection setup, the TCP server SHOULD disable AccECN and ECN for
   this connection by retransmitting the SYN/ACK with AE=CWR=ECE=0 and
   no AccECN Option.  Implementers MAY use other fall-back strategies if
   they are found to be more effective (e.g.  falling back to classic
   ECN feedback on the first retransmission; retrying the AccECN Option
   for a second time before fall-back (most appropriate during high
   levels of congestion); or falling back to classic ECN feedback rather
   than non-ECN on the third retransmission).





Briscoe, et al.         Expires December 1, 2017               [Page 19]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   If the TCP client detects that the first data segment it sent with
   the AccECN Option was lost, it SHOULD fall back to no AccECN Option
   on the retransmission.  Again, implementers MAY use other fall-back
   strategies such as attempting to retransmit a second segment with the
   AccECN Option before fall-back, and/or caching whether the AccECN
   Option is blocked for subsequent connections.

   Either host MAY include the AccECN Option in a subsequent segment to
   retest whether the AccECN Option can traverse the path.

   If the TCP server receives a second SYN with a request for AccECN
   support, it should resend the SYN/ACK, again confirming its support
   for AccECN, but this time without the AccECN Option.  This approach
   rules out any interference by middleboxes that may drop packets with
   unknown options, even though it is more likely that the SYN/ACK would
   have been lost due to congestion.  The TCP server MAY try to send
   another packet with the AccECN Option at a later point during the
   connection but should monitor if that packet got lost as well, in
   which case it SHOULD disable the sending of the AccECN Option for
   this half-connection.

   Similarly, an AccECN end-point MAY separately memorize which data
   packets carried an AccECN Option and disable the sending of AccECN
   Options if the loss probability of those packets is significantly
   higher than that of all other data packets in the same connection.

3.2.5.3.  Testing for Stripping of the AccECN Option

   If the TCP client has successfully negotiated AccECN but does not
   receive an AccECN Option on the SYN/ACK, it switches into a mode that
   assumes that the AccECN Option is not available for this half
   connection.

   Similarly, if the TCP server has successfully negotiated AccECN but
   does not receive an AccECN Option on the first segment that
   acknowledges sequence space at least covering the ISN, it switches
   into a mode that assumes that the AccECN Option is not available for
   this half connection.

   While a host is in this mode that assumes incoming AccECN Options are
   not available, it MUST adopt the conservative interpretation of the
   ACE field discussed in Section 3.2.3.  However, it cannot make any
   assumption about support of outgoing AccECN Options on the other half
   connection, so it SHOULD continue to send the AccECN Option itself
   (unless it has established that sending the AccECN Option is causing
   packets to be blocked as in Section 3.2.5.2).





Briscoe, et al.         Expires December 1, 2017               [Page 20]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   If a host is in the mode that assumes incoming AccECN Options are not
   available, but it receives an AccECN Option at any later point during
   the connection, this clearly indicates that the AccECN Option is not
   blocked on the respective path, and the AccECN endpoint MAY switch
   out of the mode that assumes the AccECN Option is not available for
   this half connection.

3.2.5.4.  Test for Zeroing of the AccECN Option

   For a related test for invalid initialization of the ACE field, see
   Section 3.2.2

   Section 3.2 required the Data Receiver to initialize the r.e0b
   counter to a non-zero value.  Therefore, in either direction the
   initial value of the EE0B field in the AccECN Option (if one exists)
   ought to be non-zero.  If AccECN has been negotiated:

   o  the TCP server MAY check the initial value of the EE0B field in
      the first segment that acknowledges sequence space that at least
      covers the ISN plus 1.  If the initial value of the EE0B field is
      zero, the server will switch into a mode that ignores the AccECN
      Option for this half connection.

   o  the TCP client MAY check the initial value of the EE0B field on
      the SYN/ACK.  If the initial value of the EE0B field is zero, the
      client will switch into a mode that ignores the AccECN Option for
      this half connection.

   While a host is in the mode that ignores the AccECN Option it MUST
   adopt the conservative interpretation of the ACE field discussed in
   Section 3.2.3.

   Note that the Data Sender MUST NOT test whether the arriving byte
   counters in the initial AccECN Option have been initialized to
   specific valid values - the above checks solely test whether these
   fields have been incorrectly zeroed.  This allows hosts to use
   different initial values as an additional signalling channel in
   future.  Also note that the initial value of either field might be
   greater than its expected initial value, because the counters might
   already have been incremented.  Nonetheless, the initial values of
   the counters have been chosen so that they cannot wrap to zero on
   these initial segments.

3.2.5.5.  Consistency between AccECN Feedback Fields

   When the AccECN Option is available it supplements but does not
   replace the ACE field.  An endpoint using AccECN feedback MUST always




Briscoe, et al.         Expires December 1, 2017               [Page 21]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   consider the information provided in the ACE field whether or not the
   AccECN Option is also available.

   If the AccECN option is present, the s.cep counter might increase
   while the s.ceb counter does not (e.g. due to a CE-marked control
   packet).  The sender's response to such a situation is out of scope,
   and needs to be dealt with in a specification that uses ECN-capable
   control packets.  Theoretically, this situation could also occur if a
   middlebox mangled the AccECN Option but not the ACE field.  However,
   the Data Sender has to assume that the integrity of the AccECN Option
   is sound, based on the above test of the well-known initial values
   and optionally other integrity tests (Section 4.3).

   If either end-point detects that the s.ceb counter has increased but
   the s.cep has not (and by testing ACK coverage it is certain how much
   the ACE field has wrapped), this invalid protocol transition has to
   be due to some form of feedback mangling.  So, the Data Sender MUST
   disable sending ECN-capable packets for the remainder of the half-
   connection by setting the IP/ECN field in all subsequent packets to
   Not-ECT.

3.2.6.  Usage of the AccECN TCP Option

   The following rules determine when a Data Receiver in AccECN mode
   sends the AccECN TCP Option, and which fields to include:

   Change-Triggered ACKs:  If an arriving packet increments a different
      byte counter to that incremented by the previous packet, the Data
      Receiver SHOULD immediately send an ACK with an AccECN Option,
      without waiting for the next delayed ACK (this is in addition to
      the safety recommendation in Section 3.2.3 against ambiguity of
      the ACE field).  Certain offload hardware might not be able to
      support change-triggered ACKs, but otherwise it is important to
      keep exceptions to this rule to a minimum so that Data Senders can
      generally rely on this behaviour;

   Continual Repetition:  Otherwise, if arriving packets continue to
      increment the same byte counter, the Data Receiver can include an
      AccECN Option on most or all (delayed) ACKs, but it does not have
      to.  If option space is limited on a particular ACK, the Data
      Receiver MUST give precedence to SACK information about loss.  It
      SHOULD include an AccECN Option if the r.ceb counter has
      incremented and it MAY include an AccECN Option if r.ec0b or
      r.ec1b has incremented;

   Full-Length Options Preferred:  It SHOULD always use full-length
      AccECN Options.  It MAY use shorter AccECN Options if space is
      limited, but it MUST include the counter(s) that have incremented



Briscoe, et al.         Expires December 1, 2017               [Page 22]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


      since the previous AccECN Option and it MUST only truncate fields
      from the right-hand tail of the option to preserve the order of
      the remaining fields (see Section 3.2.4);

   Beaconing Full-Length Options:  Nonetheless, it MUST include a full-
      length AccECN TCP Option on at least three ACKs per RTT, or on all
      ACKs if there are less than three per RTT (see Appendix A.4 for an
      example algorithm that satisfies this requirement).

   The following example series of arriving IP/ECN fields illustrates
   when a Data Receiver will emit an ACK if it is using a delayed ACK
   factor of 2 segments and change-triggered ACKs: 01 -> ACK, 01, 01 ->
   ACK, 10 -> ACK, 10, 01 -> ACK, 01, 11 -> ACK, 01 -> ACK.

   For the avoidance of doubt, the change-triggered ACK mechanism is
   deliberately worded to ignore the arrival of a control packet with no
   payload, which therefore does not alter any byte counters, because it
   is important that TCP does not acknowledge pure ACKs.  The change-
   triggered ACK approach will lead to some additional ACKs but it feeds
   back the timing and the order in which ECN marks are received with
   minimal additional complexity.

   Implementation note: sending an AccECN Option each time a different
   counter changes and including a full-length AccECN Option on every
   delayed ACK will satisfy the requirements described above and might
   be the easiest implementation, as long as sufficient space is
   available in each ACK (in total and in the option space).

   Appendix A.3 gives an example algorithm to estimate the number of
   marked bytes from the ACE field alone, if the AccECN Option is not
   available.

   If a host has determined that segments with the AccECN Option always
   seem to be discarded somewhere along the path, it is no longer
   obliged to follow the above rules.

3.3.  AccECN Compliance by TCP Proxies, Offload Engines and other
      Middleboxes

   A large class of middleboxes split TCP connections.  Such a middlebox
   would be compliant with the AccECN protocol if the TCP implementation
   on each side complied with the present AccECN specification and each
   side negotiated AccECN independently of the other side.

   Another large class of middleboxes intervenes to some degree at the
   transport layer, but attempts to be transparent (invisible) to the
   end-to-end connection.  A subset of this class of middleboxes
   attempts to `normalise' the TCP wire protocol by checking that all



Briscoe, et al.         Expires December 1, 2017               [Page 23]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   values in header fields comply with a rather narrow interpretation of
   the TCP specifications.  To comply with the present AccECN
   specification, such a middlebox MUST NOT change the ACE field or the
   AccECN Option and it MUST attempt to preserve the timing of each ACK
   (for example, if it coalesced ACKs it would not be AccECN-compliant).
   A middlebox claiming to be transparent at the transport layer MUST
   forward the AccECN TCP Option unaltered, whether or not the length
   value matches one of those specified in Section 3.2.4, and whether or
   not the initial values of the byte-counter fields are correct.  This
   is because blocking apparently invalid values does not improve
   security (because AccECN hosts are required to ignore invalid values
   anyway), while it prevents the standardised set of values being
   extended in future (because outdated normalisers would block updated
   hosts from using the extended AccECN standard).

   Hardware to offload certain TCP processing represents another large
   class of middleboxes, even though it is often a function of a host's
   network interface and rarely in its own 'box'.  Leeway has been
   allowed in the present AccECN specification in the expectation that
   offload hardware could comply and still serve its function.
   Nonetheless, such hardware MUST attempt to preserve the timing of
   each ACK (for example, if it coalesced ACKs it would not be AccECN-
   compliant).

4.  Interaction with Other TCP Variants

   This section is informative, not normative.

4.1.  Compatibility with SYN Cookies

   A TCP server can use SYN Cookies (see Appendix A of [RFC4987]) to
   protect itself from SYN flooding attacks.  It places minimal commonly
   used connection state in the SYN/ACK, and deliberately does not hold
   any state while waiting for the subsequent ACK (e.g. it closes the
   thread).  Therefore it cannot record the fact that it entered AccECN
   mode for both half-connections.  Indeed, it cannot even remember
   whether it negotiated the use of classic ECN [RFC3168].

   Nonetheless, such a server can determine that it negotiated AccECN as
   follows.  If a TCP server using SYN Cookies supports AccECN and if
   the first segment it receives that at least covers the ISN contains
   an ACE field with the value 0b110 or 0b111, it can assume that:

   o  the TCP client must have requested AccECN support on the SYN

   o  it (the server) must have confirmed that it supported AccECN





Briscoe, et al.         Expires December 1, 2017               [Page 24]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   Therefore the server can switch itself into AccECN mode, and continue
   as if it had never forgotten that it switched itself into AccECN mode
   earlier.  For other values of ACE field, heuristics to infer what
   other type of ECN the client supports are out of scope.

4.2.  Compatibility with Other TCP Options and Experiments

   AccECN is compatible (at least on paper) with the most commonly used
   TCP options: MSS, time-stamp, window scaling, SACK and TCP-AO.  It is
   also compatible with the recent promising experimental TCP options
   TCP Fast Open (TFO [RFC7413]) and Multipath TCP (MPTCP [RFC6824]).
   AccECN is friendly to all these protocols, because space for TCP
   options is particularly scarce on the SYN, where AccECN consumes zero
   additional header space.

   When option space is under pressure from other options, Section 3.2.6
   provides guidance on how important it is to send an AccECN Option and
   whether it needs to be a full-length option.

4.3.  Compatibility with Feedback Integrity Mechanisms

   Three alternative mechanisms are available to assure the integrity of
   ECN and/or loss signals.  AccECN is compatible with any of these
   approaches:

   o  The Data Sender can test the integrity of the receiver's ECN (or
      loss) feedback by occasionally setting the IP-ECN field to a value
      normally only set by the network (and/or deliberately leaving a
      sequence number gap).  Then it can test whether the Data
      Receiver's feedback faithfully reports what it expects
      [I-D.moncaster-tcpm-rcv-cheat].  Unlike the ECN Nonce [RFC3540],
      this approach does not waste the ECT(1) codepoint in the IP
      header, it does not require standardisation and it does not rely
      on misbehaving receivers volunteering to reveal feedback
      information that allows them to be detected.  However, setting the
      CE mark by the sender might conceal actual congestion feedback
      from the network and should therefore only be done sparsely.

   o  Networks generate congestion signals when they are becoming
      congested, so networks are more likely than Data Senders to be
      concerned about the integrity of the receiver's feedback of these
      signals.  A network can enforce a congestion response to its ECN
      markings (or packet losses) using congestion exposure (ConEx)
      audit [RFC7713].  Whether the receiver or a downstream network is
      suppressing congestion feedback or the sender is unresponsive to
      the feedback, or both, ConEx audit can neutralise any advantage
      that any of these three parties would otherwise gain.




Briscoe, et al.         Expires December 1, 2017               [Page 25]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


      ConEx is a change to the Data Sender that is most useful when
      combined with AccECN.  Without AccECN, the ConEx behaviour of a
      Data Sender would have to be more conservative than would be
      necessary if it had the accurate feedback of AccECN.

   o  The TCP authentication option (TCP-AO [RFC5925]) can be used to
      detect any tampering with AccECN feedback between the Data
      Receiver and the Data Sender (whether malicious or accidental).
      The AccECN fields are immutable end-to-end, so they are amenable
      to TCP-AO protection, which covers TCP options by default.
      However, TCP-AO is often too brittle to use on many end-to-end
      paths, where middleboxes can make verification fail in their
      attempts to improve performance or security, e.g. by
      resegmentation or shifting the sequence space.

   Originally the ECN Nonce [RFC3540] was proposed to ensure integrity
   of congestion feedback.  With minor changes AccECN could be optimised
   for the possibility that the ECT(1) codepoint might be used as an ECN
   Nonce . However, given RFC 3540 is being reclassified as historic,
   the AccECN design has been generalised so that it ought to be able to
   support other possible uses of the ECT(1) codepoint, such as a lower
   severity or a more instant congestion signal than CE.

5.  Protocol Properties

   This section is informative not normative.  It describes how well the
   protocol satisfies the agreed requirements for a more accurate ECN
   feedback protocol [RFC7560].

   Accuracy:  From each ACK, the Data Sender can infer the number of new
      CE marked segments since the previous ACK.  This provides better
      accuracy on CE feedback than classic ECN.  In addition if the
      AccECN Option is present (not blocked by the network path) the
      number of bytes marked with CE, ECT(1) and ECT(0) are provided.

   Overhead:  The AccECN scheme is divided into two parts.  The
      essential part reuses the 3 flags already assigned to ECN in the
      IP header.  The supplementary part adds an additional TCP option
      consuming up to 11 bytes.  However, no TCP option is consumed in
      the SYN.

   Ordering:  The order in which marks arrive at the Data Receiver is
      preserved in AccECN feedback, because the Data Receiver is
      expected to send an ACK immediately whenever a different mark
      arrives.

   Timeliness:  While the same ECN markings are arriving continually at
      the Data Receiver, it can defer ACKs as TCP does normally, but it



Briscoe, et al.         Expires December 1, 2017               [Page 26]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


      will immediately send an ACK as soon as a different ECN marking
      arrives.

   Timeliness vs Overhead:  Change-Triggered ACKs are intended to enable
      latency-sensitive uses of ECN feedback by capturing the timing of
      transitions but not wasting resources while the state of the
      signalling system is stable.  The receiver can control how
      frequently it sends the AccECN TCP Option and therefore it can
      control the overhead induced by AccECN.

   Resilience:  All information is provided based on counters.
      Therefore if ACKs are lost, the counters on the first ACK
      following the losses allows the Data Sender to immediately recover
      the number of the ECN markings that it missed.

   Resilience against Bias:  Because feedback is based on repetition of
      counters, random losses do not remove any information, they only
      delay it.  Therefore, even though some ACKs are change-triggered,
      random losses will not alter the proportions of the different ECN
      markings in the feedback.

   Resilience vs Overhead:  If space is limited in some segments (e.g.
      because more option are need on some segments, such as the SACK
      option after loss), the Data Receiver can send AccECN Options less
      frequently or truncate fields that have not changed, usually down
      to as little as 5 bytes.  However, it has to send a full-sized
      AccECN Option at least three times per RTT, which the Data Sender
      can rely on as a regular beacon or checkpoint.

   Resilience vs Timeliness and Ordering:  Ordering information and the
      timing of transitions cannot be communicated in three cases: i)
      during ACK loss; ii) if something on the path strips the AccECN
      Option; or iii) if the Data Receiver is unable to support Change-
      Triggered ACKs.

   Complexity:  An AccECN implementation solely involves simple counter
      increments, some modulo arithmetic to communicate the least
      significant bits and allow for wrap, and some heuristics for
      safety against fields cycling due to prolonged periods of ACK
      loss.  Each host needs to maintain eight additional counters.  The
      hosts have to apply some additional tests to detect tampering by
      middleboxes, but in general the protocol is simple to understand,
      simple to implement and requires few cycles per packet to execute.

   Integrity:  AccECN is compatible with at least three approaches that
      can assure the integrity of ECN feedback.  If the AccECN Option is
      stripped the resolution of the feedback is degraded, but the
      integrity of this degraded feedback can still be assured.



Briscoe, et al.         Expires December 1, 2017               [Page 27]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   Backward Compatibility:  If only one endpoint supports the AccECN
      scheme, it will fall-back to the most advanced ECN feedback scheme
      supported by the other end.

   Backward Compatibility:  If the AccECN Option is stripped by a
      middlebox, AccECN still provides basic congestion feedback in the
      ACE field.  Further, AccECN can be used to detect mangling of the
      IP ECN field; mangling of the TCP ECN flags; blocking of ECT-
      marked segments; and blocking of segments carrying the AccECN
      Option.  It can detect these conditions during TCP's 3WHS so that
      it can fall back to operation without ECN and/or operation without
      the AccECN Option.

   Forward Compatibility:  The behaviour of endpoints and middleboxes is
      carefully defined for all reserved or currently unused codepoints
      in the scheme, to ensure that any blocking of anomalous values is
      always at least under reversible policy control.

6.  IANA Considerations

   This document reassigns bit 7 of the TCP header flags to the AccECN
   experiment.  This bit was previously called the Nonce Sum (NS) flag
   [RFC3540], but RFC 3540 is being reclassified as historic.  The flag
   will now be defined as:

                  +-----+-------------------+-----------+
                  | Bit | Name              | Reference |
                  +-----+-------------------+-----------+
                  | 7   | AE (Accurate ECN) | RFC XXXX  |
                  +-----+-------------------+-----------+

   [TO BE REMOVED: This registration should take place at the following
   location: https://www.iana.org/assignments/tcp-header-flags/tcp-
   header-flags.xhtml#tcp-header-flags-1 ]

   This document also defines a new TCP option for AccECN, assigned a
   value of TBD1 (decimal) from the TCP option space.  This value is
   defined as:

           +------+--------+-----------------------+-----------+
           | Kind | Length | Meaning               | Reference |
           +------+--------+-----------------------+-----------+
           | TBD1 | N      | Accurate ECN (AccECN) | RFC XXXX  |
           +------+--------+-----------------------+-----------+

   [TO BE REMOVED: This registration should take place at the following
   location: http://www.iana.org/assignments/tcp-parameters/tcp-
   parameters.xhtml#tcp-parameters-1 ]



Briscoe, et al.         Expires December 1, 2017               [Page 28]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   Early implementation before the IANA allocation MUST follow [RFC6994]
   and use experimental option 254 and magic number 0xACCE (16 bits),
   then migrate to the new option after the allocation.

7.  Security Considerations

   If ever the supplementary part of AccECN based on the new AccECN TCP
   Option is unusable (due for example to middlebox interference) the
   essential part of AccECN's congestion feedback offers only limited
   resilience to long runs of ACK loss (see Section 3.2.3).  These
   problems are unlikely to be due to malicious intervention (because if
   an attacker could strip a TCP option or discard a long run of ACKs it
   could wreak other arbitrary havoc).  However, it would be of concern
   if AccECN's resilience could be indirectly compromised during a
   flooding attack.  AccECN is still considered safe though, because if
   the option is not presented, the AccECN Data Sender is then required
   to switch to more conservative assumptions about wrap of congestion
   indication counters (see Section 3.2.3 and Appendix A.2).

   Section 4.1 describes how a TCP server can negotiate AccECN and use
   the SYN cookie method for mitigating SYN flooding attacks.

   There is concern that ECN markings could be altered or suppressed,
   particularly because a misbehaving Data Receiver could increase its
   own throughput at the expense of others.  AccECN is compatible with
   the three schemes known to assure the integrity of ECN feedback (see
   Section 4.3 for details).  If the AccECN Option is stripped by an
   incorrectly implemented middlebox, the resolution of the feedback
   will be degraded, but the integrity of this degraded information can
   still be assured.

   The AccECN protocol is not believed to introduce any new privacy
   concerns, because it merely counts and feeds back signals at the
   transport layer that had already been visible at the IP layer.

8.  Acknowledgements

   We want to thank Koen De Schepper, Praveen Balasubramanian and
   Michael Welzl for their input and discussion.  The idea of using the
   three ECN-related TCP flags as one field for more accurate TCP-ECN
   feedback was first introduced in the re-ECN protocol that was the
   ancestor of ConEx.

   Bob Briscoe was part-funded by the European Community under its
   Seventh Framework Programme through the Reducing Internet Transport
   Latency (RITE) project (ICT-317700) and through the Trilogy 2 project
   (ICT-317756).  The views expressed here are solely those of the
   authors.



Briscoe, et al.         Expires December 1, 2017               [Page 29]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   This work is partly supported by the European Commission under
   Horizon 2020 grant agreement no. 688421 Measurement and Architecture
   for a Middleboxed Internet (MAMI), and by the Swiss State Secretariat
   for Education, Research, and Innovation under contract no. 15.0268.
   This support does not imply endorsement.

9.  Comments Solicited

   Comments and questions are encouraged and very welcome.  They can be
   addressed to the IETF TCP maintenance and minor modifications working
   group mailing list <tcpm@ietf.org>, and/or to the authors.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <http://www.rfc-editor.org/info/rfc2119>.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001,
              <http://www.rfc-editor.org/info/rfc3168>.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
              <http://www.rfc-editor.org/info/rfc5681>.

   [RFC6994]  Touch, J., "Shared Use of Experimental TCP Options",
              RFC 6994, DOI 10.17487/RFC6994, August 2013,
              <http://www.rfc-editor.org/info/rfc6994>.

10.2.  Informative References

   [I-D.bagnulo-tcpm-generalized-ecn]
              Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit
              Congestion Notification (ECN) to TCP Control Packets",
              draft-bagnulo-tcpm-generalized-ecn-04 (work in progress),
              May 2017.

   [I-D.ietf-tcpm-alternativebackoff-ecn]
              Khademi, N., Welzl, M., Armitage, G., and G. Fairhurst,
              "TCP Alternative Backoff with ECN (ABE)", draft-ietf-tcpm-
              alternativebackoff-ecn-01 (work in progress), May 2017.





Briscoe, et al.         Expires December 1, 2017               [Page 30]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   [I-D.ietf-tcpm-dctcp]
              Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L.,
              and G. Judd, "Datacenter TCP (DCTCP): TCP Congestion
              Control for Datacenters", draft-ietf-tcpm-dctcp-06 (work
              in progress), May 2017.

   [I-D.ietf-tsvwg-ecn-experimentation]
              Black, D., "Explicit Congestion Notification (ECN)
              Experimentation", draft-ietf-tsvwg-ecn-experimentation-02
              (work in progress), April 2017.

   [I-D.ietf-tsvwg-l4s-arch]
              Briscoe, B., Schepper, K., and M. Bagnulo, "Low Latency,
              Low Loss, Scalable Throughput (L4S) Internet Service:
              Architecture", draft-ietf-tsvwg-l4s-arch-00 (work in
              progress), May 2017.

   [I-D.kuehlewind-tcpm-ecn-fallback]
              Kuehlewind, M. and B. Trammell, "A Mechanism for ECN Path
              Probing and Fallback", draft-kuehlewind-tcpm-ecn-
              fallback-01 (work in progress), September 2013.

   [I-D.moncaster-tcpm-rcv-cheat]
              Moncaster, T., Briscoe, B., and A. Jacquet, "A TCP Test to
              Allow Senders to Identify Receiver Non-Compliance", draft-
              moncaster-tcpm-rcv-cheat-03 (work in progress), July 2014.

   [RFC3540]  Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
              Congestion Notification (ECN) Signaling with Nonces",
              RFC 3540, DOI 10.17487/RFC3540, June 2003,
              <http://www.rfc-editor.org/info/rfc3540>.

   [RFC4987]  Eddy, W., "TCP SYN Flooding Attacks and Common
              Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007,
              <http://www.rfc-editor.org/info/rfc4987>.

   [RFC5562]  Kuzmanovic, A., Mondal, A., Floyd, S., and K.
              Ramakrishnan, "Adding Explicit Congestion Notification
              (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562,
              DOI 10.17487/RFC5562, June 2009,
              <http://www.rfc-editor.org/info/rfc5562>.

   [RFC5925]  Touch, J., Mankin, A., and R. Bonica, "The TCP
              Authentication Option", RFC 5925, DOI 10.17487/RFC5925,
              June 2010, <http://www.rfc-editor.org/info/rfc5925>.






Briscoe, et al.         Expires December 1, 2017               [Page 31]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   [RFC6824]  Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
              "TCP Extensions for Multipath Operation with Multiple
              Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013,
              <http://www.rfc-editor.org/info/rfc6824>.

   [RFC7413]  Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
              Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014,
              <http://www.rfc-editor.org/info/rfc7413>.

   [RFC7560]  Kuehlewind, M., Ed., Scheffenegger, R., and B. Briscoe,
              "Problem Statement and Requirements for Increased Accuracy
              in Explicit Congestion Notification (ECN) Feedback",
              RFC 7560, DOI 10.17487/RFC7560, August 2015,
              <http://www.rfc-editor.org/info/rfc7560>.

   [RFC7713]  Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx)
              Concepts, Abstract Mechanism, and Requirements", RFC 7713,
              DOI 10.17487/RFC7713, December 2015,
              <http://www.rfc-editor.org/info/rfc7713>.
































Briscoe, et al.         Expires December 1, 2017               [Page 32]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


Appendix A.  Example Algorithms

   This appendix is informative, not normative.  It gives example
   algorithms that would satisfy the normative requirements of the
   AccECN protocol.  However, implementers are free to choose other ways
   to implement the requirements.

A.1.  Example Algorithm to Encode/Decode the AccECN Option

   The example algorithms below show how a Data Receiver in AccECN mode
   could encode its CE byte counter r.ceb into the ECEB field within the
   AccECN TCP Option, and how a Data Sender in AccECN mode could decode
   the ECEB field into its byte counter s.ceb.  The other counters for
   bytes marked ECT(0) and ECT(1) in the AccECN Option would be
   similarly encoded and decoded.

   It is assumed that each local byte counter is an unsigned integer
   greater than 24b (probably 32b), and that the following constant has
   been assigned:

      DIVOPT = 2^24

   Every time a CE marked data segment arrives, the Data Receiver
   increments its local value of r.ceb by the size of the TCP Data.
   Whenever it sends an ACK with the AccECN Option, the value it writes
   into the ECEB field is

      ECEB = r.ceb % DIVOPT

   where '%' is the modulo operator.

   On the arrival of an AccECN Option, the Data Sender uses the TCP
   acknowledgement number and any SACK options to calculate newlyAckedB,
   the amount of new data that the ACK acknowledges in bytes.  If
   newlyAckedB is negative it means that a more up to date ACK has
   already been processed, so this ACK has been superseded and the Data
   Sender has to ignore the AccECN Option.  Then the Data Sender
   calculates the minimum difference d.ceb between the ECEB field and
   its local s.ceb counter, using modulo arithmetic as follows:

      if (newlyAckedB >= 0) {
          d.ceb = (ECEB + DIVOPT - (s.ceb % DIVOPT)) % DIVOPT
          s.ceb += d.ceb
      }

   For example, if s.ceb is 33,554,433 and ECEB is 1461 (both decimal),
   then




Briscoe, et al.         Expires December 1, 2017               [Page 33]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


      s.ceb % DIVOPT = 1
            d.ceb = (1461 + 2^24 - 1) % 2^24
                  = 1460
            s.ceb = 33,554,433 + 1460
                  = 33,555,893

A.2.  Example Algorithm for Safety Against Long Sequences of ACK Loss

   The example algorithms below show how a Data Receiver in AccECN mode
   could encode its CE packet counter r.cep into the ACE field, and how
   the Data Sender in AccECN mode could decode the ACE field into its
   s.cep counter.  The Data Sender's algorithm includes code to
   heuristically detect a long enough unbroken string of ACK losses that
   could have concealed a cycle of the congestion counter in the ACE
   field of the next ACK to arrive.

   Two variants of the algorithm are given: i) a more conservative
   variant for a Data Sender to use if it detects that the AccECN Option
   is not available (see Section 3.2.3 and Section 3.2.5); and ii) a
   less conservative variant that is feasible when complementary
   information is available from the AccECN Option.

A.2.1.  Safety Algorithm without the AccECN Option

   It is assumed that each local packet counter is a sufficiently sized
   unsigned integer (probably 32b) and that the following constant has
   been assigned:

      DIVACE = 2^3

   Every time a CE marked packet arrives, the Data Receiver increments
   its local value of r.cep by 1.  It repeats the same value of ACE in
   every subsequent ACK until the next CE marking arrives, where

      ACE = r.cep % DIVACE.

   If the Data Sender received an earlier value of the counter that had
   been delayed due to ACK reordering, it might incorrectly calculate
   that the ACE field had wrapped.  Therefore, on the arrival of every
   ACK, the Data Sender uses the TCP acknowledgement number and any SACK
   options to calculate newlyAckedB, the amount of new data that the ACK
   acknowledges.  If newlyAckedB is negative it means that a more up to
   date ACK has already been processed, so this ACK has been superseded
   and the Data Sender has to ignore the AccECN Option.  If newlyAckedB
   is zero, to break the tie the Data Sender could use timestamps (if
   present) to work out newlyAckedT, the amount of new time that the ACK
   acknowledges.  Then the Data Sender calculates the minimum difference




Briscoe, et al.         Expires December 1, 2017               [Page 34]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   d.cep between the ACE field and its local s.cep counter, using modulo
   arithmetic as follows:

      if ((newlyAckedB > 0) || (newlyAckedB == 0 && newlyAckedT > 0))
          d.cep = (ACE + DIVACE - (s.cep % DIVACE)) % DIVACE

   Section 3.2.3 requires the Data Sender to assume that the ACE field
   did cycle if it could have cycled under prevailing conditions.  The
   3-bit ACE field in an arriving ACK could have cycled and become
   ambiguous to the Data Sender if a row of ACKs goes missing that
   covers a stream of data long enough to contain 8 or more CE marks.
   We use the word `missing' rather than `lost', because some or all the
   missing ACKs might arrive eventually, but out of order.  Even if some
   of the lost ACKs are piggy-backed on data (i.e. not pure ACKs)
   retransmissions will not repair the lost AccECN information, because
   AccECN requires retransmissions to carry the latest AccECN counters,
   not the original ones.

   The phrase `under prevailing conditions' allows the Data Sender to
   take account of the prevailing size of data segments and the
   prevailing CE marking rate just before the sequence of ACK losses.
   However, we shall start with the simplest algorithm, which assumes
   segments are all full-sized and ultra-conservatively it assumes that
   ECN marking was 100% on the forward path when ACKs on the reverse
   path started to all be dropped.  Specifically, if newlyAckedB is the
   amount of data that an ACK acknowledges since the previous ACK, then
   the Data Sender could assume that this acknowledges newlyAckedPkt
   full-sized segments, where newlyAckedPkt = newlyAckedB/MSS.  Then it
   could assume that the ACE field incremented by

       dSafer.cep = newlyAckedPkt - ((newlyAckedPkt - d.cep) % DIVACE),

   For example, imagine an ACK acknowledges newlyAckedPkt=9 more full-
   size segments than any previous ACK, and that ACE increments by a
   minimum of 2 CE marks (d.cep=2).  The above formula works out that it
   would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8) =
   2).  However, if ACE increases by a minimum of 2 but acknowledges 10
   full-sized segments, then it would be necessary to assume that there
   could have been 10 CE marks (because 10 - ((10-2) % 8) = 10).

   Implementers could build in more heuristics to estimate prevailing
   average segment size and prevailing ECN marking.  For instance,
   newlyAckedPkt in the above formula could be replaced with
   newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing
   segment size and p is the prevailing ECN marking probability.
   However, ultimately, if TCP's ECN feedback becomes inaccurate it
   still has loss detection to fall back on.  Therefore, it would seem
   safe to implement a simple algorithm, rather than a perfect one.



Briscoe, et al.         Expires December 1, 2017               [Page 35]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   The simple algorithm for dSafer.cep above requires no monitoring of
   prevailing conditions and it would still be safe if, for example,
   segments were on average at least 5% of full-sized as long as ECN
   marking was 5% or less.  Assuming it was used, the Data Sender would
   increment its packet counter as follows:

      s.cep += dSafer.cep

   If missing acknowledgement numbers arrive later (due to reordering),
   Section 3.2.3 says "the Data Sender MAY attempt to neutralise the
   effect of any action it took based on a conservative assumption that
   it later found to be incorrect".  To do this, the Data Sender would
   have to store the values of all the relevant variables whenever it
   made assumptions, so that it could re-evaluate them later.  Given
   this could become complex and it is not required, we do not attempt
   to provide an example of how to do this.

A.2.2.  Safety Algorithm with the AccECN Option

   When the AccECN Option is available on the ACKs before and after the
   possible sequence of ACK losses, if the Data Sender only needs CE-
   marked bytes, it will have sufficient information in the AccECN
   Option without needing to process the ACE field.  However, if for
   some reason it needs CE-marked packets, if dSafer.cep is different
   from d.cep, it can calculate the average marked segment size that
   each implies to determine whether d.cep is likely to be a safe enough
   estimate.  Specifically, it could use the following algorithm, where
   d.ceb is the amount of newly CE-marked bytes (see Appendix A.1):

      SAFETY_FACTOR = 2
      if (dSafer.cep > d.cep) {
          s = d.ceb/d.cep
          if (s <= MSS) {
             sSafer = d.ceb/dSafer.cep
             if (sSafer < MSS/SAFETY_FACTOR)
                 dSafer.cep = d.cep    % d.cep is a safe enough estimate
          } % else
              % No need for else; dSafer.cep is already correct,
              % because d.cep must have been too small
      }

   The chart below shows when the above algorithm will consider d.cep
   can replace dSafer.cep as a safe enough estimate of the number of CE-
   marked packets:







Briscoe, et al.         Expires December 1, 2017               [Page 36]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


         ^
   sSafer|
         |
      MSS+
         |
         |         dSafer.cep
         |                  is
    MSS/2+--------------+    safest
         |              |
         | d.cep is safe|
         |    enough    |
         +-------------------->
                       MSS   s


   The following examples give the reasoning behind the algorithm,
   assuming MSS=1,460 [B]:

   o  if d.cep=0, dSafer.cep=8 and d.ceb=1,460, then s=infinity and
      sSafer=182.5.
      Therefore even though the average size of 8 data segments is
      unlikely to have been as small as MSS/8, d.cep cannot have been
      correct, because it would imply an average segment size greater
      than the MSS.

   o  if d.cep=2, dSafer.cep=10 and d.ceb=1,460, then s=730 and
      sSafer=146.
      Therefore d.cep is safe enough, because the average size of 10
      data segments is unlikely to have been as small as MSS/10.

   o  if d.cep=7, dSafer.cep=15 and d.ceb=10,200, then s=1,457 and
      sSafer=680.
      Therefore d.cep is safe enough, because the average data segment
      size is more likely to have been just less than one MSS, rather
      than below MSS/2.

   If pure ACKs were allowed to be ECN-capable, missing ACKs would be
   far less likely.  However, because [RFC3168] currently precludes
   this, the above algorithm assumes that pure ACKs are not ECN-capable.

A.3.  Example Algorithm to Estimate Marked Bytes from Marked Packets

   If the AccECN Option is not available, the Data Sender can only
   decode CE-marking from the ACE field in packets.  Every time an ACK
   arrives, to convert this into an estimate of CE-marked bytes, it
   needs an average of the segment size, s_ave.  Then it can add or
   subtract s_ave from the value of d.ceb as the value of d.cep
   increments or decrements.



Briscoe, et al.         Expires December 1, 2017               [Page 37]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   To calculate s_ave, it could keep a record of the byte numbers of all
   the boundaries between packets in flight (including control packets),
   and recalculate s_ave on every ACK.  However it would be simpler to
   merely maintain a counter packets_in_flight for the number of packets
   in flight (including control packets), which it could update once per
   RTT.  Either way, it would estimate s_ave as:

      s_ave ~= flightsize / packets_in_flight,

   where flightsize is the variable that TCP already maintains for the
   number of bytes in flight.  To avoid floating point arithmetic, it
   could right-bit-shift by lg(packets_in_flight), where lg() means log
   base 2.

   An alternative would be to maintain an exponentially weighted moving
   average (EWMA) of the segment size:

      s_ave = a * s + (1-a) * s_ave,

   where a is the decay constant for the EWMA.  However, then it is
   necessary to choose a good value for this constant, which ought to
   depend on the number of packets in flight.  Also the decay constant
   needs to be power of two to avoid floating point arithmetic.

A.4.  Example Algorithm to Beacon AccECN Options

   Section 3.2.6 requires a Data Receiver to beacon a full-length AccECN
   Option at least 3 times per RTT.  This could be implemented by
   maintaining a variable to store the number of ACKs (pure and data
   ACKs) since a full AccECN Option was last sent and another for the
   approximate number of ACKs sent in the last round trip time:

      if (acks_since_full_last_sent > acks_in_round / BEACON_FREQ)
          send_full_AccECN_Option()

   For optimised integer arithmetic, BEACON_FREQ = 4 could be used,
   rather than 3, so that the division could be implemented as an
   integer right bit-shift by lg(BEACON_FREQ).

   In certain operating systems, it might be too complex to maintain
   acks_in_round.  In others it might be possible by tagging each data
   segment in the retransmit buffer with the number of ACKs sent at the
   point that segment was sent.  This would not work well if the Data
   Receiver was not sending data itself, in which case it might be
   necessary to beacon based on time instead, as follows:

      if ( time_now > time_last_option_sent + (RTT / BEACON_FREQ) )
          send_full_AccECN_Option()



Briscoe, et al.         Expires December 1, 2017               [Page 38]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   This time-based approach does not work well when all the ACKs are
   sent early in each round trip, as is the case during slow-start.  In
   this case few options will be sent (evtl. even less than 3 per RTT).
   However, when continuously sending data, data packets as well as ACKs
   will spread out equally over the RTT and sufficient ACKs with the
   AccECN option will be sent.

A.5.  Example Algorithm to Count Not-ECT Bytes

   A Data Sender in AccECN mode can infer the amount of TCP payload data
   arriving at the receiver marked Not-ECT from the difference between
   the amount of newly ACKed data and the sum of the bytes with the
   other three markings, d.ceb, d.e0b and d.e1b.  Note that, because
   r.e0b is initialized to 1 and the other two counters are initialized
   to 0, the initial sum will be 1, which matches the initial offset of
   the TCP sequence number on completion of the 3WHS.

   For this approach to be precise, it has to be assumed that spurious
   (unnecessary) retransmissions do not lead to double counting.  This
   assumption is currently correct, given that RFC 3168 requires that
   the Data Sender marks retransmitted segments as Not-ECT.  However,
   the converse is not true; necessary transmissions will result in
   under-counting.

   However, such precision is unlikely to be necessary.  The only known
   use of a count of Not-ECT marked bytes is to test whether equipment
   on the path is clearing the ECN field (perhaps due to an out-dated
   attempt to clear, or bleach, what used to be the ToS field).  To
   detect bleaching it will be sufficient to detect whether nearly all
   bytes arrive marked as Not-ECT.  Therefore there should be no need to
   keep track of the details of retransmissions.

Appendix B.  Alternative Design Choices (To Be Removed Before
             Publication)

   This appendix is informative, not normative.  It records alternative
   designs that the authors chose not to include in the normative
   specification, but which the IETF might wish to consider for
   inclusion:

   Feedback all four ECN codepoints on the SYN/ACK:  The last two
      negotiation combinations in Table 2 could be used to indicate
      AccECN support while also feeding back that the arriving SYN was
      ECT(0) or ECT(1).  This could be used to probe the client to
      server path for incorrect forwarding of the ECN field
      [I-D.kuehlewind-tcpm-ecn-fallback].





Briscoe, et al.         Expires December 1, 2017               [Page 39]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   Feedback all four ECN codepoints on the First ACK:  To probe the
      server to client path for incorrect ECN forwarding, it could be
      useful to have four feedback states on the first ACK from the TCP
      client.  This could be achieved by assigning four combinations of
      the ECN flags in the main TCP header, and only initializing the
      ACE field on subsequent segments.

Appendix C.  Open Protocol Design Issues (To Be Removed Before
             Publication)

   1.  Currently it is specified that the receiver `SHOULD' use Change-
       Triggered ACKs.  It is controversial whether this ought to be a
       `MUST' instead.  A `SHOULD' would leave the Data Sender uncertain
       whether it can rely on the timing and ordering information in
       ACKs.  If the sender guesses wrongly, it will probably introduce
       at least 1 RTT of delay before it can use this timing
       information.  Ironically it will most likely be wanting this
       information to reduce ramp-up delay.  A `MUST' could make it hard
       to implement AccECN in offload hardware.  However, it is not
       known whether AccECN would be hard to implement in such hardware
       even with a `SHOULD' here.  For instance, was it hard to offload
       DCTCP to hardware because of change-triggered ACKs, or was this
       just one of many reasons?  The choice between MUST and SHOULD
       here is critical.  Before that choice is made, a clear use-case
       for certainty of timing and ordering information is needed, plus
       well-informed discussion about hardware offload constraints.

   2.  There is possibly a concern that a receiver could deliberately
       omit the AccECN Option pretending that it had been stripped by a
       middlebox.  No known way can yet be contrived to take advantage
       of this downgrade attack, but it is mentioned here in case
       someone else can contrive one.

Appendix D.  Changes in This Version (To Be Removed Before Publication)

   The difference between any pair of versions can be displayed at
   http://datatracker.ietf.org/doc/draft-kuehlewind-tcpm-accurate-ecn/
   history/

Authors' Addresses

   Bob Briscoe
   Simula Research Laboratory

   EMail: ietf@bobbriscoe.net
   URI:   http://bobbriscoe.net/





Briscoe, et al.         Expires December 1, 2017               [Page 40]


Internet-Draft          Accurate TCP-ECN Feedback               May 2017


   Mirja Kuehlewind
   ETH Zurich
   Zurich
   Switzerland

   EMail: mirja.kuehlewind@tik.ee.ethz.ch


   Richard Scheffenegger
   Vienna
   Austria

   EMail: rscheff@gmx.at






































Briscoe, et al.         Expires December 1, 2017               [Page 41]