TCP Maintenance and Minor Extensions M. Kuehlewind, Ed.
(tcpm) University of Stuttgart
Internet-Draft R. Scheffenegger
Intended status: Experimental NetApp, Inc.
Expires: January 17, 2013 July 16, 2012
More Accurate ECN Feedback in TCP
draft-kuehlewind-tcpm-accurate-ecn-01
Abstract
Explicit Congestion Notification (ECN) is an IP/TCP mechanism where
network nodes can mark IP packets instead of dropping them to
indicate congestion to the end-points. An ECN-capable receiver feeds
this information back to the sender. ECN is specified for TCP in
such a way that only one feedback signal can be transmitted per
Round-Trip Time (RTT). Recently proposed TCP mechanisms like ConEx or
DCTCP need more accurate ECN feedback information in the case where
more than one marking is received in one RTT. This document
specifies a different scheme for the ECN feedback in the TCP header
to provide more than one feedback signal per RTT.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 17, 2013.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
Kuehlewind & Scheffenegger Expires January 17, 2013 [Page 1]
Internet-Draft More Accurate ECN Feedback in TCP July 2012
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
   1.1. Use Cases
   1.2. Overview ECN and ECN Nonce in IP/TCP
   1.3. Requirements
   1.4. Design choices
   1.5. Requirements Language
2. Negotiation during the TCP handshake
3. More Accurate ECN Feedback
   3.1. Codepoint Coding
   3.2. More Accurate ECN TCP Sender
   3.3. More Accurate ECN TCP Receiver
        3.3.1. Implementation
   3.4. Advanced Compatibility Mode
4. Acknowledgements
5. IANA Considerations
6. Security Considerations
7. References
   7.1. Normative References
   7.2. Informative References
Appendix A. Estimating CE-marked bytes
Appendix B. Use with ECN Nonce
   B.1. Pseudo Code for the Codepoint Coding
Authors' Addresses
1. Introduction
Explicit Congestion Notification (ECN) [RFC3168] is an IP/TCP
mechanism where network nodes can mark IP packets instead of dropping
them to indicate congestion to the end-points. An ECN-capable
receiver feeds this information back to the sender. ECN is
specified for TCP in such a way that only one feedback signal can be
transmitted per Round-Trip Time (RTT). Recently proposed mechanisms
like Congestion Exposure (ConEx) or DCTCP [Ali10] need more accurate
ECN feedback information in the case where more than one marking is
received in one RTT.
This document specifies a different scheme for the ECN feedback in
the TCP header to provide more than one feedback signal per RTT.
This modification does not obsolete [RFC3168]. To avoid confusion we
call the ECN specification of [RFC3168] 'classic ECN' in this
document. This document provides an extension that requires
additional negotiation in the TCP handshake by using the TCP nonce
sum (NS) bit, as specified in [RFC3540], which is currently not used
when SYN is set. If the more accurate ECN extension has been
negotiated successfully, the meaning of ECN TCP bits and the ECN NS
bit is different from the specification in [RFC3168] and [RFC3540].
This document specifies the additional negotiation as well as the new
coding of the TCP ECN/NS bits.
The proposed coding scheme maintains the given bit space as the ECN
feedback information is needed in a timely manner and as such should
be reported in every ACK. The reuse will avoid additional network
load as the ACK size will not increase. Moreover, the more accurate
ECN information will replace the classic ECN feedback if negotiated.
Thus those bits are not needed otherwise. However, the proposed scheme
also requires the use of the NS bit in the TCP handshake as well as
for the more accurate ECN feedback itself. The proposed more
accurate ECN feedback extension can include the ECN-Nonce integrity
mechanism as some coding space is left open. The use of ECN-Nonce is
not part of the specification in this document but is discussed in
the appendix.
1.1. Use Cases
The following scenarios should briefly show where the accurate
feedback is needed or provides additional value:
A Standard (RFC5681) TCP sender that supports ConEx:
In this case the congestion control algorithm still ignores
multiple marks per RTT, while the ConEx mechanism uses the
extra information per RTT to re-echo more precise congestion
information.
A sender using DCTCP congestion control without ConEx:
The congestion control algorithm uses the extra info per RTT
to perform its decrease depending on the number of congestion
marks.
A sender using DCTCP congestion control that also supports ConEx:
Both the congestion control algorithm and ConEx use the
accurate ECN feedback mechanism.
A standard TCP sender (using RFC5681 congestion control algorithm)
without ConEx:
No accurate feedback is necessary here. The congestion
control algorithm still reacts to only one signal per RTT.
But it is best to have one generic feedback mechanism,
whether it is used or not.
1.2. Overview ECN and ECN Nonce in IP/TCP
ECN requires two bits in the IP header. The ECN capability of a
packet is indicated when either one of the two bits is set. An ECN
sender can set one or the other bit to indicate an ECN-capable
transport (ECT) which results in two signals, ECT(0) and ECT(1). A
network node can set both bits simultaneously when it experiences
congestion. When both bits are set the packet is regarded as
"Congestion Experienced" (CE).
In the TCP header the first two bits in byte 14 are defined for the
use of ECN. The TCP mechanism for signaling the reception of a
congestion mark uses the ECN-Echo (ECE) flag in the TCP header. To
enable the TCP receiver to determine when to stop setting the ECN-
Echo flag, the CWR flag is set by the sender upon reception of the
feedback signal. This always leads to a full RTT of ACKs with ECE
set. Thus any additional CE markings arriving within this RTT
cannot be signaled back.
ECN-Nonce [RFC3540] is an optional addition to ECN that is used to
protect the TCP sender against accidental or malicious concealment of
marked or dropped packets. This addition defines the last bit of
byte 13 in the TCP header as the Nonce Sum (NS) bit. With ECN-Nonce
a nonce sum is maintained that counts the occurrence of ECT(1) packets.
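The IP-layer codepoints described at the start of this section can be sketched as follows. The bit values are those assigned in [RFC3168]; the names EcnCodepoint, ecn_from_tos and is_ecn_capable are hypothetical and only serve this illustration, which assumes the caller has already extracted the IPv4 TOS or IPv6 Traffic Class byte.

```python
from enum import IntEnum


class EcnCodepoint(IntEnum):
    """The four values of the 2-bit ECN field in the IP header (RFC 3168)."""
    NOT_ECT = 0b00  # sender is not ECN-capable
    ECT_1 = 0b01    # ECN-capable transport, codepoint ECT(1)
    ECT_0 = 0b10    # ECN-capable transport, codepoint ECT(0)
    CE = 0b11       # both bits set: Congestion Experienced


def ecn_from_tos(tos_byte: int) -> EcnCodepoint:
    """Return the ECN codepoint held in the two low-order bits of the byte."""
    return EcnCodepoint(tos_byte & 0b11)


def is_ecn_capable(cp: EcnCodepoint) -> bool:
    """A packet is ECN-capable when at least one of the two bits is set."""
    return cp != EcnCodepoint.NOT_ECT
```

A receiver implementing the scheme in this document would only need to test for EcnCodepoint.CE (and, with the ECN Nonce variant, for EcnCodepoint.ECT_1).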
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|               |           | N | C | E | U | A | P | R | S | F |
| Header Length | Reserved  | S | W | C | R | C | S | S | Y | I |
|               |           |   | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 1: The (post-ECN Nonce) definition of the TCP header flags
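As a minimal sketch of the layout in Figure 1, the flag bits can be read from a raw TCP header as shown below. The function name parse_tcp_flags and the FLAG_MASKS table are assumptions made for this example; the bit positions follow the figure.

```python
import struct
from typing import Dict

# Bit masks within the 16-bit word formed by bytes 13 and 14 of the TCP
# header (1-based byte numbering, as in the text above).
FLAG_MASKS = {
    "NS": 0x100, "CWR": 0x080, "ECE": 0x040, "URG": 0x020, "ACK": 0x010,
    "PSH": 0x008, "RST": 0x004, "SYN": 0x002, "FIN": 0x001,
}


def parse_tcp_flags(tcp_header: bytes) -> Dict[str, int]:
    """Extract the nine flag bits of Figure 1 from a raw TCP header."""
    # Bytes 13 and 14 (1-based) sit at offsets 12 and 13 (0-based).
    (word,) = struct.unpack_from("!H", tcp_header, 12)
    return {name: int(bool(word & mask)) for name, mask in FLAG_MASKS.items()}
```

For example, a header whose bytes 13 and 14 are 0x51 0xC2 yields NS=1, CWR=1, ECE=1 and SYN=1, with all other flags 0.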
1.3. Requirements
The requirements of the accurate ECN feedback protocol for the use of
e.g. ConEx or DCTCP are to have a fairly accurate (not necessarily
perfect), timely and protected signaling. This leads to the
following requirements:
Resilience
The ECN feedback signal is carried within the TCP
acknowledgment. TCP ACKs can get lost. Moreover, delayed
ACKs are commonly used with TCP. That means in most cases only
every second data packet triggers an ACK. In a high
congestion situation where most of the packets are marked with
CE, an accurate feedback mechanism must still be able to
signal sufficient congestion information. Thus the accurate
ECN feedback extension has to take delayed ACK and ACK loss
into account.
Timely
The CE marking is induced by a network node on the
transmission path and echoed by the receiver in the TCP
acknowledgment. Thus when this information arrives at the
sender, it is naturally already about one RTT old. With a
sufficient ACK rate a further delay of a small number of ACKs
can be tolerated, but with large delays this information will
be outdated due to the high dynamics in the network. TCP
congestion control, which introduces part of these dynamics,
operates on a time scale of one RTT. Thus the congestion
feedback information should be delivered in a timely manner (within one
RTT).
Integrity
With ECN Nonce, a misbehaving receiver or network node can be
detected with a certain probability. As this accurate ECN
feedback is reusing the NS bit, it is encouraged to ensure
integrity at least as good as ECN Nonce. If this is not
possible, alternative approaches should be provided for how a
mechanism using the accurate ECN feedback extension can
re-ensure integrity or give strong incentives for the receiver
and network node to cooperate honestly.
Accuracy
Classic ECN feeds back one congestion notification per RTT,
as this is supposed to be used for TCP congestion control
which reduces the sending rate at most once per RTT. The
accurate ECN feedback scheme has to ensure that if a
congestion event occurs, at least one congestion notification
is echoed and received per RTT, as classic ECN would do. Of
course, the goal of this extension is to reconstruct the
number of CE markings more accurately. However, a sender
should not assume that it will get the exact number of congestion
markings in all situations.
Complexity
Of course, the more accurate ECN feedback can also be used
even if only one ECN feedback signal per RTT is needed. The
implementation should be as simple as possible and only a
minimum of additional state information should be needed. A
proposal fulfilling this for a more accurate ECN feedback can
then also be the standard ECN feedback mechanism.
1.4. Design choices
The idea of this document is to use the ECE, CWR and NS bits for
additional capability negotiation during the <SYN> / <SYN,ACK>
exchange, and then for the more accurate ECN feedback itself on
subsequent packets in the flow (where SYN is not set).
Alternatively, a new TCP option could be introduced, to help maintain
the accuracy, and integrity of the ECN feedback between receiver and
sender. Such an option could provide more information. For example, ECN
for RTP/UDP explicitly provides the number of ECT(0), ECT(1), CE,
non-ECT marked and lost packets. However, deploying new TCP options has
its own challenges. A separate document proposes a new TCP Option
for accurate ECN feedback
[draft-kuehlewind-tcpm-accurate-ecn-option]. This option could be
used in addition to a more accurate ECN feedback scheme described
here or in addition to classic ECN, when available and needed.
As seen in Figure 1, there are currently three unused flag bits in
the TCP header. The proposed scheme could be extended by one or more
bits, to add higher resiliency against ACK loss. The relative gain
would be proportionally higher resiliency against ACK loss, while the
respective drawbacks would remain identical. Thus the approach in
this document is to maintain the scope of the given number of header
bits as they seem to be already sufficient. This accurate ECN
feedback scheme will only be used instead of the classic ECN and
never in parallel.
1.5. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
We use the following terminology from [RFC3168] and [RFC3540]:
The ECN field in the IP header:
CE: the Congestion Experienced codepoint, and
ECT(0): the first ECN-Capable Transport codepoint, and
ECT(1): the second ECN-Capable Transport codepoint.
The ECN flags in the TCP header:
CWR: the Congestion Window Reduced flag,
ECE: the ECN-Echo flag, and
NS: ECN Nonce Sum.
In this document, we will call the ECN feedback scheme as specified
in [RFC3168] the 'classic ECN' and our new proposal the 'more
accurate ECN feedback' scheme. A 'congestion mark' is defined as an
IP packet where the CE codepoint is set. A 'congestion event' refers
to one or more congestion marks belonging to the same overload situation
in the network (usually during one RTT).
2. Negotiation during the TCP handshake
During the TCP handshake at the start of a connection, an originator
of the connection (host A) MUST indicate a request to get more
accurate ECN feedback by setting the TCP flags NS=1, CWR=1 and ECE=1
in the initial <SYN>.
A responding host (host B) MUST return a <SYN,ACK> with flags CWR=1
and ECE=0. The responding host MUST NOT set this combination of
flags unless the preceding <SYN> has already requested support for
more accurate ECN feedback as above. Normally a server (B) will
reply to a client with NS=0, but if the initial <SYN> from client A
is marked CE, server B SHOULD set the NS flag to 1 to indicate the
congestion immediately instead of delaying the signal to the first
acknowledgment, when the actual data transmission has already started.
So, server B MAY set the alternative TCP header flags in its
<SYN,ACK>: NS=1, CWR=1 and ECE=0.
The addition of ECN to TCP <SYN,ACK> packets is discussed and
specified as experimental in [RFC5562]. The addition of ECN to the
<SYN> packet is optional. The security implications of using this
option are not discussed further here.
This handshake is summarized in Table 1 below, with X indicating NS
can be either 0 or 1 depending on whether congestion had been
experienced. The handshakes used for the other flavors of ECN are
also shown for comparison. To compress the width of the table, the
headings of the first four columns have been severely abbreviated, as
follows:
Ac: *Ac*curate ECN Feedback
N: ECN-*N*once (RFC3540)
E: *E*CN (RFC3168)
I: Not-ECN (*I*mplicit congestion notification).
+----+---+---+---+------------+----------------+------------------+
| Ac | N | E | I | <SYN> A->B | <SYN,ACK> B->A | Mode |
+----+---+---+---+------------+----------------+------------------+
| | | | | NS CWR ECE | NS CWR ECE | |
| AB | | | | 1 1 1 | X 1 0 | accurate ECN |
| A | B | | | 1 1 1 | 1 0 1 | ECN Nonce |
| A | | B | | 1 1 1 | 0 0 1 | classic ECN |
| A | | | B | 1 1 1 | 0 0 0 | Not ECN |
| A | | | B | 1 1 1 | X 1 1 | Not ECN (broken) |
+----+---+---+---+------------+----------------+------------------+
Table 1: ECN capability negotiation between Sender (A) and
Receiver (B)
Recall that, if the <SYN,ACK> reflects the same flag settings as the
preceding <SYN> (because there is a broken TCP implementation that
behaves this way), RFC3168 specifies that the whole connection MUST
revert to Not-ECT.
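The rows of Table 1 can be read as a small decision procedure on the originator's side. The following is a sketch only, assuming host A sent NS=1, CWR=1 and ECE=1 in its <SYN>; the function name negotiated_mode is hypothetical, and flag combinations that Table 1 does not list are rejected rather than guessed.

```python
def negotiated_mode(ns: int, cwr: int, ece: int) -> str:
    """Classify the <SYN,ACK> flags answering a <SYN> with NS=CWR=ECE=1.

    Rows follow Table 1; the returned strings are the table's Mode column.
    """
    if cwr == 1 and ece == 0:
        return "accurate ECN"        # NS may be 0 or 1 (1 signals CE on SYN)
    if cwr == 1 and ece == 1:
        return "Not ECN (broken)"    # flags reflected; fall back to Not-ECT
    if (ns, cwr, ece) == (1, 0, 1):
        return "ECN Nonce"
    if (ns, cwr, ece) == (0, 0, 1):
        return "classic ECN"
    if (ns, cwr, ece) == (0, 0, 0):
        return "Not ECN"
    raise ValueError("flag combination not listed in Table 1")
```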
3. More Accurate ECN Feedback
In this section we refer to the sender as the host sending data and
to the receiver as the host that acknowledges this data. Of course
such a scenario describes only one half-connection of a TCP
connection. The proposed scheme, if negotiated, will be used for
both half-connections, as both sender and receiver need to be able
to echo and understand the accurate ECN feedback scheme.
This section proposes the new coding of the two ECN TCP bits (ECE/
CWR) as well as the TCP NS bit to provide a more accurate ECN
feedback. This coding MUST only be used if the more accurate ECN
feedback has been negotiated successfully in the TCP handshake.
Section 3.4 provides another alternative to allow a
compatibility mode when a sender needs more accurate ECN feedback but
has to operate with a legacy [RFC3168] classic ECN receiver.
3.1. Codepoint Coding
The more accurate ECN feedback coding uses the ECE, CWR and NS bits
as one field to encode 8 distinct codepoints. This overloaded use of
these 3 header flags as one 3-bit more Accurate ECN (AcE) field is
shown in Figure 2. The actual definition of the TCP header,
including the addition of support for the ECN Nonce, is shown for
comparison in Figure 1. This specification does not redefine the
names of these three TCP flags; it merely overloads them with another
definition once a flow with more accurate ECN feedback is
established.
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|               |           |           | U | A | P | R | S | F |
| Header Length | Reserved  |    AcE    | R | C | S | S | Y | I |
|               |           |           | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 2: Definition of the AcE field within bytes 13 and 14 of the
TCP Header (when SYN=0).
The 8 possible codepoints are shown below. Five of them are used to
encode a "congestion indication" (CI) counter. The other three
codepoints are undefined but can be used for some kind of integrity
check (see Appendix B). The CI counter maintains the number
of CE marks observed at the receiver (see Section 3.3.1).
Also note that, whenever the SYN flag of a TCP segment is set
(including when the ACK flag is also set), the NS, CWR and ECE flags
(i.e. the AcE field of the <SYN,ACK>) MUST NOT be interpreted as the
3-bit codepoint, which is only used in non-SYN packets.
+-----+----+-----+-----+------------+
| AcE | NS | CWR | ECE | CI (base5) |
+-----+----+-----+-----+------------+
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | 1 |
| 2 | 0 | 1 | 0 | 2 |
| 3 | 0 | 1 | 1 | 3 |
| 4 | 1 | 0 | 0 | 4 |
| 5 | 1 | 0 | 1 | - |
| 6 | 1 | 1 | 0 | - |
| 7 | 1 | 1 | 1 | - |
+-----+----+-----+-----+------------+
Table 2: Codepoint assignment for accurate ECN feedback
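The mapping in Table 2 is the plain binary representation of the CI value, with NS as the most significant bit. A minimal sketch, with the hypothetical helper names encode_ci and decode_ci, could look like this:

```python
from typing import Optional, Tuple


def encode_ci(ci: int) -> Tuple[int, int, int]:
    """Map a CI counter value to the (NS, CWR, ECE) bits of Table 2."""
    ace = ci % 5                      # only codepoints 0..4 carry CI
    return (ace >> 2) & 1, (ace >> 1) & 1, ace & 1


def decode_ci(ns: int, cwr: int, ece: int) -> Optional[int]:
    """Return CI (base 5) for codepoints 0..4, or None for 5..7."""
    ace = (ns << 2) | (cwr << 1) | ece
    return ace if ace <= 4 else None
```

Returning None for codepoints 5 to 7 is a choice made for this sketch; the table leaves those codepoints undefined.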
By default an accurate ECN receiver MUST echo one of the codepoints
encoding the CI counter value. Whenever a CE is received and thus
the value of the CI has changed, the receiver MUST echo the CI in the
next ACK. Moreover, the receiver MUST repeat the codepoint that
provides the CI counter directly on the subsequent ACK. Thus every
value of CI will be transmitted at least twice. Otherwise the
receiver MAY send one of the other, currently undefined, codepoints.
This requirement may conflict with delayed ACK ratios larger than
two, using the available number of codepoints. A receiver MUST
change the ACK'ing rate such that a sufficient rate of feedback
signals can be sent. Details on how the change in the ACK'ing rate
can be implemented are given in Section 3.3.
3.2. More Accurate ECN TCP Sender
This section specifies the sender-side action describing how to
extract the number of congestion markings from the given receiver
feedback signal.
When the more accurate ECN feedback scheme is supported by the
sender, the sender will maintain a congestion indication received
(CI.r) counter. This CI.r counter will hold the number of CE marks
as signaled by the receiver, and reconstructed by the sender.
On the arrival of every ACK, the sender calculates the difference D
between the local CI.r value modulo 5, and the signaled CI value of
the codepoint in the ACK. The value of CI.r is increased by D, and D
is assumed to be the number of CE marked packets that arrived at the
receiver since it sent the previously received ACK.
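The sender-side update can be sketched as follows. The text only speaks of "the difference"; taking it modulo 5, so that a wrapped counter still yields a non-negative D, is an assumption of this example, as are the names AccEcnSender and on_ack.

```python
class AccEcnSender:
    """Sender-side reconstruction of the CE mark count (CI.r)."""

    def __init__(self) -> None:
        self.ci_r = 0  # CE marks reconstructed so far

    def on_ack(self, signaled_ci: int) -> int:
        """Process the CI value (0..4) of one ACK and return D."""
        if not 0 <= signaled_ci <= 4:
            raise ValueError("CI codepoints only cover 0..4")
        # Forward distance from the local counter to the echoed value.
        d = (signaled_ci - self.ci_r % 5) % 5
        self.ci_r += d
        return d
```

With this rule D can never exceed 4, so a sender cannot distinguish 5 new CE marks from none if no intermediate ACK arrives.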
3.3. More Accurate ECN TCP Receiver
This section describes the receiver-side action to signal the
accurate ECN feedback back to the sender. The receiver will need to
maintain a congestion indication (CI) counter of how many CE markings
have been seen during a connection. Thus for each incoming segment
with a CE marking, the receiver will increase CI by 1. With each ACK
the receiver will calculate CI modulo 5 and set the respective
codepoint in the AcE field (see Table 2). To avoid counter
wrap-arounds in a high congestion situation, the receiver SHOULD
switch from a delayed ACK behavior to send ACKs immediately after the
data packet reception if needed.
3.3.1. Implementation
The receiver counts how many packets carry a congestion notification.
This could, in principle, be achieved by directly increasing the CI
for every incoming CE marked segment. Since the space for
communicating the information back to the sender in ACKs is limited,
a "gauge" (CI.g) is increased instead of directly increasing this
counter.
When sending an ACK, the CI is increased by either CI.g or at maximum
by 4 as a larger increase could cause an overflow in the codepoint
counter signaling. Thereafter, CI.g is reduced by the same amount.
Then the current CI value (modulo 5) is encoded in the current ACK.
To avoid losing information, it must be ensured that an ACK is sent
at least after 5 incoming, outstanding congestion marks (i.e. when
CI.g exceeds 5). Architecturally the counters never decrease during
a TCP session. However, any overflow MUST be modulo a multiple of 5
for CI.
For resilience against lost ACKs, an indicator flag (CI.i) SHOULD be
used to ensure that, whether another congestion indication arrives or
not, a second ACK transmits the previous counter value again. Thus
when a codepoint is transmitted the first time, CI.i will be set to
one. Then with the next ACK the same codepoint is transmitted again
and the CI.i is reset to zero. Only when CI.i is zero, the counter
CI can be increased. In case of heavy congestion (basically all
segments are CE marked) the CI.g might grow continuously. In this
case the ACK rate should be increased by sending an immediate ACK for
an incoming data segment.
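One possible reading of the counter, gauge and indicator flag described above is sketched below. The class and method names are hypothetical, and the threshold used to request an immediate ACK (gauge larger than 4) is an assumption of this example rather than a rule taken from the text.

```python
class AccEcnReceiver:
    """Receiver-side counter (CI), gauge (CI.g) and repeat flag (CI.i)."""

    MAX_STEP = 4  # largest increase one codepoint change can convey

    def __init__(self) -> None:
        self.ci = 0    # CE marks already folded into the signaled counter
        self.ci_g = 0  # CE marks seen but not yet signaled
        self.ci_i = 0  # 1 while the current codepoint still needs a repeat

    def on_segment(self, ce_marked: bool) -> bool:
        """Record one data segment; return True if an immediate ACK is advisable."""
        if ce_marked:
            self.ci_g += 1
        # Assumption: ask for an ACK once the gauge exceeds what one ACK can carry.
        return self.ci_g > self.MAX_STEP

    def on_ack(self) -> int:
        """Return the CI codepoint (0..4) to place in the outgoing ACK."""
        if self.ci_i:
            self.ci_i = 0  # second transmission of the same value
        else:
            step = min(self.ci_g, self.MAX_STEP)
            self.ci += step
            self.ci_g -= step
            if step:
                self.ci_i = 1  # new value: must be repeated once
        return self.ci % 5
```

Fed with one CE mark followed by three pairs of CE marks, with an ACK after each group, this sketch emits the CI values 1, 1, 0 and 0.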
The following table provides an example showing a half-connection
with a TCP sender A and a TCP receiver B. The sender maintains a
counter CI.r to reconstruct the number of CE marks seen at the
receiver-side.
+----+------+---------------+------------+---------------+------+
| | Data | TCP A | IP | TCP B | Data |
+----+------+---------------+------------+---------------+------+
| | | SEQ ACK CTL | | SEQ ACK CTL | |
| -- | | ------------- | ---------- | ------------- | |
| 1 | | 0100 SYN | ----> | | |
| | | CWR,ECE,NS | | | |
| 2 | | | <---- | 0300 0101 SYN | |
| | | | | ACK,CWR | |
| 3 | | 0101 0301 ACK | ECT0 -CE-> | | |
| | | | | CI.c=0 CI.g=1 | |
| 4 | 100 | 0101 0301 ACK | ECT0 ----> | | |
| | | | | CI.c=1 CI.g=0 | |
| 5 | | | <---- | 0301 0201 ACK | |
| | | | | ECI=CI.1 | |
| | | CI.r=1 | | | |
| 6 | 100 | 0201 0301 ACK | ECT0 -CE-> | | |
| | | | | CI.c=1 CI.g=1 | |
| 7 | 100 | 0301 0301 ACK | ECT0 -CE-> | | |
| | | | | CI.c=1 CI.g=2 | |
| 8 | | | XX-- | 0301 0401 ACK | |
| | | | | ECI=CI.1 | |
| | | CI.r=1 | | | |
| 9 | 100 | 0401 0301 ACK | ECT0 -CE-> | | |
| | | | | CI.c=1 CI.g=3 | |
| 10 | 100 | 0501 0301 ACK | ECT0 -CE-> | | |
| | | | | CI.c=5 CI.g=0 | |
| 11 | | | <---- | 0301 0601 ACK | |
| | | | | ECI=CI.0 | |
| | | CI.r=5 | | | |
| 12 | 100 | 0601 0301 ACK | ECT0 -CE-> | | |
| | | | | CI.c=5 CI.g=1 | |
| 13 | 100 | 0701 0301 ACK | ECT0 -CE-> | | |
| | | | | CI.c=5 CI.g=2 | |
| 14 | | | <---- | 0301 0801 ACK | |
| | | | | ECI=CI.0 | |
| | | CI.r=5 | | | |
+----+------+---------------+------------+---------------+------+
Table 3: Codepoint signal example
3.4. Advanced Compatibility Mode
TBD (more detailed description see
draft-ietf-conex-tcp-modifications)
This section describes a possible mechanism to achieve more accurate
ECN feedback even when the receiver is not capable of the new more
accurate ECN feedback scheme with the drawback of less reliability.
During initial deployment, a large number of receivers will only
support [RFC3168] classic ECN feedback. Such a receiver will set the
ECE bit whenever it receives a segment with the CE codepoint set, and
clear the ECE bit only when it receives a segment with the CWR bit
set. As the CE codepoint has priority over the CWR bit (Note: the
wording in this regard is ambiguous in [RFC3168], but the reference
implementation of ECN in ns2 is clear), a [RFC3168] compliant
receiver will not clear the ECE bit on the reception of a segment,
where both CE and CWR are set simultaneously. This property allows
the use of a compatibility mode, to extract more accurate feedback
from legacy [RFC3168] receivers by setting the CWR permanently.
Assuming a delayed ACK ratio of one (no delayed ACKs), a sender can
permanently set the CWR bit in the TCP header, to receive a more
accurate feedback of the CE codepoints as seen at the receiver. This
feedback signal is, however, very brittle, and any ACK loss may cause
congestion information to be lost. Neither delayed ACKs nor ACK loss
can be accounted for in a reliable way. Therefore, a
sender would need to use heuristics to determine the current delayed
ACK ratio M used by the receiver (e.g. most receivers will use M=2),
and also the recent ACK loss ratio. Acknowledgement Congestion Control
(AckCC) as defined in [RFC5690] cannot be used, as deployment of
this feature is only experimental.
Using a phase-locked loop algorithm, the CWR bit can then be set only
on those data segments that will trigger a (delayed) ACK. Thereby,
no congestion information is lost, as long as the ACK carrying the
ECE bit is seen by the sender.
Whenever the sender sees an ACK with ECE set, this indicates that at
least one, and at most M data segments with the CE codepoint set
were seen by the receiver. The sender SHOULD react as if M CE
indications were reflected back to the sender by the receiver,
unless additional heuristics (e.g. dead time correction) can
determine a more accurate value of the "true" number of received CE
marks.
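The bounds stated above can be written down as a short sketch. It assumes the sender already has an estimate of the delayed ACK ratio M and applies no additional heuristics; the function name estimate_ce_marks is hypothetical.

```python
from typing import Iterable, Tuple


def estimate_ce_marks(ece_flags: Iterable[bool], m: int) -> Tuple[int, int]:
    """Return (lower, assumed) CE mark counts for a run of received ACKs.

    Each ACK with ECE set stands for at least 1 and at most m CE-marked
    segments; without further heuristics the sender assumes m.
    """
    if m < 1:
        raise ValueError("delayed ACK ratio must be at least 1")
    ece_acks = sum(1 for flag in ece_flags if flag)
    return ece_acks, ece_acks * m
```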
4. Acknowledgements
We want to thank Bob Briscoe and Michael Welzl for their input and
discussion. Special thanks to Bob Briscoe, who first proposed the
use of the ECN bits as one field and the handshake negotiation for
more accurate ECN.
5. IANA Considerations
This memo includes no request to IANA.
6. Security Considerations
TBD
ACK loss
This scheme sends each codepoint (of the two subsets) at least two
times. In the worst case at least one, and often two or more
consecutive ACKs can be dropped without losing congestion
information. Further refinements, such as interleaving ACKs when
sending codepoints belonging to the two subsets (e.g. CI, E1), can
allow the loss of any two consecutive ACKs, without the sender losing
congestion information, at the cost of also reducing the ACK ratio.
At low congestion rates, the sending of the current value of the CI
counter by default allows higher numbers of consecutive ACKs to be
lost, without impacting the accuracy of the ECN signal.
ECN Nonce
In the proposed scheme there are three more codepoints available that
could be used for an integrity check like ECN Nonce. If ECN Nonce
were implemented as proposed in Appendix B, even more information
would be provided for ECN Nonce than in the original specification.
A delayed ACK ratio of two can be sustained indefinitely even during
heavy congestion, but not during excessive ECT(1) marking, which is
under the control of the sender. A higher ACK ratio can be sustained
when congestion is low, but a low ACK ratio may be needed for the E1
feedback.
7. References
7.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP",
RFC 3168, September 2001.
[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
Congestion Notification (ECN) Signaling with Nonces",
RFC 3540, June 2003.
7.2. Informative References
[Ali10] Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel,
P., Prabhakar, B., Sengupta, S., and M. Sridharan, "DCTCP:
Efficient Packet Transport for the Commoditized Data
Center", Jan 2010.
[I-D.briscoe-tsvwg-re-ecn-tcp]
Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith,
"Re-ECN: Adding Accountability for Causing Congestion to
TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-09 (work in
progress), October 2010.
[RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K.
Ramakrishnan, "Adding Explicit Congestion Notification
(ECN) Capability to TCP's SYN/ACK Packets", RFC 5562,
June 2009.
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, September 2009.
[RFC5690] Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding
Acknowledgement Congestion Control to TCP", RFC 5690,
February 2010.
[draft-kuehlewind-tcpm-accurate-ecn-option]
Kuehlewind, M. and R. Scheffenegger, "Accurate ECN
Feedback Option in TCP",
draft-kuehlewind-tcpm-accurate-ecn-option-01 (work in
progress), Jul 2012.
Appendix A. Estimating CE-marked bytes
TBD (see draft-ietf-conex-tcp-modifications-02 and 'late ACK' scheme
of 1 Bit scheme in draft-kuehlewind-tcpm-accurate-ecn-00)
Appendix B. Use with ECN Nonce
In ECN Nonce, by comparing the number of incoming ECT(1)
notifications with the actual number of packets that were transmitted
with an ECT(1) mark as well as the sum of the sender's two internal
counters, the sender can probabilistically detect a receiver that
sends false marks or suppresses accurate ECN feedback, or a path that
does not properly support ECN.
+-----+----+-----+-----+------------+------------+
| ECI | NS | CWR | ECE | CI (base5) | E1 (base3) |
+-----+----+-----+-----+------------+------------+
| 0 | 0 | 0 | 0 | 0 | - |
| 1 | 0 | 0 | 1 | 1 | - |
| 2 | 0 | 1 | 0 | 2 | - |
| 3 | 0 | 1 | 1 | 3 | - |
| 4 | 1 | 0 | 0 | 4 | - |
| 5 | 1 | 0 | 1 | - | 0 |
| 6 | 1 | 1 | 0 | - | 1 |
| 7 | 1 | 1 | 1 | - | 2 |
+-----+----+-----+-----+------------+------------+
Table 4: Codepoint assignment for accurate ECN feedback and ECN Nonce
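The mapping in Table 4 can be sketched as a pair of lookup helpers.
This is only an illustration: the function and constant names are
hypothetical and not part of this specification, and the flag order
(NS as the most significant bit, ECE as the least significant bit) is
read directly from the table columns.

```python
# Hypothetical helper names; the mapping itself is copied from Table 4.
CI_CODEPOINTS = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}  # ECI -> CI (base 5)
E1_CODEPOINTS = {5: 0, 6: 1, 7: 2}              # ECI -> E1 (base 3)


def eci_from_flags(ns, cwr, ece):
    """Combine the NS, CWR and ECE bits into the 3-bit ECI value."""
    return (ns << 2) | (cwr << 1) | ece


def decode_eci(eci):
    """Return ("CI", value) or ("E1", value) for an ECI in 0..7."""
    if eci in CI_CODEPOINTS:
        return ("CI", CI_CODEPOINTS[eci])
    if eci in E1_CODEPOINTS:
        return ("E1", E1_CODEPOINTS[eci])
    raise ValueError("ECI must be in the range 0..7")


def encode_eci(kind, value):
    """Map a counter value onto its codepoint (modulo 5 or modulo 3)."""
    if kind == "CI":
        return value % 5
    if kind == "E1":
        return 5 + value % 3
    raise ValueError("kind must be 'CI' or 'E1'")
```

For example, the flag combination NS=1, CWR=1, ECE=0 gives ECI 6,
which decodes to an E1 value of 1.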
If an ECT(1) mark is received, an ECT(1) counter (E1) is incremented.
The receiver has to convey that updated information to the sender
with the next possible ACK using the three remaining codepoints as
shown in Table 4. Thus, on reception of an ECT(1)-marked packet, the
receiver should signal the current value of the E1 counter (modulo 3)
in the next ACK. If a CE mark was received before sending the next
ACK (e.g. with delayed ACKs), sending that update MUST take
precedence. The receiver should also repeat every E1 value it sends.
This repetition does not need to be in the consecutive ACK, as the
E1 value will only be transmitted when no changes in the CI have
occurred. Each E1 value will therefore be sent exactly twice. The
repetition of every signal provides further resilience against lost
ACKs.
As only a limited number of E1 codepoints exist and the receiver
might not acknowledge every single data packet immediately (delayed
ACKs), a sender SHOULD NOT mark more than 1/m of the packets with
ECT(1), where m is the ACK ratio (e.g. 50% when every second data
packet triggers an ACK). This constraint avoids permanent feedback
of E1 only and must also be maintained on short timescales. A sender
SHOULD send no more than 3 consecutive packets marked with ECT(1).
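One possible way for a sender to respect both limits is sketched
below. This is an illustration under stated assumptions, not a
normative algorithm: the function name, the representation of the
sending history as a list of booleans, and the window of 8 packets
used to approximate "short timescales" are all hypothetical choices.

```python
def marking_allowed(history, m, window=8, max_run=3):
    """Decide whether the next packet may be sent with ECT(1).

    history: booleans for already-sent packets, True = sent as ECT(1).
    m:       ACK ratio (data packets per ACK); at most 1/m may be marked.
    window:  assumed short-timescale horizon, in packets (hypothetical).
    max_run: longest permitted run of consecutive ECT(1) packets.
    """
    # Run-length limit: count the ECT(1) packets at the tail of history.
    run = 0
    for marked in reversed(history):
        if not marked:
            break
        run += 1
    if run >= max_run:
        return False

    # Rate limit over the last `window` packets, candidate included.
    recent = history[-(window - 1):] if window > 1 else []
    marked_count = sum(recent) + 1
    return marked_count * m <= len(recent) + 1
```

With m = 2 the sketch refuses to mark the very first packet, allows a
mark after one unmarked packet, and then refuses the next one again,
so the marked share never exceeds 50% within the window.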
The same counter / gauge method as described in Section 3.3.1 can be
used to count and return (using a different mapping) the number of
incoming packets marked ECT(1) (called E1 in the algorithm). As few
codepoints are available for conveying the E1 counter value, an
immediate ACK MUST be triggered whenever the gauge E1.g exceeds a
threshold of 3. The sender receives the receiver's counter values
and compares them with the locally maintained counter.
B.1. Pseudo Code for the Codepoint Coding
IP signals: CE, ECT(1)
TCP Fields: AcE
Counters:
CI   Congestion Indication - counter [0..(n*5-1)]
CI.g Congestion Indication - gauge [0.."inf"]
CI.i Congestion Indication - indicator flag [0,1]
E1   ECT(1) - counter [0..(n*3-1)]
E1.g ECT(1) - gauge [0.."inf"]
E1.i ECT(1) - indicator flag [0,1]
At session initialization, all these counters are initialized to zero.
When a segment (Data, ACK) is received, perform the following steps:
If (CE)                   # When a CE codepoint is received,
  CI.g++                  # increase CI.g by 1
If (ECT(1))               # When an ECT(1) codepoint is received,
  E1.g++                  # increase E1.g by 1
If (CI.g > 5) or          # When ACK rate is not sufficient to keep
   (E1.g > 3)             # gauges close to zero,
  Send ACK immediately    # increase ACK rate
When preparing an ACK to be sent:
If (CI.g > 0) or          # When there is an unsent change in CI
   ( (E1.i != 0) and      # this check is to in effect alternate
     (CI.i != 0) )        # sending CI and E1 codepoints
  If (CI.i == 0) and      # updates to CI allowed
     (CI.g > 0)           # update is meaningful
    CI.i = 1              # set flag to repeat CI value
    CI += min(4,CI.g)     # 4 for 5 codepoints
    CI %= 5               # using modulo the available codepoints
    CI.g -= min(4,CI.g)   # reduce the holding gauge accordingly
  Else
    CI.i--                # just in case CI.i was set to
                          # more than 1 for resiliency
  Send ACK with AcE set to CI
Else
  If (E1.g > 0) or
     (E1.i != 0)
    If (E1.i == 0) and
       (E1.g > 0)
      E1.i = 1
      E1 += min(2, E1.g)
      E1 %= 3
      E1.g -= min(2, E1.g)
    Else
      E1.i--
    Send ACK with AcE set to E1
  Else
    Send ACK with AcE set to CI # default action
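The receiver steps above can be transcribed into a small executable
sketch, which may help when tracing how CI and E1 feedback alternate.
It follows the pseudo code literally, including the way the indicator
flags are decremented. The class and method names are hypothetical,
and the AcE field is returned as a (kind, value) pair rather than as
the wire encoding.

```python
class AccEcnReceiver:
    """Receiver-side counters, gauges and indicator flags (sketch)."""

    def __init__(self):
        # All counters start at zero at session initialization.
        self.ci = self.ci_g = self.ci_i = 0
        self.e1 = self.e1_g = self.e1_i = 0

    def on_segment(self, ce=False, ect1=False):
        """Account for one segment; return True if an immediate ACK
        is needed to keep the gauges close to zero."""
        if ce:
            self.ci_g += 1
        if ect1:
            self.e1_g += 1
        return self.ci_g > 5 or self.e1_g > 3

    def prepare_ack(self):
        """Return ("CI", value) or ("E1", value) for the next ACK."""
        if self.ci_g > 0 or (self.e1_i != 0 and self.ci_i != 0):
            if self.ci_i == 0 and self.ci_g > 0:
                step = min(4, self.ci_g)      # 4 for 5 codepoints
                self.ci_i = 1                 # repeat this CI value
                self.ci = (self.ci + step) % 5
                self.ci_g -= step
            else:
                self.ci_i -= 1
            return ("CI", self.ci)
        if self.e1_g > 0 or self.e1_i != 0:
            if self.e1_i == 0 and self.e1_g > 0:
                step = min(2, self.e1_g)      # 2 for 3 codepoints
                self.e1_i = 1                 # repeat this E1 value
                self.e1 = (self.e1 + step) % 3
                self.e1_g -= step
            else:
                self.e1_i -= 1
            return ("E1", self.e1)
        return ("CI", self.ci)                # default action
```

For instance, six CE-marked segments in a row trigger an immediate
ACK on the sixth segment. The first ACK then carries CI = 4, the
second ACK repeats it, and the third carries CI = 1, which is
(4 + 2) mod 5.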
Sender:
Counters:
CI.r - current value of CEs seen by receiver
E1.s - sum of all sent ECT(1) marked packets (up to snd.nxt)
E1.s(t) - value of E1.s at time (in sequence space) t
E1.r - value signaled by receiver about received ECT(1) segments
E1.r(t) - value of E1.r at time (in sequence space) t
CI.r(t) - ditto
# Note: With a codepoint implementation,
# a reverse table ECI[n] -> CI.r / E1.r is needed.
# The wire protocol transports the absolute value
# of the receiver-side counter.
# Thus the (positive only) delta needs to be calculated,
# and added to the sender-side counter.
If ACK AcE in the set of CI values
  D = (AcE.CI + 5 - (CI.r mod 5)) mod 5
  CI.r += D
If ACK AcE in the set of E1 values
  D = (AcE.E1 + 3 - (E1.r mod 3)) mod 3
  E1.r += D
# Before CI.r or E1.r reach a (binary) rollover,
# they need to roll over at some multiple of 5
# and 3 respectively.
CI.r = CI.r modulo 255 # 5 * 51
E1.r = E1.r modulo 255 # 3 * 85
# (an implementation may choose to use another constant,
# e.g. 3^4*5^4 (50625) for 16-bit integers,
# or 3^8*5^8 (2562890625) for 32-bit integers)
# The following test can (probabilistically) reveal
# if the receiver or path is not properly
# handling ECN (CE, E1) marks
If not E1.r(t) <= E1.s(t) <= E1.r(t) + CI.r(t)
  # -> receiver or path does not properly reflect ECN
  # (or too many ACKs got lost, which can also be checked
  # by the sender).
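The sender-side delta calculation and the plausibility test can be
sketched as two small functions. The function names are hypothetical.
The sketch assumes that the caller supplies E1.r, E1.s and CI.r taken
at the same point in sequence space, and that none of the compared
values has wrapped around since the last comparison.

```python
def advance(counter, signaled, base, wrap=None):
    """Add the positive-only delta implied by a signaled value.

    counter:  sender-side copy of the receiver counter (CI.r or E1.r).
    signaled: value carried in AcE, already decoded (mod `base`).
    base:     5 for CI codepoints, 3 for E1 codepoints.
    wrap:     optional rollover constant; it has to be a multiple of
              `base`, such as 255 in the pseudo code.
    """
    delta = (signaled + base - (counter % base)) % base
    counter += delta
    if wrap is not None:
        counter %= wrap
    return counter


def feedback_plausible(e1_r, e1_s, ci_r):
    """Check E1.r(t) <= E1.s(t) <= E1.r(t) + CI.r(t)."""
    return e1_r <= e1_s <= e1_r + ci_r
```

As an example, a sender holding CI.r = 3 that receives a CI codepoint
of 1 computes a delta of 3 and advances CI.r to 6. A sender that has
transmitted four ECT(1) packets, while the receiver reports E1.r = 3
and CI.r = 2, passes the test, since the missing ECT(1) packet may
have been re-marked to CE on the path.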
Authors' Addresses
Mirja Kuehlewind (editor)
University of Stuttgart
Pfaffenwaldring 47
Stuttgart 70569
Germany
Email: mirja.kuehlewind@ikr.uni-stuttgart.de
Richard Scheffenegger
NetApp, Inc.
Am Euro Platz 2
Vienna, 1120
Austria
Phone: +43 1 3676811 3146
Email: rs@netapp.com