Network Working Group M. Bagnulo
Internet-Draft UC3M
Intended status: Informational B. Briscoe
Expires: April 2, 2017 Simula Research Lab
September 29, 2016
Adding Explicit Congestion Notification (ECN) to TCP control packets and
TCP retransmissions
draft-bagnulo-tcpm-generalized-ecn-00
Abstract
This document describes an experimental modification to ECN that
allows the use of ECN on the following TCP packets: SYNs, pure ACKs,
Window probes, FINs, RSTs and retransmissions.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 2, 2017.
Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Bagnulo & Briscoe Expires April 2, 2017 [Page 1]
Internet-Draft ECN and TCP control packets September 2016
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . 3
1.2. Experiment goals . . . . . . . . . . . . . . . . . . . . 3
1.3. Document structure . . . . . . . . . . . . . . . . . . . 4
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4
3. Specification . . . . . . . . . . . . . . . . . . . . . . . . 5
3.1. Network behaviour . . . . . . . . . . . . . . . . . . . . 5
3.2. Endpoint behaviour . . . . . . . . . . . . . . . . . . . 6
3.2.1. SYN . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.2.2. Pure ACK . . . . . . . . . . . . . . . . . . . . . . 8
3.2.3. Window Probe . . . . . . . . . . . . . . . . . . . . 9
3.2.4. FIN . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2.5. RST . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2.6. Retransmissions . . . . . . . . . . . . . . . . . . . 11
4. Discussion about the arguments in RFC3168 . . . . . . . . . . 12
4.1. The reliability argument . . . . . . . . . . . . . . . . 12
4.2. TCP SYNs . . . . . . . . . . . . . . . . . . . . . . . . 13
4.3. Pure ACKs. . . . . . . . . . . . . . . . . . . . . . . . 16
4.4. Retransmitted packets. . . . . . . . . . . . . . . . . . 18
4.5. Window probe packets . . . . . . . . . . . . . . . . . . 20
5. Security considerations . . . . . . . . . . . . . . . . . . . 21
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 21
8. Informative References . . . . . . . . . . . . . . . . . . . 21
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 22
1. Introduction
RFC3168 [RFC3168] specifies the addition of Explicit Congestion
Notification (ECN) to IP. By using the ECN capability, switches
performing Active Queue Management (AQM) can use ECN marks instead of
packet drops to signal congestion to the endpoints of a
communication. This results in lower packet loss and increased
performance. However, RFC3168 specifies the support of ECN in TCP
data packets, but precludes the use of ECN in TCP control packets
(TCP SYN, TCP SYN/ACK, pure ACKs, Window probes) and in retransmitted
packets. RFC3168 is silent about the use of ECN in RST and FIN
packets. RFC 5562 [RFC5562] is an experimental extension to ECN that
enables the ECN support for TCP SYN/ACK packets. This document
defines an experimental modification to ECN that enables the ECN
support in all the aforementioned packet types.
1.1. Motivation
The inability to use ECN in TCP control packets and retransmissions
has a potentially harmful effect, especially in environments where
ECN support is pervasive. For example, [judd-nsdi] shows that in a
data center (DC) environment where DCTCP is used (in conjunction with
ECN), the probability of being able to establish a new connection
using a non-ECT-marked SYN packet drops to close to 0 when there are
16 ongoing TCP flows transmitting at full speed. In this particular
context of a data center using DCTCP, the issue is that the proposed
AQM aggressively marks packets to keep the buffer queues small, which
implies that non-ECT-marked packets are in turn dropped aggressively
as well, rendering it nearly impossible to establish a new connection
when there is ongoing traffic.
These limitations are not confined to the data center environment.
In any ECN deployment, non-ECT-marked packets suffer a penalty when
they traverse a congested bottleneck. For instance, with a drop
probability of 1%, 1% of connection attempts suffer a timeout before
the SYN is retransmitted, which is very detrimental to the
performance of short flows. Dropping TCP control traffic, such as
TCP SYNs and pure ACKs, has a negative effect on the overall
performance of the communication, so it is beneficial to avoid it.
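To give a feel for the size of this penalty, the following sketch
works through the arithmetic for the 1% example. It assumes
independent drops, an initial retransmission timeout of 1 second that
doubles after each loss, and a cap on the number of retransmissions;
the function name and all parameter values are illustrative
assumptions, not part of this specification.

```python
def syn_timeout_cost(drop_prob, initial_rto_s=1.0, max_retries=5):
    """Return (probability of at least one SYN timeout, mean extra delay in s).

    Assumes independent drops and a retransmission timeout that doubles
    after every loss; connections that would need more than max_retries
    retransmissions are left out of the mean.
    """
    mean_delay = 0.0
    waited = 0.0
    rto = initial_rto_s
    for losses in range(1, max_retries + 1):
        waited += rto  # total time spent waiting after `losses` drops
        # exactly `losses` SYNs dropped, then one delivered
        p_exact = (drop_prob ** losses) * (1.0 - drop_prob)
        mean_delay += p_exact * waited
        rto *= 2.0
    return drop_prob, mean_delay


p_timeout, mean_delay = syn_timeout_cost(0.01)
print(f"{p_timeout:.0%} of attempts wait at least 1 s; "
      f"mean penalty {mean_delay * 1000:.1f} ms")
```

Under these assumptions the mean penalty is only about 10 ms, but it
is concentrated on the 1% of connections that each wait a full second
or more, which is what hurts short flows.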
Finally, there are ongoing efforts to promote the adoption of DCTCP
(and similar transports) over the Internet to achieve low latency for
all communications [I-D.briscoe-tsvwg-aqm-tcpm-rmcat-l4s-problem].
In such an approach, ECN capable packets are treated more favorably,
as they are likely to experience less delay and lower packet drop
probability. Preventing TCP control packets, which are critical for
TCP performance, from obtaining the benefits of ECN would result in
degraded performance.
1.2. Experiment goals
The goal of the experimental extensions defined in this document is
to allow the use of ECN (both ECT and CE codepoints) in the public
Internet as well as in controlled environments so we can find out
about the following issues:
How SYN, Window probes, pure ACKs, FINs, RSTs and retransmissions
that carry the ECT(0), ECT(1) or CE codepoints are processed by
the TCP endpoints and the network (including routers, firewalls
and other middleboxes). In particular we would like to learn if
these packets are frequently blocked or if these packets are
usually forwarded and processed. This will affect the design of
the support of the different packet types considered.
The scale of deployment of the different flavors of ECN, including
[RFC3168], [RFC5562], [RFC3540] and [I-D.ietf-tcpm-accurate-ecn].
Depending on how pervasive the deployment of each option is, the
design of adding ECN support to the different packet types
considered in this document can vary greatly.
How much the performance of the TCP communications is improved by
allowing the ECN marking of these packets, for each of the
different packet types.
Identify any issues (including security issues) that enabling the
ECN marking of these packets may imply.
The data gathered through the experiments described in this document
will help in the design of the final mechanism (if any) to add ECN
support to the different packet types considered in this document.
Whenever data input is needed to assist in a design choice, it is
spelled out throughout the document.
Success criteria: If we manage to obtain enough data to have a
clearer view of the deployability, the benefits and any other issues
of ECN marking of the considered packets, the experiment will be a
success. If the results of the experiment show that it is feasible
to deploy such changes, there are gains to be achieved through the
changes described in this specification and no other major issues
that may interfere with the deployment of the proposed changes, then
it would be reasonable to attempt to update RFC3168 to adopt the
proposed changes in a standards track specification.
1.3. Document structure
The remainder of this document is structured as follows. In
Section 2, we present the terminology used in the rest of the
document. In Section 3, we specify the modifications to provide ECN
support to TCP SYNs, pure ACKs, Window probes, FINs, RSTs and
retransmissions. We describe both the network behaviour and the
endpoint behaviour (the latter detailed for both the TCP sender and
TCP receiver). RFC3168 does not preclude the use of ECN in TCP
control packets lightly. It provides a number of specific reasons
for each packet type. In Section 4, we revisit each of the
arguments provided by RFC3168 and explore possibilities to enable the
ECN capability in the different packet types.
2. Terminology
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in [RFC2119].
Pure ACK: A TCP segment with the ACK flag set and no data payload.
SYN: A TCP segment with the SYN flag set. It may carry data if TCP
Fast Open is used.
Window probe: Defined in [RFC1122], a Window probe is a TCP segment
with only one byte of data, sent to learn whether the receiver window
is still zero.
FIN: A TCP segment with the FIN flag set.
RST: A TCP segment with the RST flag set.
Retransmission: A TCP segment that has been retransmitted by the TCP
sender because it determined that the original segment was lost,
which may or may not be the case.
3. Specification
3.1. Network behaviour
If a router or any other middlebox along the path receives a SYN, a
Window probe, a Pure ACK, a FIN, a RST or a Retransmission with the
ECT(0) or the ECT(1) codepoint set, then:
if the router is not congested (i.e. if, in the same situation, it
would forward the same packet carrying the non-ECT codepoint), the
router SHOULD forward the packet.
It could be possible to use a MUST instead of a SHOULD. The
reason to use a SHOULD is that it may be the case that some
firewalls or other middleboxes may decide to block some of
these packets, such as ECT marked SYNs if they detect an
ongoing attack, which would then qualify the SHOULD and would
allow these boxes to drop the packets. This issue is to be
discussed.
if the router is congested (i.e. if, in the same situation, it
would drop the same packet carrying the non-ECT codepoint), then
the router MAY set the CE codepoint in the packet instead of
dropping the packet. If a behaviour alternative to drop is defined
in another specification (e.g. for ECT(1)), then this text should be
updated to allow this alternative behaviour with a proper link to
that specification.
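The two rules above can be sketched as a single decision function.
This is only an illustration: the function name and the
mark_instead_of_drop switch (modelling the MAY) are assumptions, and
forwarding a packet that arrives already CE-marked unchanged is
borrowed from RFC3168 router behaviour rather than stated in this
section. The numeric codepoint values are those of RFC3168.

```python
# ECN field values in the IP header (RFC 3168)
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def handle_control_packet(ecn_field, congested, mark_instead_of_drop=True):
    """Return (action, ecn_field) for a SYN, pure ACK, window probe,
    FIN, RST or retransmission arriving at a router or middlebox."""
    if not congested:
        # Same treatment a non-ECT packet would get: forward it.
        return ("forward", ecn_field)
    if ecn_field == CE:
        # Assumption: an already marked packet is forwarded unchanged.
        return ("forward", CE)
    if ecn_field in (ECT_0, ECT_1) and mark_instead_of_drop:
        # The MAY: signal congestion by marking rather than dropping.
        return ("forward", CE)
    return ("drop", ecn_field)
```

For example, handle_control_packet(ECT_0, congested=True) yields a
forwarded packet carrying CE, whereas the same call with NOT_ECT
yields a drop.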
3.2. Endpoint behaviour
3.2.1. SYN
A first design choice that needs to be taken when proposing the
experimental extensions to support ECN marking in SYN packets is
whether the ECN marking of the SYN packets is done only for AccECN
endpoints [I-D.ietf-tcpm-accurate-ecn] or also RFC3168 endpoints.
As far as AccECN capable endpoints are concerned,
[I-D.ietf-tcpm-accurate-ecn] defines the wire protocol for supporting
ECN marking of the SYN as well as feeding back the congestion signal
to the sender if a SYN packet with the CE codepoint was delivered to
the receiver. The mechanism described next follows the wire format
defined in [I-D.ietf-tcpm-accurate-ecn] and complements it with the
ECT marking of the SYN as well as the endpoint behaviour, both of
which are left undefined in [I-D.ietf-tcpm-accurate-ecn].
The mechanism described below does not support ECN marking of SYNs
with RFC3168 endpoints. Whether this is needed is left for
discussion in the WG.
3.2.1.1. TCP client behaviour
In this section we specify the behaviour of a TCP client that wishes
to support ECN marking in the SYN. The proposed behaviour is fully
compliant with [I-D.ietf-tcpm-accurate-ecn]. We call W0 the default
value for the Initial Window supported by the TCP endpoint.
According to current specifications, W0 can be any value between 2
and 10. A TCP endpoint supporting this specification SHOULD have a
cache where it stores the type of ECN support of each peer it has
contacted (RFC3168 ECN, AccECN, ECN in the SYN, no-ECN).
If a TCP client wishes to use ECN in a given connection and it also
wishes to enable the ECN marking of the TCP SYN packet, then the TCP
client MUST send the TCP SYN with the ECT(0) or ECT(1) codepoint in
the IP header and the NS, the CWR and the ECE flags set to 1 in the
TCP header. The client SHOULD NOT set any of the ECT codepoints if
the server is included in the cache as not supporting ECN in the SYN
packet. Shall we define the lifetime of the entries and how they are
extended? The client SHOULD use an initial retransmission timeout
value of 500ms or less for this connection.
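As an illustration of the rule above, the following sketch builds the
ECN-related fields of the first SYN. The cache class, its string
labels, the function names and the returned dictionary are all
hypothetical; only the flag and codepoint values come from the TCP and
ECN specifications. Leaving the retransmission timeout to the stack
default when no ECT codepoint is set is likewise an assumption.

```python
from dataclasses import dataclass, field

# ECN field values in the IP header (RFC 3168)
NOT_ECT, ECT_1, ECT_0 = 0b00, 0b01, 0b10
# TCP header flag bits
TCP_SYN, TCP_ECE, TCP_CWR, TCP_NS = 0x002, 0x040, 0x080, 0x100


@dataclass
class EcnSupportCache:
    """Hypothetical per-server cache; the labels are illustrative."""
    entries: dict = field(default_factory=dict)

    def allows_ect_syn(self, server):
        # Servers never contacted before are probed with an ECT SYN.
        return self.entries.get(server) not in ("rfc3168", "no-ecn")


def build_syn(server, cache, codepoint=ECT_0):
    """Return the ECN-related fields of the first SYN sent to `server`."""
    ect_ok = cache.allows_ect_syn(server)
    return {
        "ip_ecn": codepoint if ect_ok else NOT_ECT,
        "tcp_flags": TCP_SYN | TCP_NS | TCP_CWR | TCP_ECE,
        # "500ms or less" when the SYN is ECT marked; None means the
        # stack default applies (assumption).
        "initial_rto_s": 0.5 if ect_ok else None,
    }
```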
DISCUSSION: Should (may?) the sender send also a SYN with the non-
ECT codepoint, perhaps slightly delayed after this first one?
This would reduce the penalty of adopting the ECN marking of SYNs
when communicating with TCP receivers that silently drop ECT SYNs.
In the text below, we first consider the case where the client
only sends the ECT marked SYN and then we consider the case where
the client sends first an ECT marked SYN followed by a regular
SYN.
After sending the ECT marked SYN, the client may receive the
following different replies and will react as follows:
If the client receives a SYN/ACK with the CWR and the ECE flags set
to 1 and the NS flag set to 0, (this means that the server supports
AccECN and that the SYN was not marked with the CE codepoint while
being forwarded through the network) then the client must continue
with the connection establishment using an Initial Window of W0 and
it must use AccECN for this connection (as defined in
[I-D.ietf-tcpm-accurate-ecn]). The client SHOULD cache that this
server supports ECN in the SYN packet and AccECN.
If the client receives a SYN/ACK with the CWR, the ECE and the NS
flags set to 1, (this means that the server supports AccECN and that
the SYN was marked with the CE codepoint while being forwarded through
the network) then the client must continue with the connection
establishment using an Initial Window of 1 SMSS and it must use
AccECN for this connection (as defined in
[I-D.ietf-tcpm-accurate-ecn]). The client SHOULD cache that this
server supports ECN in the SYN packet and AccECN.
If the client receives a SYN/ACK with the ECE flag set to 1 and the
CWR flag set to 0 (the value of the NS flag can be either 1 or 0),
(this means that the server supports RFC3168 ECN but does not support
AccECN nor this specification) then the client must continue with the
connection establishment and it must use RFC3168 ECN for this
connection. The client SHOULD cache that this server does not support
ECN in the SYN packet nor AccECN but it supports RFC3168 ECN.
DISCUSSION: Initial Window. Because the server does not support
ECT marking of the SYN message, it is not possible for the client
to know if the SYN was marked with the CE codepoint while
transiting to the server. We can either assume that the SYN was
marked with CE and impose an initial window of 1 MSS, or we can
assume it was not marked and use an initial window of W0 or
something in between (e.g. W0/2). If we impose a smaller initial
window, we are penalizing those that implement this specification,
which is in itself an optimization. Also, caching will help in
this case, as after the first contact a client will use the
appropriate mode to contact a server.
If the client receives a SYN/ACK with the ECE, the CWR and the NS
flags set to 0, (this means that the server does not support ECN)
then the client must continue with the connection establishment and
it must not use ECN for this connection. The client SHOULD cache that
this server does not support ECN.
DISCUSSION: Same discussion about the Initial Window as in the
previous case.
If the client receives a SYN/ACK with the ECE, the CWR and the NS
flags set to 1, (this means that the server does not support ECN and
it is not compliant with current standards) then the client may
continue with the connection establishment and it must not use ECN
for this connection. The client SHOULD cache that this server does not
support ECN. See Appendix B of [I-D.ietf-tcpm-accurate-ecn] for
ideas about how these last two cases can be used.
DISCUSSION: Same discussion about the Initial Window as in the
previous case.
If the client receives a SYN, then the client must behave as defined
in [I-D.ietf-tcpm-accurate-ecn], which implies sending some form of
SYN/ACK. After doing so, the client may receive some form of SYN/ACK
back. The processing of the SYN/ACK will follow the rules specified
above. This is the case of simultaneous open.
If the retransmission timer times out, then the client SHOULD send a
TCP SYN with the No-ECT codepoint in the IP header. The initial
retransmission timeout MUST be set to 1 sec.
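The reactions to the different SYN/ACK flag combinations listed in
this section can be summarised in a small lookup. This is a sketch
only: the names and labels are hypothetical, the initial window is
reported as None wherever the text leaves it open for discussion, and
treating any combination not matched by the first two rules or the
RFC3168 rule as "no ECN" is an assumption.

```python
from typing import NamedTuple, Optional


class Reaction(NamedTuple):
    mode: str                       # "accecn", "rfc3168" or "no-ecn"
    initial_window: Optional[str]   # "W0", "1 SMSS" or None (left open)
    cache_entry: str                # what the client remembers


def react_to_synack(ns, cwr, ece):
    """Map the NS, CWR and ECE flags of a SYN/ACK to the client reaction."""
    if cwr and ece:
        # AccECN server; NS tells whether the SYN arrived CE marked.
        window = "1 SMSS" if ns else "W0"
        return Reaction("accecn", window, "ecn-in-syn+accecn")
    if ece and not cwr:
        # RFC3168 server (NS may be 0 or 1); initial window under discussion.
        return Reaction("rfc3168", None, "rfc3168-only")
    # All flags zero, or (assumption) any other combination.
    return Reaction("no-ecn", None, "no-ecn")
```

A client would store cache_entry against the server address so that
the next SYN to the same server uses the appropriate mode directly.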
3.2.1.2. TCP server
The TCP server behaviour is defined in [I-D.ietf-tcpm-accurate-ecn].
In addition, the TCP server SHOULD cache the type of endpoint of the
client, for future connections.
3.2.2. Pure ACK
3.2.2.1. TCP sender behaviour
A TCP endpoint MAY set the ECT(0) or the ECT(1) codepoints in the IP
header of packets carrying a TCP pure ACK.
In the case that a TCP endpoint is only sending pure ACKs and it
decides to mark them as ECT, the endpoint may receive a congestion
signal back indicating that one or more of the pure ACKs it has sent
have experienced congestion. When this happens, the endpoint will
react in the same way as it would if any other packet had
experienced congestion, i.e. it will reduce its congestion window
accordingly. However, if the endpoint is only sending pure ACKs this
will have no effect on the load offered to the network and hence it
will not help in reducing the congestion. It may be possible to
explore some ways to help reduce congestion in this scenario. For
instance, one possibility would be for the endpoint to increase the
maximum number of data segments that it can receive before sending a
delayed ACK. Current specifications (RFC 1122 and RFC 5681) state
that an ACK should be sent for at least every second full-sized
segment. Upon congestion notification, the endpoint could increase
the number of segments required to send an ACK (while keeping the
timeout value unchanged). This would reduce the number of ACKs sent
by the endpoint and hence the offered load. It is up for discussion
in the WG whether this is worthwhile. Also, it should be noted that
if an ACK is dropped due to congestion, the sender of the ACK does
not react by reducing the load in any way.
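A minimal sketch of this possibility follows. The class, the doubling
rule and the upper bound of 16 segments are assumptions chosen for
illustration; this document does not define how far or how fast the
ACK ratio should be reduced. The delayed ACK timer, which stays
unchanged, is not modelled.

```python
class DelayedAckPolicy:
    """Counts data segments and decides when a delayed ACK is due."""

    def __init__(self, segments_per_ack=2, max_segments_per_ack=16):
        self.segments_per_ack = segments_per_ack
        self.max_segments_per_ack = max_segments_per_ack
        self.pending = 0  # data segments received since the last ACK

    def on_congestion_feedback(self):
        # A pure ACK we sent was CE marked: send ACKs less often.
        self.segments_per_ack = min(self.segments_per_ack * 2,
                                    self.max_segments_per_ack)

    def on_data_segment(self):
        """Return True if an ACK should be sent for this segment."""
        self.pending += 1
        if self.pending >= self.segments_per_ack:
            self.pending = 0
            return True
        return False
```

With the default settings an ACK is sent for every second segment;
after one congestion notification, for every fourth.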
3.2.2.2. TCP receiver behaviour
Upon reception of a pure ACK with the ECT(0), ECT(1) or CE
codepoints, the TCP receiver will process it as if it were any other
legitimate packet (e.g. a data packet). The exact treatment depends
on the flavour of ECN that the endpoint implements (either RFC3168 or
AccECN). In particular for AccECN, a CE marked pure ACK would
increase the CE packet counter and would not increase the CE byte
counter.
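The effect on the two counters can be illustrated as follows. The
class and field names are hypothetical and do not correspond to the
AccECN wire format; the sketch only shows that a pure ACK, having no
payload, adds to the packet count but not to the byte count.

```python
class CeCounters:
    """Illustrative receiver-side counters of CE marked traffic."""

    def __init__(self):
        self.ce_packets = 0
        self.ce_bytes = 0

    def on_packet(self, ce_marked, payload_len):
        if ce_marked:
            self.ce_packets += 1
            # A pure ACK has payload_len == 0, so this adds nothing.
            self.ce_bytes += payload_len


counters = CeCounters()
counters.on_packet(ce_marked=True, payload_len=0)     # CE marked pure ACK
counters.on_packet(ce_marked=True, payload_len=1448)  # CE marked data packet
```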
3.2.3. Window Probe
3.2.3.1. TCP sender behaviour
A TCP endpoint MAY set the ECT(0) or the ECT(1) codepoints in the IP
header of a packet carrying a zero window probe (ZWP).
According to RFC793, a TCP endpoint that has an ongoing connection
for which the other endpoint has announced a receiver window equal to
zero will send periodic ZWPs every two minutes until a non-zero
window is announced by the other endpoint of the connection. If such
an endpoint receives incoming packets that include a congestion
notification signal, the only option for the endpoint to reduce the
offered load is to increase the time between ZWP messages, e.g. do an
exponential back-off or increment the retransmission timer in some
other way. However, given that the current retransmission timer is
already long,
it is not clear if this is an effective decrease of the offered load
from a congestion perspective. Note that the endpoint receiving the
congestion notification will reduce its congestion window, so that
when the receiver window opens, it will transmit with such a
congestion window.
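The exponential back-off option mentioned above could look like the
following sketch. The function name, the doubling factor and the cap
of 960 seconds are assumptions for illustration only; the two-minute
starting interval is the one cited from RFC793.

```python
def next_probe_interval(current_s, congestion_signalled, max_interval_s=960.0):
    """Return the delay before the next zero window probe, in seconds."""
    if congestion_signalled:
        # Back off exponentially, up to an assumed upper bound.
        return min(current_s * 2.0, max_interval_s)
    return current_s


interval = 120.0  # two minutes
schedule = []
for _ in range(4):  # four consecutive congestion notifications
    interval = next_probe_interval(interval, congestion_signalled=True)
    schedule.append(interval)
```

Starting from two minutes, four consecutive notifications would give
intervals of 240, 480, 960 and 960 seconds under these assumptions.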
3.2.3.2. TCP receiver behaviour
Upon reception of a ZWP with the ECT(0), ECT(1) or CE codepoints, the
TCP receiver will process it as if it were any other legitimate
packet (e.g. a data packet). The exact treatment depends on the
flavour of ECN that the endpoint implements (either RFC3168 or
AccECN).
3.2.4. FIN
3.2.4.1. TCP sender behaviour
A TCP endpoint MAY set the ECT(0) or the ECT(1) codepoints in the IP
header of a packet that has the FIN flag set in the TCP header.
After sending the FIN, the endpoint will not send any more data in
the connection. It may still send one or more pure ACKs. Therefore,
if the endpoint that has set the ECT codepoint in the FIN receives
feedback from the other endpoint that the FIN was received with the CE
codepoint, there is little it can do to reduce the load offered to
the network. It is pointless to reduce the congestion window as the
endpoint will not send any more data. It can try to reduce the
number of pure ACKs it sends, by using an approach similar to the one
suggested in Section 3.2.2, i.e. increasing the number of segments
accumulated before sending a delayed ACK.
3.2.4.2. TCP receiver behaviour
Upon reception of a FIN with the ECT(0), ECT(1) or CE codepoints, the
TCP receiver will process it as if it were any other legitimate
packet (e.g. a data packet). The exact treatment depends on the
flavour of ECN that the endpoint implements (either RFC3168 or
AccECN).
3.2.5. RST
A RST message is hardly a useful vehicle to convey congestion
notification information. The reason for this is that the endpoint
generating the RST message does not have an open connection after
sending it (either because there was no such connection when the
packet that triggered the RST message was received or because the
packet that triggered the RST message also triggered the closure of
the connection). So, if a congestion notification signal is fed back
to the sender of the RST message, the sender will not be able to do
anything about it. Moreover, from the perspective of the receiver of
the RST message with the CE codepoint set, it can either accept the
RST message and close the connection, in which case there is no point
in echoing the congestion notification signal received, or it can
discard the RST
message (e.g. because the sequence number is out of window), in which
case it probably makes sense to discard the CE signal as well. So, from
the receiver perspective, there is no reaction to the reception of a
CE marked RST message.
So, the only motivation for marking the RST message with the ECT
codepoint is to reduce the chances of the RST message getting
dropped. The question whether it is useful to provide more reliable
delivery of RST messages is also non trivial. RST messages are used
to both create and mitigate attacks. Spoofed RST messages are used
by attackers to terminate ongoing connections. Legitimate RST
messages allow endpoints to inform their peers to eliminate existing
state that corresponds to non-existent connections, liberating
resources e.g. in DoS attack scenarios.
So, with all this, probably the recommendation should be that:
for senders, stacks MUST allow for administrators to configure
whether the RST messages are marked with the ECT(0) or ECT(1)
codepoints. We should define a default behaviour, not sure which
that one should be.
for receivers, ECT and CE codepoints are ignored.
3.2.6. Retransmissions
3.2.6.1. TCP sender behaviour
A TCP endpoint MAY set the ECT(0) or the ECT(1) codepoints in the IP
header of a packet carrying a retransmitted segment.
Upon reception of congestion notification that the retransmitted
packet was marked with CE, the sender will react as it would if it
received congestion notification feedback concerning any other data
packet.
3.2.6.2. TCP receiver behaviour
The receiver of a retransmitted packet marked with the ECT(0), ECT(1)
or CE codepoints reacts as it would with any other data packet.
In particular, the condition of ignoring ECN information for packets
outside the receiver window still holds. This means that for
retransmitted packets whose original was properly received, the ECN
information will be ignored. There is no problem with that, since
allowing the ECN marking of retransmitted packets still increases the
reliability of their transmission.
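The in-window condition can be sketched with the usual TCP segment
acceptability test for a data segment and a non-zero receive window.
The function names are illustrative, and the zero-length and
zero-window cases are deliberately left out of this sketch.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers wrap at 2^32


def seq_in_window(seq, rcv_nxt, rcv_wnd):
    """True if rcv_nxt <= seq < rcv_nxt + rcv_wnd, modulo 2^32."""
    return (seq - rcv_nxt) % SEQ_MOD < rcv_wnd


def ecn_is_considered(seg_seq, seg_len, rcv_nxt, rcv_wnd):
    """True if a data segment (seg_len > 0) overlaps the receive window,
    so that its ECN marking is taken into account."""
    first_byte = seg_seq % SEQ_MOD
    last_byte = (seg_seq + seg_len - 1) % SEQ_MOD
    return (seq_in_window(first_byte, rcv_nxt, rcv_wnd)
            or seq_in_window(last_byte, rcv_nxt, rcv_wnd))
```

A spurious retransmission lies entirely below rcv_nxt, so neither its
first nor its last byte is in the window and its marking is ignored.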
4. Discussion about the arguments in RFC3168
This section goes through each of the arguments presented in RFC3168
to prevent the ECN marking of the different packet types and provides
counter-arguments for each of them.
4.1. The reliability argument
While for each type of packet RFC 3168 provides a set of specific
arguments for preventing their marking, RFC3168 presents the reliable
delivery of the congestion signal as an overarching argument that
needs to be considered when trying to enable the ECT marking of TCP
control packets. In particular, Section 5.2 of RFC3168 states:
To ensure the reliable delivery of the congestion indication of
the CE codepoint, an ECT codepoint MUST NOT be set in a packet
unless the loss of that packet in the network would be detected by
the end nodes and interpreted as an indication of congestion.
We believe this argument is overly conservative. The overall
principle that should determine the level of reliability required for
ECN capable packets should be the one of "do no harm". Reliable
delivery of the CE codepoint is indeed paramount but the level of
reliability required should be the one of the original congestion
signal (i.e. the detection of the loss of the original packet). In
other words, the situation without ECN is that when a packet is to be
transmitted through a congested link, the packet may be dropped and
that is the congestion signal sent to the endpoint. When ECN is
introduced, the reliability of the delivery of the congestion signal
should be no worse than without ECN. In particular, setting the CE
codepoint in the very same packet seems to fulfil this criterion,
since either the packet is delivered and the CE codepoint signal is
delivered to the endpoint, or the packet is dropped, so the original
congestion signal through the packet loss is delivered to the
endpoint. Requiring more than this implies that the ECN congestion
signal is delivered more reliably than in the current situation, which
is not a bad thing per se, but, as we describe in this memo, it
results in performance penalties that should be reconsidered in
view of current deployments.
In addition, the reliability of the delivery of the congestion signal
is used as an argument for not setting the ECT codepoint in TCP
control packets, which effectively reduces the reliability of the
transmission of these TCP control packets. There is then a
tradeoff between the reliability of the delivery of the congestion
signal and the reliability of the delivery of TCP control packets.
As currently specified, ECN adoption implies an increased reliability
of the ECN congestion signal and a decrease in the reliability of the
TCP control packets. We believe that it is possible and desirable to
restore the tradeoff that exists in non-ECN-capable networks in terms
of reliability, where the congestion signal delivery is as reliable as
in a non-ECN-capable network and so is the delivery of TCP control
packets.
4.2. TCP SYNs
We next describe the arguments given by current specifications for
precluding ECT on SYN packets.
RFC 5562 presents two arguments against ECT marking of SYN packets
(quoted verbatim):
There are several reasons why an ECN-Capable codepoint must not be
set in the IP header of the initiating TCP SYN packet. First,
when the TCP SYN packet is sent, there are no guarantees that the
other TCP endpoint (node B in Figure 2) is ECN-Capable, or that it
would be able to understand and react if the ECN CE codepoint was
set by a congested router.
Second, the ECN-Capable codepoint in TCP SYN packets could be
misused by malicious clients to "improve" the well-known TCP SYN
attack. By setting an ECN-Capable codepoint in TCP SYN packets, a
malicious host might be able to inject a large number of TCP SYN
packets through a potentially congested ECN-enabled router,
congesting it even further.
We next go through all the arguments stated above to enable ECT
marking of SYN packets.
Argument 1: Unknown ECN capability at the responder. The initiator
does not know what the responder will do if an ECT or CE SYN arrives.
In a controlled environment, this argument does not hold because the
administrator can make sure that servers support ECN and in
particular ECN-capable SYN packets. Examples of controlled
environments are single-tenant DCs, and possibly multi-tenant DCs if
we assume that each tenant mostly communicates with its own VMs.
However, in the public Internet context, it cannot be assumed that
all TCP responders support ECN, and much less that they support ECT
marked SYN packets. It is possible that the responder will check
that the SYN complies with RFC 3168, which says a host "MUST NOT" set
ECT on a SYN. RFC 3168 does not say what the responder should do if
an ECN-capable SYN arrives. Some implementations might reject the SYN
(either silently or by returning a RST). Also some middleboxes (e.g.
firewalls) might take either of these actions on behalf of the
responder.
Silent losses lead to much longer delays than resets, by the following
reasoning. If the responder sends a reset immediately, the initiator
can fall back at once to retransmitting a non-ECT SYN (and possibly
fall back from negotiating ECN in the TCP flags as well). However,
after a silent discard, the initiator has to wait longer for a
timeout. Then it might immediately fall back to retransmitting a
non-ECT SYN, or it might retransmit an unchanged SYN first, in case
the loss was simply due to congestion.
Ironically, the benefit of making SYNs ECN-capable is to avoid the
delays when SYNs are lost due to congestion. Policy-based discard of
ECN-capable SYNs would merely replace congestion as a cause of these
delays. So for ECT SYNs to be worthwhile it seems that the
percentage loss due to policy would have to be less than that due to
congestion. However, unlike congestion loss, policy loss is
predictable, so the initiator can avoid it by caching those sites
that do not support ECN-capable SYNs.
According to a study using 2014 data [ecn-pam] from a limited range
of vantage points, out of the top 1M Alexa web sites, 4791 (0.82%)
IPv4 sites and 104 (0.61%) IPv6 sites failed to establish a
connection when they received a TCP SYN with any ECN codepoint set in
the IP header and the appropriate ECN flags in the TCP header. Of
these, about 41% failed to establish a connection due to the ECN
flags in the TCP header even with a Not-ECT ECN field in the IP
header (i.e. despite full compliance with RFC 3168).
One option would be to first send an ECT SYN and then a non-ECT SYN
(possibly with a small delay between them) and only accept the
non-ECT connection if it returned first. Nonetheless, even a cache
of a dozen or so sites would avoid performance problems with roughly
the Alexa top thousand, so it is questionable whether the level of
failure of ECT on SYNs warrants always sending two SYNs, particularly
given that failures at well-maintained sites could decline if ECT
SYNs are standardized.
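As an illustration only, the caching approach discussed above could
be sketched as follows in Python. The class name, the expiry time and
the choice to make every retransmitted SYN Not-ECT are assumptions
made for this sketch; the document does not specify any of them.

```python
import time


class EctSynCache:
    """Remembers destinations that appeared to discard ECN-capable SYNs."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds   # hypothetical expiry, not from the draft
        self._clock = clock
        self._blocked = {}        # destination -> time failure was recorded

    def record_failure(self, dest):
        """Note that an ECT SYN to dest failed but a non-ECT SYN worked."""
        self._blocked[dest] = self._clock()

    def may_use_ect_syn(self, dest):
        """Return True unless a recent failure is cached for dest."""
        recorded = self._blocked.get(dest)
        if recorded is None:
            return True
        if self._clock() - recorded > self._ttl:
            del self._blocked[dest]   # entry expired, so try ECT again
            return True
        return False


def syn_attempts(cache, dest, max_attempts=3):
    """List the ECN field to use on each successive SYN (re)transmission.

    The first SYN is ECT only if the cache allows it. Retransmissions
    fall back to Not-ECT, which is one of the two options in the text.
    """
    first = "ECT" if cache.may_use_ect_syn(dest) else "Not-ECT"
    return [first] + ["Not-ECT"] * (max_attempts - 1)
```

An initiator would call record_failure() only when the ECT SYN timed
out or was reset and the subsequent non-ECT SYN succeeded, so that
ordinary congestion loss is less likely to be cached as policy loss.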
Argument 2: Loss of congestion notification in the SYN packet due to
lack of support from the responder. If an ECT SYN packet is marked
as CE by a congested router along the path but the responder cannot
feed back CE marks on SYN packets, the congestion information will be
lost.
Currently, neither the TCP nor the DCTCP protocol provides space in
the SYN/ACK to feed back a CE mark on the SYN. The problem is that
there are two mutually exclusive uses of ECE on the SYN/ACK:
i) the responder has to set ECE=1 (with CWR=0) to agree to use ECN as
part of the 3-way handshake; ii) both TCP and DCTCP use ECE=1 to feed
back CE.
The accurate ECN (AccECN) proposal [I-D.ietf-tcpm-accurate-ecn]
suggests a two-pronged solution to this problem. First, AccECN
provides a way for the responder to feed back whether there was CE on
the SYN. Second, AccECN introduces a different combination of TCP
header flags on the SYN/ACK so that the initiator knows whether or
not the responder supports AccECN. Then, if the responder does
indicate that it supports AccECN, the initiator can be sure that, if
there is no CE feedback on the SYN/ACK, there really was no CE on
the SYN.
If the responder's SYN/ACK shows that it does not support AccECN, the
initiator can take a conservative approach and assume the SYN was
marked with CE and reduce its initial window. However, the initiator
knows that congestion is not pathological enough for a router to have
had to turn off ECN, because it knows that both the SYN and the SYN/
ACK have been delivered through the network. Therefore, even a
conservative initiator would not have to reduce its initial window as
much as it would in response to a timeout following no response to
its SYN.
Nonetheless, even a slight conservative reduction in initial window
might be a significant penalty, especially in the early days of
deployment, when little support for ECT SYN packets will be
available. This could be mitigated by caching previous experience of
which servers support AccECN.
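The initiator's choice of initial window described above might be
sketched as follows. This is a minimal sketch: the function name and
all numeric values (halving for the conservative case, one segment
when CE on the SYN is confirmed) are assumptions for illustration,
since this document only says that the conservative reduction need
not be as great as that following a SYN timeout.

```python
def initial_window(default_iw, responder_supports_accecn,
                   ce_on_syn_fed_back, conservative_iw=None,
                   congested_iw=1):
    """Pick an initial window (in segments) once the SYN/ACK arrives.

    default_iw:        window used when no congestion is suspected.
    conservative_iw:   hypothetical mild reduction used when the
                       responder cannot report CE on the SYN.
    congested_iw:      hypothetical window used when the responder
                       reports that the SYN was CE-marked.
    """
    if conservative_iw is None:
        conservative_iw = max(1, default_iw // 2)
    if responder_supports_accecn:
        # Feedback is reliable: reduce only if the SYN really was marked.
        return congested_iw if ce_on_syn_fed_back else default_iw
    # No AccECN: a CE mark on the SYN could have gone unreported.
    return conservative_iw
```

With a cache of servers known to support AccECN, the initiator could
skip the ECT SYN for other servers and so avoid the conservative
branch altogether.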
Argument 3: DoS attacks. [RFC5562] says that ECT SYN packets could
be misused by malicious clients to augment "the well-known TCP SYN
attack". It goes on to say "a malicious host might be able to inject
a large number of TCP SYN packets through a potentially congested
ECN-enabled router, congesting it even further."
We assume this is a reference to the TCP SYN flood attack (see
https://en.wikipedia.org/wiki/SYN_flood), which is an attack against
a responder end point. We assume the idea of this attack is to use
ECT to get more packets through an ECN-enabled router in preference
to other non-ECN traffic so that they can go on to use the SYN
flooding attack to inflict more damage on the responder end point.
This argument could apply to flooding with any type of packet, but we
assume SYNs are singled out because their source address is easier to
spoof, whereas floods of other types of packets are easier to block.
Mandating Not-ECT in an RFC does not stop attackers using ECT for
flooding. Nonetheless, if a standard says SYNs are not meant to be
ECT it would make it legitimate for firewalls to discard them.
However, this would negate the considerable benefit of ECT SYNs for
compliant transports and seems unnecessary because RFC 3168 already
provides the means to address this concern. In section 7 it says
that an AQM MUST turn off ECN support if under persistent overload,
and this advice is repeated in [RFC7567] (section 4.2.1). This makes
it hard for flooding packets to gain from ECT, but more experiments
are needed to see how much might be gained by an attacker flying
"just under the radar".
Alternative behaviour. The initiator can set ECT on a SYN as long as
it also negotiates for the use of AccECN [I-D.ietf-tcpm-accurate-ecn]
and as long as it conservatively reduces its initial window if the
SYN/ACK shows that the responder does not support AccECN. The
reduction in initial window need not be as great as that required in
response to a timeout, because the return of a SYN/ACK proves that
congestion is not severe. In controlled environments like data
centres, universal support for AccECN could be arranged.
Further experiments are needed to test how much malicious hosts can
use ECT to augment flooding attacks without triggering AQMs to turn
off ECN support (as mandated by RFC 3168 and RFC 7567). If it is
found that ECT can only slightly augment flooding attacks, the risk
of such attacks will need to be weighed against the performance
benefits of ECT SYNs.
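The AQM safeguard referred to above, under which ECN is turned off
during persistent overload, could be sketched roughly as follows.
This is not any particular AQM algorithm: the function, the single
signalling probability and the overload threshold are hypothetical
simplifications chosen only to show why ECT gives flooding packets no
advantage once the threshold is crossed.

```python
def aqm_decision(is_ect, signal_prob, draw, overload_prob=0.25):
    """Return 'forward', 'mark' or 'drop' for one packet.

    signal_prob:   current probability of signalling congestion (0..1).
    draw:          uniform random number in [0, 1) from the caller.
    overload_prob: hypothetical level above which ECN is switched off.
    """
    if draw >= signal_prob:
        return "forward"          # no congestion signal for this packet
    if is_ect and signal_prob < overload_prob:
        return "mark"             # mild congestion: set CE instead of drop
    return "drop"                 # Not-ECT packet, or persistent overload
```

An attacker "flying just under the radar" would correspond to keeping
signal_prob just below overload_prob, which is the case the further
experiments would need to quantify.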
4.3. Pure ACKs
RFC3168 gives the following arguments for not allowing the ECT
marking of pure ACKs (ACKs not piggy-backed on data). In section 5.2
it reads:
To ensure the reliable delivery of the congestion indication of
the CE codepoint, an ECT codepoint MUST NOT be set in a packet
unless the loss of that packet in the network would be detected by
the end nodes and interpreted as an indication of congestion.
Transport protocols such as TCP do not necessarily detect all
packet drops, such as the drop of a "pure" ACK packet; for
example, TCP does not reduce the arrival rate of subsequent ACK
packets in response to an earlier dropped ACK packet. Any
proposal for extending ECN-Capability to such packets would have
to address issues such as the case of an ACK packet that was
marked with the CE codepoint but was later dropped in the network.
We believe that this aspect is still the subject of research, so
this document specifies that at this time, "pure" ACK packets MUST
NOT indicate ECN-Capability.
Later on, in section 6.1.4 it reads:
For the current generation of TCP congestion control algorithms,
pure acknowledgement packets (e.g., packets that do not contain
any accompanying data) MUST be sent with the not-ECT codepoint.
Current TCP receivers have no mechanisms for reducing traffic on
the ACK-path in response to congestion notification. Mechanisms
for responding to congestion on the ACK-path are areas for current
and future research. (One simple possibility would be for the
sender to reduce its congestion window when it receives a pure ACK
packet with the CE codepoint set). For current TCP
implementations, a single dropped ACK generally has only a very
small effect on the TCP's sending rate.
We next address each of the arguments presented above.
The first argument is about the lack of reliability when congestion
notification is carried in pure ACKs. This is the specific instance,
for pure ACKs, of the reliability argument discussed in Section 4.1.
In some cases, the loss of a pure ACK is not detected by the
endpoints, so any congestion notification it carried would be
inadvertently lost. As we argued before, the bar for deciding whether
a packet can be marked with the ECT codepoint (i.e. whether it is
suitable for carrying congestion notification) is that the congestion
signal should be communicated at least as reliably as by dropping the
packet. After all, the alternative to setting the CE codepoint on
the packet is dropping the packet. So, the question is whether
carrying congestion information in a pure ACK conveys that
information as reliably as dropping the pure ACK would, and the
answer is clearly yes. If a pure ACK that was sent as ECT and then
marked CE is later dropped by the network, the outcome is essentially
a fall back to the use of drop as the congestion signal.
The second argument given in RFC 3168 is that the sender of pure ACKs
lacks the means to reduce the load that is creating the congestion.
Again, marking pure ACKs with the ECT codepoint to allow them to
carry congestion marks would be no worse than not doing so (and not
doing so would be detrimental from a performance perspective). The
TCP receiver does not ACK pure ACKs, so the sender of the pure ACK
will receive no echo of any congestion notification. However, this
is no worse than if a pure ACK is dropped, which cannot even be
detected by the remote end.
The proposed AccECN modification to TCP feedback
[I-D.ietf-tcpm-accurate-ecn] involves a data receiver repeatedly
sending a count of received congestion marks. So AccECN could
include marks on pure ACKs in this count, even though it does not ACK
pure ACKs themselves. Nonetheless, if the original sender of the
pure ACK does not respond to this feedback, or if it is decided that
AccECN will not provide this information, it will still make sense to
set ECT on pure ACKs, because the congestion situation will be no
worse than it is today with non-ECT pure ACKs.
So, overall, we believe that in terms of conveying and reacting to
congestion, allowing ECT (and CE) to be set on pure ACKs is no worse
than not doing so (and dropping the pure ACK). And not setting ECT
on pure ACKs is certainly detrimental to performance because when a
pure ACK is lost it can prevent the release of new data.
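The idea that a count of congestion marks can cover pure ACKs even
though pure ACKs are not themselves acknowledged can be illustrated
with the following sketch. It is not the AccECN wire format or
algorithm; the class and its unbounded counter are hypothetical and
only show that the count does not depend on whether a packet carries
data.

```python
class CeCounter:
    """Counts CE-marked packets seen by one endpoint of a connection."""

    def __init__(self):
        self.ce_packets = 0

    def on_packet(self, ecn_field, payload_len):
        """Update and return the running count of CE-marked packets.

        payload_len is deliberately ignored: a pure ACK (payload_len
        of 0) is counted in exactly the same way as a data packet.
        """
        if ecn_field == "CE":
            self.ce_packets += 1
        return self.ce_packets
```

The running count would then be repeated in whatever packets the
endpoint sends next, so the loss of any one of them does not lose the
information.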
4.4. Retransmitted packets
RFC 3168 does not allow setting the ECT codepoint on retransmitted
packets. The arguments presented in the specification for this
design choice are the following (the text is quite long, and we are
not sure whether we should keep it all):
This document specifies ECN-capable TCP implementations MUST NOT
set either ECT codepoint (ECT(0) or ECT(1)) in the IP header for
retransmitted data packets, and that the TCP data receiver SHOULD
ignore the ECN field on arriving data packets that are outside of
the receiver's current window. This is for greater security
against denial-of-service attacks, as well as for robustness of
the ECN congestion indication with packets that are dropped later
in the network.
First, we note that if the TCP sender were to set an ECT codepoint
on a retransmitted packet, then if an unnecessarily-retransmitted
packet was later dropped in the network, the end nodes would never
receive the indication of congestion from the router setting the
CE codepoint. Thus, setting an ECT codepoint on retransmitted
data packets is not consistent with the robust delivery of the
congestion indication even for packets that are later dropped in
the network.
In addition, an attacker capable of spoofing the IP source address
of the TCP sender could send data packets with arbitrary sequence
numbers, with the CE codepoint set in the IP header. On receiving
this spoofed data packet, the TCP data receiver would determine
that the data does not lie in the current receive window, and
return a duplicate acknowledgement. We define an out-of-window
packet at the TCP data receiver as a data packet that lies outside
the receiver's current window. On receiving an out-of-window
packet, the TCP data receiver has to decide whether or not to
treat the CE codepoint in the packet header as a valid indication
of congestion, and therefore whether to return ECN-Echo
indications to the TCP data sender. If the TCP data receiver
ignored the CE codepoint in an out-of-window packet, then the TCP
data sender would not receive this possibly-legitimate indication
of congestion from the network, resulting in a violation of end-
to-end congestion control. On the other hand, if the TCP data
receiver honors the CE indication in the out-of-window packet, and
reports the indication of congestion to the TCP data sender, then
the malicious node that created the spoofed, out-of-window packet
has successfully "attacked" the TCP connection by forcing the data
sender to unnecessarily reduce (halve) its congestion window. To
prevent such a denial-of-service attack, we specify that a
legitimate TCP data sender MUST NOT set an ECT codepoint on
retransmitted data packets, and that the TCP data receiver SHOULD
ignore the CE codepoint on out-of-window packets.
One drawback of not setting ECT(0) or ECT(1) on retransmitted
packets is that it denies ECN protection for retransmitted
packets. However, for an ECN-capable TCP connection in a fully-
ECN-capable environment with mild congestion, packets should
rarely be dropped due to congestion in the first place, and so
instances of retransmitted packets should rarely arise. If
packets are being retransmitted, then there are already packet
losses (from corruption or from congestion) that ECN has been
unable to prevent.
We note that if the router sets the CE codepoint for an ECN-
capable data packet within a TCP connection, then the TCP
connection is guaranteed to receive that indication of congestion,
or to receive some other indication of congestion within the same
window of data, even if this packet is dropped or reordered in the
network. We consider two cases, when the packet is later
retransmitted, and when the packet is not later retransmitted.
In the first case, if the packet is either dropped or delayed, and
at some point retransmitted by the data sender, then the
retransmission is a result of a Fast Retransmit or a Retransmit
Timeout for either that packet or for some prior packet in the
same window of data. In this case, because the data sender
already has retransmitted this packet, we know that the data
sender has already responded to an indication of congestion for
some packet within the same window of data as the original packet.
Thus, even if the first transmission of the packet is dropped in
the network, or is delayed, if it had the CE codepoint set, and is
later ignored by the data receiver as an out-of-window packet,
this is not a problem, because the sender has already responded to
an indication of congestion for that window of data.
In the second case, if the packet is never retransmitted by the
data sender, then this data packet is the only copy of this data
received by the data receiver, and therefore arrives at the data
receiver as an in-window packet, regardless of how much the packet
might be delayed or reordered. In this case, if the CE codepoint
is set on the packet within the network, this will be treated by
the data receiver as a valid indication of congestion.
There are essentially three arguments for not ECT marking
retransmitted packets, namely, reliability, DoS attacks and over-
reaction to congestion. We address all of them next in order.
Regarding reliability, as described in Section 4.1, we believe that
the bar should be that the congestion signal is delivered at least as
reliably as if it were a packet drop. If a retransmitted packet is
dropped and this goes unnoticed by the receiver, then the congestion
signal expressed as a drop is lost. The same applies, and nothing
worse, if the very same retransmitted packet is sent as ECT, marked
CE and later dropped.
Regarding the possibility of DoS attacks, the protection against the
DoS attack does not result from forbidding ECT marking of
retransmitted packets. If an attacker decided to launch such an
attack, it would craft the packet with the ECT codepoint set
regardless. Effectively, the protection against the described DoS
attack comes from the requirement that the receiver should ignore the
CE codepoint on out-of-window packets. We propose to allow ECT
marking of retransmitted packets, in order to reduce the chance of
them being dropped, but to keep the requirement to ignore the CE
codepoint on out-of-window packets.
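The receiver-side rule that is kept, ignoring CE on out-of-window
packets, could be sketched as below. The window test follows the
usual TCP segment acceptability check with 32-bit sequence
arithmetic; the function names and the string representation of the
ECN field are assumptions made for this sketch.

```python
SEQ_MOD = 1 << 32   # TCP sequence numbers wrap at 2**32


def in_window(seq, seg_len, rcv_nxt, rcv_wnd):
    """True if the segment overlaps [rcv_nxt, rcv_nxt + rcv_wnd)."""
    first = (seq - rcv_nxt) % SEQ_MOD
    if seg_len == 0:
        # Zero-length segment, e.g. a pure ACK.
        return first == 0 if rcv_wnd == 0 else first < rcv_wnd
    if rcv_wnd == 0:
        return False              # data is never acceptable in a zero window
    last = (seq + seg_len - 1 - rcv_nxt) % SEQ_MOD
    return first < rcv_wnd or last < rcv_wnd


def accept_ce(ecn_field, seq, seg_len, rcv_nxt, rcv_wnd):
    """True if a CE mark on this segment should be fed back."""
    return ecn_field == "CE" and in_window(seq, seg_len, rcv_nxt, rcv_wnd)
```

A spoofed segment with an arbitrary sequence number is very likely to
fail in_window(), so its CE mark is ignored whether or not legitimate
senders set ECT on retransmissions.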
Finally, the third argument is about over-reacting to congestion.
The argument goes that, if a retransmitted packet is dropped, the
sender will not detect it, so it will not react again to congestion
(it will already have reduced its congestion window when it
retransmitted the packet). In contrast, if retransmitted packets can
be CE-marked instead of dropped, senders could potentially react more
than once to congestion.
However, we argue that it is legitimate to respond again to
congestion if it still persists in subsequent round trip(s). So it
is not incorrect to set ECT on retransmissions.
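The point that a second response is legitimate only when congestion
persists into a later round trip can be illustrated with the sketch
below. It approximates "once per window of data" by elapsed time; the
class, the halving response and the floor of one segment are
simplifying assumptions and not a description of any particular TCP
implementation.

```python
class Sender:
    """Reduces cwnd at most once per round trip time."""

    def __init__(self, cwnd, rtt):
        self.cwnd = cwnd              # congestion window, in segments
        self.rtt = rtt                # round trip time, in seconds
        self._last_reduction = None   # time of the previous reduction

    def on_congestion_signal(self, now):
        """Handle a loss or CE echo at time now; return the new cwnd."""
        recent = (self._last_reduction is not None
                  and now - self._last_reduction < self.rtt)
        if recent:
            return self.cwnd          # same round trip: already responded
        self.cwnd = max(1, self.cwnd // 2)
        self._last_reduction = now
        return self.cwnd
```

A CE mark on a retransmission that arrives within the same round trip
as the original loss response is therefore absorbed, while one that
arrives a round trip later triggers a further reduction.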
4.5. Window probe packets
RFC 3168 presents only the reliability argument against setting the
ECT codepoint on window probe packets. Specifically, it states:
When the TCP data receiver advertises a zero window, the TCP data
sender sends window probes to determine if the receiver's window
has increased. Window probe packets do not contain any user data
except for the sequence number, which is a byte. If a window
probe packet is dropped in the network, this loss is not detected
by the receiver. Therefore, the TCP data sender MUST NOT set
either an ECT codepoint or the CWR bit on window probe packets.
However, because window probes use exact sequence numbers, they
cannot be easily spoofed in denial-of-service attacks. Therefore,
if a window probe arrives with the CE codepoint set, then the
receiver SHOULD respond to the ECN indications.
The reliability argument has been addressed in Section 4.1. Dropping
a window probe while the conditions for the Silly Window Syndrome
hold basically implies that the sender will be stalled until the next
window probe reaches the receiver, which again results in a
performance penalty.
On the bright side, RFC 3168 already says that receivers SHOULD
respond to ECN indications on these packets, so changing the
behaviour should be less painful than for other packet types.
5. Security considerations
There are several security arguments presented in RFC 3168 for
preventing the ECN marking of TCP control packets and retransmitted
segments. We believe all of them have been properly addressed in
Section 4.
6. IANA Considerations
There are no IANA considerations in this memo.
7. Acknowledgments
TBD
8. Informative References
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001,
<http://www.rfc-editor.org/info/rfc3168>.
[RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K.
Ramakrishnan, "Adding Explicit Congestion Notification
(ECN) Capability to TCP's SYN/ACK Packets", RFC 5562,
DOI 10.17487/RFC5562, June 2009,
<http://www.rfc-editor.org/info/rfc5562>.
[RFC7567] Baker, F., Ed. and G. Fairhurst, Ed., "IETF
Recommendations Regarding Active Queue Management",
BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015,
<http://www.rfc-editor.org/info/rfc7567>.
[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
Congestion Notification (ECN) Signaling with Nonces",
RFC 3540, DOI 10.17487/RFC3540, June 2003,
<http://www.rfc-editor.org/info/rfc3540>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
[RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
Communication Layers", STD 3, RFC 1122,
DOI 10.17487/RFC1122, October 1989,
<http://www.rfc-editor.org/info/rfc1122>.
[I-D.briscoe-tsvwg-aqm-tcpm-rmcat-l4s-problem]
Briscoe, B., Schepper, K., and M. Bagnulo, "Low Latency,
Low Loss, Scalable Throughput (L4S) Internet Service:
Problem Statement", draft-briscoe-tsvwg-aqm-tcpm-rmcat-
l4s-problem-02 (work in progress), July 2016.
[I-D.ietf-tcpm-accurate-ecn]
Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More
Accurate ECN Feedback in TCP", draft-ietf-tcpm-accurate-
ecn-01 (work in progress), June 2016.
[judd-nsdi]
Judd, G., "Attaining the promise and avoiding the pitfalls
of TCP in the Datacenter", NSDI 2015, 2015.
[ecn-pam] Trammell, B., Kuehlewind, M., Boppart, D., Learmonth, I.,
Fairhurst, G., and R. Scheffenegger, "Enabling Internet-Wide
Deployment of Explicit Congestion Notification", PAM 2015, 2015.
Authors' Addresses
Marcelo Bagnulo
Universidad Carlos III de Madrid
Av. Universidad 30
Leganes, Madrid 28911
SPAIN
Phone: 34 91 6249500
Email: marcelo@it.uc3m.es
URI: http://www.it.uc3m.es
Bob Briscoe
Simula Research Lab
Email: ietf@bobbriscoe.net
URI: http://bobbriscoe.net/