Data Center TCP (DCTCP): TCP Congestion Control for Data Centers

Note: This ballot was opened for revision 07 and is now closed.

Spencer Dawkins Yes

Comment (2017-06-20 for -07)
I think Alvaro's nits in his ballot are worth a look, but I'm really glad to see this work moving forward, and wanted to thank the authors for a clear explanation of a TCP mechanism that I think I could implement myself.

Mirja K├╝hlewind Yes

Terry Manderson Yes

Comment (2017-06-20 for -07)
Thank you for a well constructed document, and (IMHO) a nice approach to the issue. I noticed a few nits  (some caught by Alvaro) and others that are either me reading late at night or typographical concerns (such as "If SYN , SYN-ACK" [comma placement] first line of section 3.5 - so please give it a thorough read through)

Alia Atlas No Objection

Deborah Brungard No Objection

Ben Campbell No Objection

Comment (2017-06-21 for -07)
Substantive Comments:

- General: The purpose of this draft is not clear to me. Is the point to document the Microsoft implementation just for people's information? Do you have hopes other people will implement this? As written, this seems like a case of an informational draft defining protocol. That's not necessarily a problem, but it's helpful to put a paragraph near the beginning to describe why this is being published and what expectations people have of the outcome. (If the answer is along the lines of "We'd like people to implement this so we can get more operational experience", then I will wonder why the status was not "experimental".)

-1, last paragraph: I assume this means that all participants need to live in the datacenter, right? That is, no flows where only one end lives in the datacenter? (I think you clarify that later, but it would be helpful to state it here.)

- 3.3: first paragraph: Why not MUST?
-- "The congestion estimator on the sender SHOULD process acceptable ACKs
   as follows:" Why not MUST?


- 1: Can you offer a citation for MapReduce?

- 2: The additional text assumes the usage of 2119 keywords here do not quite map to the 2119 definitions. 
-- "but even compliant implementations without the measures in sections 4-6 would still only be safe to deploy in controlled environments.":  That seems too important of a statement to be buried in the terminology section.

- 4.1: Citation for NewReno?

Benoit Claise No Objection

Comment (2017-06-22 for -07)
I have not seen any reply to Joe Clarke's OPS DIR review:

Hello, WG and authors.  I have reviewed rev -07 of the draft-ietf-tcpm-dctcp as
requested by the OPS-DIR.  This review focuses on improving operational aspects
as well as any nits found in the text.

This document is an informational draft that describes Data Center TCP (DCTCP),
a congestion control mechanism for TCP in Data Center environments.

Overall, I believe this document to be ready, with some nits and perhaps small
areas for improved clarity and readability.  First, I'd like to say that I
appreciate the fact that this has been implemented on a number of kernels, and
the authors included real-world implementation results and thoughts.  From an
operational perspective, that is very helpful.  I also appreciated the fact
that there are interoperability challenges, and those were called out in the
document.  My specific comments are below.

There are a lot of abbreviations, variables and other terminology used
throughout this document.  It might be helpful for the reader to have an
expanded terminology section at the top that one can refer to for all of these
things.  Some of the abbreviations are called out in the description of the
algorithm, but not all (e.g., DCTCP.Alpha, CWR, RTT, etc.).


Section 3.2:

You refer to DCTCP.Alpha before defining it.  While you refer to Section 3.3
here, the impact of an incorrect Alpha value is not fully appreciated in this
text.  Perhaps this could be changed to reflect the impact the incorrect Alpha
value would have?


Section 3.2:

My abbreviating DCTCP.CE as CE in your state machine diagram, it is a bit
confusing as to the difference between CE and DCTCP.CE.  The description of the
state machine above requires the CE codepoint to have a certain value in order
for DCTCP.CE to change.  Perhaps you can use D.CE as an abbreviation to be a
bit clearer here.


Section 3.3:

It is not clear if 'g' can be inclusive of 0 and 1.


Section 3.3:

You define DCTCP.WindowEnd as the threshold for beginning a new observation
window, but maybe to complement the state variable name, you should define it
as the following:

The TCP sequence number threshold when one observation window ends and other is
to begin; initialized to SND.UNA.


Section 3.3:

You state:

Thus, when no bytes sent experienced congestion, DCTCP.Alpha equals
zero, and cwnd is left unchanged

But if I use a value of 1/16 for g, with DCTCP.Alpha initialized to 1 as you
say, I get a value of DCTCP.Alpha == 15/16 when there is no congestion (i.e., M
== 0).


Section 3.5:

You have an extra space here before the comma:

If SYN , SYN-ACK and RST packets for DCTCP connections have ECT set

This should be:

If SYN, SYN-ACK and RST packets for DCTCP connections have ECT set


Section 3.5:

You do not define ECT before using it.


Section 4.1:

Can you provide a reference for NewReno?


Section 5:

Can you reference or define AQM and RED?

Alissa Cooper No Objection

Suresh Krishnan No Objection

Alexey Melnikov No Objection

Kathleen Moriarty No Objection

Eric Rescorla No Objection

Alvaro Retana No Objection

Comment (2017-06-20 for -07)
Several nits:

- The Abstract says that "This memo documents existing DCTCP implementations ([WINDOWS], [LINUX], [FREEBSD])..."  But in reality it doesn't, it just points to those references that presumably contain implementation information.  The [WINDOWS] reference is only used in the Abstract -- last I looked, there shouldn't be references there [rfc7322].

- "...and deployment experience ([MORGANSTANLEY])."  Again, this draft doesn't document deployment experience, just points at it.

- rfc7942 recommends that the Implementation Status section be removed.  If the intent is to keep it, then consider putting a note so that the RFC Editor doesn't remove it..

- The fact that this document describes the Microsoft Windows Server 2012 implementation should be made clear from the start (in the Introduction).  You could then also get rid of the extra text in Section 2.

- The reference to [RFC3168-ERRATA3639] seems strange to me...not because it is pointing to the report, but because it is Informative, when the reference to RFC3168 is Normative.  I would assume that because Errata3639 has been Verified, then it means it is now "part of" RFC3168, so I would think that there's no need to mention it separately...

Adam Roach No Objection

Comment (2017-06-21 for -07)
Given the nature of this mechanism, I would have expected some qualitative analysis of its performance under typical data center conditions, rather than the somewhat vague descriptions of it being an "improvement." If the cited literature contains such numbers, I would suggest (a) specifically citing where such data can be found; and (b) copying a very high-level summary into this document (e.g., something like: "Under typical data center load conditions, intra-center transfers of large (muti-gigabyte) files were improved by approximately 12% over Standard TCP using commodity switches in their default configuration. See [REFERENCE] for details.")

Please expand the following acronyms upon first use;
see for guidance.

 - L3 - Level 3
 - ECT - ECN-Capable Transport
 - DSCP - Differentiated Services Code Point
 - AQM - Active Queue Management
 - RED - Random Early Detection