Data Center TCP (DCTCP): TCP Congestion Control for Data Centers
RFC 8257
Yes
No Objection
Note: This ballot was opened for revision 07 and is now closed.
Alvaro Retana No Objection
Several nits: - The Abstract says that "This memo documents existing DCTCP implementations ([WINDOWS], [LINUX], [FREEBSD])..." But in reality it doesn't, it just points to those references that presumably contain implementation information. The [WINDOWS] reference is only used in the Abstract -- last I looked, there shouldn't be references there [rfc7322]. - "...and deployment experience ([MORGANSTANLEY])." Again, this draft doesn't document deployment experience, just points at it. - rfc7942 recommends that the Implementation Status section be removed. If the intent is to keep it, then consider putting a note so that the RFC Editor doesn't remove it.. - The fact that this document describes the Microsoft Windows Server 2012 implementation should be made clear from the start (in the Introduction). You could then also get rid of the extra text in Section 2. - The reference to [RFC3168-ERRATA3639] seems strange to me...not because it is pointing to the report, but because it is Informative, when the reference to RFC3168 is Normative. I would assume that because Errata3639 has been Verified, then it means it is now "part of" RFC3168, so I would think that there's no need to mention it separately...
(Mirja Kühlewind; former steering group member) Yes
(Spencer Dawkins; former steering group member) Yes
I think Alvaro's nits in his ballot are worth a look, but I'm really glad to see this work moving forward, and wanted to thank the authors for a clear explanation of a TCP mechanism that I think I could implement myself.
(Terry Manderson; former steering group member) Yes
Thank you for a well constructed document, and (IMHO) a nice approach to the issue. I noticed a few nits (some caught by Alvaro) and others that are either me reading late at night or typographical concerns (such as "If SYN , SYN-ACK" [comma placement] first line of section 3.5 - so please give it a thorough read through)
(Adam Roach; former steering group member) No Objection
Given the nature of this mechanism, I would have expected some qualitative analysis of its performance under typical data center conditions, rather than the somewhat vague descriptions of it being an "improvement." If the cited literature contains such numbers, I would suggest (a) specifically citing where such data can be found; and (b) copying a very high-level summary into this document (e.g., something like: "Under typical data center load conditions, intra-center transfers of large (muti-gigabyte) files were improved by approximately 12% over Standard TCP using commodity switches in their default configuration. See [REFERENCE] for details.") Please expand the following acronyms upon first use; see https://www.rfc-editor.org/materials/abbrev.expansion.txt for guidance. - L3 - Level 3 - ECT - ECN-Capable Transport - DSCP - Differentiated Services Code Point - AQM - Active Queue Management - RED - Random Early Detection
(Alexey Melnikov; former steering group member) No Objection
(Alia Atlas; former steering group member) No Objection
(Alissa Cooper; former steering group member) No Objection
(Ben Campbell; former steering group member) No Objection
Substantive Comments: - General: The purpose of this draft is not clear to me. Is the point to document the Microsoft implementation just for people's information? Do you have hopes other people will implement this? As written, this seems like a case of an informational draft defining protocol. That's not necessarily a problem, but it's helpful to put a paragraph near the beginning to describe why this is being published and what expectations people have of the outcome. (If the answer is along the lines of "We'd like people to implement this so we can get more operational experience", then I will wonder why the status was not "experimental".) -1, last paragraph: I assume this means that all participants need to live in the datacenter, right? That is, no flows where only one end lives in the datacenter? (I think you clarify that later, but it would be helpful to state it here.) - 3.3: first paragraph: Why not MUST? -- "The congestion estimator on the sender SHOULD process acceptable ACKs as follows:" Why not MUST? Nits: - 1: Can you offer a citation for MapReduce? - 2: The additional text assumes the usage of 2119 keywords here do not quite map to the 2119 definitions. -- "but even compliant implementations without the measures in sections 4-6 would still only be safe to deploy in controlled environments.": That seems too important of a statement to be buried in the terminology section. - 4.1: Citation for NewReno?
(Benoît Claise; former steering group member) No Objection
I have not seen any reply to Joe Clarke's OPS DIR review: Hello, WG and authors. I have reviewed rev -07 of the draft-ietf-tcpm-dctcp as requested by the OPS-DIR. This review focuses on improving operational aspects as well as any nits found in the text. This document is an informational draft that describes Data Center TCP (DCTCP), a congestion control mechanism for TCP in Data Center environments. Overall, I believe this document to be ready, with some nits and perhaps small areas for improved clarity and readability. First, I'd like to say that I appreciate the fact that this has been implemented on a number of kernels, and the authors included real-world implementation results and thoughts. From an operational perspective, that is very helpful. I also appreciated the fact that there are interoperability challenges, and those were called out in the document. My specific comments are below. There are a lot of abbreviations, variables and other terminology used throughout this document. It might be helpful for the reader to have an expanded terminology section at the top that one can refer to for all of these things. Some of the abbreviations are called out in the description of the algorithm, but not all (e.g., DCTCP.Alpha, CWR, RTT, etc.). === Section 3.2: You refer to DCTCP.Alpha before defining it. While you refer to Section 3.3 here, the impact of an incorrect Alpha value is not fully appreciated in this text. Perhaps this could be changed to reflect the impact the incorrect Alpha value would have? === Section 3.2: My abbreviating DCTCP.CE as CE in your state machine diagram, it is a bit confusing as to the difference between CE and DCTCP.CE. The description of the state machine above requires the CE codepoint to have a certain value in order for DCTCP.CE to change. Perhaps you can use D.CE as an abbreviation to be a bit clearer here. === Section 3.3: It is not clear if 'g' can be inclusive of 0 and 1. === Section 3.3: You define DCTCP.WindowEnd as the threshold for beginning a new observation window, but maybe to complement the state variable name, you should define it as the following: The TCP sequence number threshold when one observation window ends and other is to begin; initialized to SND.UNA. === Section 3.3: You state: Thus, when no bytes sent experienced congestion, DCTCP.Alpha equals zero, and cwnd is left unchanged But if I use a value of 1/16 for g, with DCTCP.Alpha initialized to 1 as you say, I get a value of DCTCP.Alpha == 15/16 when there is no congestion (i.e., M == 0). === Section 3.5: You have an extra space here before the comma: If SYN , SYN-ACK and RST packets for DCTCP connections have ECT set This should be: If SYN, SYN-ACK and RST packets for DCTCP connections have ECT set === Section 3.5: You do not define ECT before using it. === Section 4.1: Can you provide a reference for NewReno? === Section 5: Can you reference or define AQM and RED?
(Deborah Brungard; former steering group member) No Objection
(Eric Rescorla; former steering group member) No Objection
(Kathleen Moriarty; former steering group member) No Objection
(Suresh Krishnan; former steering group member) No Objection