Summary: Has a DISCUSS. Has enough positions to pass once DISCUSS positions are resolved.
Sorry for the late input. Based on the additional TSV review provided by Martin Stiemerling (thanks!), I am convinced that the TCP-related parts of this document need further discussion before publication (even though this is "only" an informational document). I agree with the TSV review that the solution approaches discussed in Sections 7.1 and 7.2 are somewhat speculative and should therefore probably not be published in an RFC without further discussion in the relevant other IETF groups. Per-packet/flowlet path switching (7.1) will have an impact on the TCP machinery and should be discussed further in a TSV working group before it is presented as a solution approach in an RFC. Performance-aware routing (7.2) is actually a hard problem: congestion state changes very dynamically, and an attempt to use this information on a different time scale than TCP does can lead to unwanted interference and interdependencies. We currently have a proposed research group (PANRG) for this sort of problem, and that group would probably be a better place to discuss these problems and proposed solutions than an RFC-to-be. The easiest way to address my concerns is probably to remove the TCP-related paragraph from Section 3, remove Sections 7.1 and 7.2 entirely, and continue those discussions in the TSV area/TCPM and PANRG instead.
I have a question regarding this part of Section 3:

"The absence of path visibility leaves transport protocols, such as TCP, with a "blackbox" view of the network. Some TCP metrics, such as SRTT, MSS, CWND and few others could be inferred and cached based on past history, but those apply to destinations, regardless of the path that has been chosen to get there. Thus, for instance, TCP is not capable of remembering "bad" paths, such as those that exhibited poor performance in the past. This means that every new connection will be established obliviously (memory-less) with regards to the paths chosen before, or chosen by other nodes."

Is that actually a well-known problem? This is not fully clear to me. All paths in a data center network usually have roughly the same characteristics (at least for the cached values such as SRTT and MSS), so caching TCP parameters per destination should not be a problem in symmetric topologies like Clos. Or do you have specific corner cases in mind?
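To make the quoted behavior concrete, here is a minimal Python sketch of a TCP metrics cache keyed only by destination. The names `TcpMetrics`, `record` and `lookup`, the path labels and the values are hypothetical and are not taken from the draft or from any TCP implementation. Two paths to the same destination share one entry, so nothing specific to a path can be remembered; in a symmetric Clos fabric, however, that single entry would still be representative of every path.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TcpMetrics:
    """Per-destination TCP metrics of the kind named in Section 3."""
    srtt_ms: float
    mss: int
    cwnd: int


# Hypothetical cache keyed only by destination address, as the quoted text
# describes; the path a connection used is not part of the key.
metrics_cache: Dict[str, TcpMetrics] = {}


def record(dst: str, path: str, metrics: TcpMetrics) -> None:
    # The path argument is deliberately ignored: a later connection over a
    # different path overwrites whatever was learned about an earlier one.
    metrics_cache[dst] = metrics


def lookup(dst: str, path: str) -> Optional[TcpMetrics]:
    # Every path to dst sees the same entry, so a "bad" path cannot be told apart.
    return metrics_cache.get(dst)


if __name__ == "__main__":
    # Hypothetical: path "A" performed poorly, then path "B" performed well.
    record("10.0.0.1", "A", TcpMetrics(srtt_ms=5.0, mss=1460, cwnd=10))
    record("10.0.0.1", "B", TcpMetrics(srtt_ms=0.2, mss=1460, cwnd=10))
    # A new connection that would use path "A" inherits path B's metrics.
    print(lookup("10.0.0.1", "A"))
```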
OPS-DIR review from Tina:

I found this document well written and READY for publication as an informational document. Some nits:

Section 4.2, eBGP Labeled Unicast (RFC8277): "Each node peers with its neighbors via a eBGP session" should be "Each node peers with its neighbors via an eBGP session".

Section 7, Addressing the open problems: "the same could be re-used in context of other domains as well" is missing a period at the end.

Are the centralized controller and the centralized agent the same component?

Even though the design in this document is specified for a single domain, it would be useful to develop an approach for inter-domain use that does not leak intra-domain topology and policy.

Has this feature been included in, or aligned with, the carrier-grade FIB in FD.io VPP (https://wiki.fd.io/view/VPP)?
I spent a long time trying to understand the following text from Section 2, where the sub-bullet appears to flatly contradict its parent bullet:

   o  Each node is its own AS (Node X has AS X). 4-byte AS numbers are recommended ([RFC6793]).

      *  For simple and efficient route propagation filtering, Node5, Node6, Node7 and Node8 use the same AS, Node3 and Node4 use the same AS, Node9 and Node10 use the same AS.

After a great deal of study of these and the following bullets, I convinced myself (perhaps incorrectly?) that the intention here is to say "We're going to talk about these nodes as if they each have their own AS, although in real deployments they'll probably be grouped together." Is that the intention? If so, it would be much easier to read if the sub-bullet made this clearer.
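Presumably the "route propagation filtering" in the sub-bullet relies on standard eBGP loop prevention (RFC 4271): a router discards a route whose AS_PATH already contains its own AS. The Python sketch below illustrates this under a hypothetical topology (a route originated by Node9, advertised through Node5 and then Node3, and finally offered to Node6) with hypothetical private 4-byte AS numbers; it is not the draft's actual configuration. With one AS per node, Node6 accepts the detour route; once Node5 to Node8 share an AS, Node6 rejects it.

```python
from typing import Dict, List

# Hypothetical private 4-byte AS numbers (RFC 6996 range), for illustration only.
PER_NODE_AS: Dict[str, int] = {f"Node{i}": 4200000000 + i for i in range(1, 11)}

# Same nodes, but grouped as in the sub-bullet.
SHARED_AS: Dict[str, int] = dict(PER_NODE_AS)
SHARED_AS.update({n: 4200000100 for n in ("Node5", "Node6", "Node7", "Node8")})
SHARED_AS.update({n: 4200000200 for n in ("Node3", "Node4")})
SHARED_AS.update({n: 4200000300 for n in ("Node9", "Node10")})


def advertise(as_map: Dict[str, int], hops: List[str]) -> List[int]:
    """Build the AS_PATH after the route passes through `hops` (originator
    first); each eBGP speaker prepends its own AS, so the newest AS is first."""
    as_path: List[int] = []
    for node in hops:
        as_path.insert(0, as_map[node])
    return as_path


def accepts(as_map: Dict[str, int], receiver: str, as_path: List[int]) -> bool:
    """eBGP loop prevention: reject a route whose AS_PATH contains our own AS."""
    return as_map[receiver] not in as_path


if __name__ == "__main__":
    hops = ["Node9", "Node5", "Node3"]  # hypothetical detour path toward Node6
    for name, as_map in (("per-node AS", PER_NODE_AS), ("shared AS", SHARED_AS)):
        path = advertise(as_map, hops)
        print(name, path, "accepted by Node6:", accepts(as_map, "Node6", path))
```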