Last Call Review of draft-ietf-detnet-architecture-08

Request Review of draft-ietf-detnet-architecture
Requested rev. no specific revision (document currently at 13)
Type Last Call Review
Team Transport Area Review Team (tsvart)
Deadline 2018-10-03
Requested 2018-09-19
Authors Norman Finn, Pascal Thubert, Balazs Varga, János Farkas
Draft last updated 2018-09-28
Completed reviews Rtgdir Last Call review of -08 by Henning Rogge
Secdir Last Call review of -08 by Dan Harkins
Genart Last Call review of -08 by Joel Halpern
Tsvart Last Call review of -08 by Michael Scharf
Tsvart Telechat review of -11 by Michael Scharf
Genart Telechat review of -11 by Joel Halpern
Assignment Reviewer Michael Scharf
State Completed
Review review-ietf-detnet-architecture-08-tsvart-lc-scharf-2018-09-28
Reviewed rev. 08 (document currently at 13)
Review result Ready with Issues
Review completed: 2018-09-28


The document "Deterministic Networking Architecture" (draft-ietf-detnet-architecture-08) defines an overall framework for Deterministic Networking.

As TSV-ART reviewer, I believe that this document has issues as detailed below.


Major issues:

* It seems that DetNet cannot easily be deployed in the Internet without additional means. Thus, for a baseline document, one could expect some explanation of the requirements for deploying DetNet in a network. DetNet basically requires support in (almost) all network devices transporting DetNet traffic. That assumption should be explicitly spelt out early in the document, e.g., in the introduction. There also needs to be an explicit discussion of the implications if only part of the network is aware of or supports DetNet. There is some text in Section 4.2.2 and Section 4.3.3, but I believe additional explicit discussion is needed in a prominent place. For instance, can the use of DetNet do harm to parts of a network not supporting DetNet? As a side note, when TCPM published RFC 8257, the following disclaimer was added: "DCTCP, as described in this specification, is applicable to deployments in controlled environments like data centers, but it must not be deployed over the public Internet without additional measures." I wonder if a similar disclaimer is needed for DetNet. If there is an implicit assumption that DetNet will be used in homogeneous environments with mostly DetNet-aware devices within the same organization, such an assumption should be made explicit.

* It is surprising that there is hardly any discussion on network robustness and safety; this probably also relates to security. For instance, misconfiguration or errors of functions performing packet replication could severely and permanently congest a network and cause harm. How does the DetNet architecture ensure that a network stays fully operational, e.g., if the topology changes or there are equipment failures? Probably this can be solved by implementations (e.g., a dynamic control plane), but why are corresponding requirements not spelt out? Section 3.3.2 speculates that filters and policers can help, and that may be true, but that probably still assumes consistently and correctly configured (and well-behaving) devices. And Section 3.3.2 is vague and mentions an "infinite variety of possible failures" without stating any requirements or recommendations. There may be further solutions, such as circuit breakers and the like. Why are such topics not discussed?

* Somewhat related, the document only looks at the impact of failures on the QoS of DetNet traffic. What is missing is a discussion of how to protect non-DetNet parts of a network from any harm caused by DetNet mechanisms. Solutions to this probably exist. But why is the impact on non-DetNet traffic (e.g., in case of topology changes or failures of DetNet functions) not discussed at all in the document?

* Regarding security, an architecture like DetNet probably requires that only authenticated and authorized end systems have access to the data plane. The security considerations only briefly mention the control aspect ("the authentication and authorization of the controlling systems").

* For an architecture document, the lack of clarity and consistency regarding terminology is concerning. This specifically applies to the case of incomplete networks (as per Section 4.2.2 and 4.3.3) that include "DetNet-unaware nodes". The document introduces terms such as "DetNet intermediate nodes" but then repeatedly uses generic terms such as "node" or "hop" that may include DetNet-unaware nodes. For instance, for incomplete networks, a sentence such as "The primary means by which DetNet achieves its QoS assurances is to reduce, or even completely eliminate, congestion within a node as a cause of packet loss" seems to only apply to "DetNet transit nodes" but not "DetNet-unaware nodes". Similar ambiguity exists for other uses of the terms "hop" and "node", which may or may not include DetNet-unaware nodes. It is unclear why the document does not consistently use the terminology introduced in Section 2.1 in all sections and clearly distinguish cases with and without DetNet support.

* Section 4.4 refers to RFC 7426, which is an informational RFC on the IRTF stream, and the document uses the concepts introduced there (e.g., "planes"). This is very confusing. First, an IETF Proposed Standard should probably refer to documents having IETF consensus. An example would be RFC 7491, albeit there is other related work as well, e.g., in the TEAS WG. Second, Section 4.4 is by and large decoupled from the rest of the document and not specific to DetNet. Neither do other sections of the document refer to the concepts introduced in Section 4.4, nor does Section 4.4 use the DetNet terminology or discuss applicability to DetNet. Section 4.4 even mentions explicitly at the end that it discusses aspects that are orthogonal to the DetNet architecture. It is not at all clear why Section 4.4 is in this document. Section 4.4 could be removed from the document without impacting the rest of the document.

Minor issues:

* Terminology "DetNet transport layer"

  The term "transport layer" has a well-defined meaning in the IETF, e.g. originating from RFC 1122. While "transport" and e.g. "transport network" are used in the IETF for different technologies in different areas, I think the term "transport layer" is typically understood to refer to transport protocols such as TCP and UDP. As such, I personally find the term "DetNet transport layer" misleading and confusing. The confusion is easy to see e.g. in Figure 4, where UDP (which is a transport protocol as per RFC 1122) sits on top of "transport".

  Based on the document, it also may be solution/implementation specific whether the "DetNet transport layer" is actually a separate protocol layer compared to the "DetNet service layer". Thus it is not clear to me why the word "layer" has to be used, specifically in the combination "transport layer".

  To me, the term "transport layer" (and "transport protocol") should be used for protocols defined in the TSV area, consistent with RFC 1122. But this is probably a question to be sorted out by the IESG.

* Page 9

   A DetNet node may have other resources requiring allocation and/or

  This is just one of several examples of inconsistent use of terminology. What is a "DetNet node"? That term is not introduced in Section 2.1.

* Page 14

   A DetNet network supports the dedication of a high proportion (e.g.
   75%) of the network bandwidth to DetNet flows. 

  The 75% value is not reasoned. What prevents using 99% of the bandwidth for DetNet traffic?
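
  To make the question concrete, here is a minimal sketch of a per-link admission check in which the DetNet share is just a policy parameter. The `Link` class, the `detnet_share_cap` field, and all numbers are hypothetical and are not taken from the draft; the sketch only illustrates that nothing in such a check distinguishes 75% from 99%.

```python
from dataclasses import dataclass


@dataclass
class Link:
    capacity_bps: float
    detnet_share_cap: float  # assumed policy knob, e.g. 0.75 or 0.99
    reserved_bps: float = 0.0

    def admit(self, flow_bps: float) -> bool:
        """Reserve flow_bps unless that would exceed the DetNet cap."""
        limit = self.capacity_bps * self.detnet_share_cap
        if self.reserved_bps + flow_bps > limit:
            return False
        self.reserved_bps += flow_bps
        return True


# Same 1 Gb/s link, two different (made-up) caps.
link_75 = Link(capacity_bps=1e9, detnet_share_cap=0.75)
link_99 = Link(capacity_bps=1e9, detnet_share_cap=0.99)
for link in (link_75, link_99):
    first = link.admit(700e6)
    second = link.admit(100e6)
    print(link.detnet_share_cap, first, second)
```

  With the 0.75 cap the second reservation is rejected, with 0.99 it is accepted. Any rationale for a particular value would have to come from the architecture, e.g., from what non-DetNet traffic needs.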

* Page 15: Figure 2

  If the term "transport layer" cannot be avoided, the labels in this figure should at least be expanded to "DetNet transport layer".

* Page 18: Figure 4

  As already mentioned earlier, Figure 4 is confusing. UDP is a transport protocol. If the term "transport" cannot be avoided, the labels in this figure should at least be expanded to "DetNet transport".

* Page 23

   If the source transmits less data than this limit
   allows, the unused resource such as link bandwidth can be made
   available by the system to non-DetNet packets.

  Could there be additional requirements on the use of unused resources by non-DetNet packets, e.g., regarding preemption? I am just wondering... If that was possible, a statement like "... can be made available by the system to non-DetNet packets as long as all guarantees are fulfilled" would be on the safe side, no?

* Page 27:

   DetNet achieves congestion protection and bounded delivery latency by
   reserving bandwidth and buffer resources at every hop along the path
   of the DetNet flow.

  Why does this sentence use the word "hop"? As far as I understand, in DetNet bandwidth and buffer resources are reserved in each DetNet intermediate node. If there were hops over IP routers not being DetNet intermediate nodes, no resources would be reserved there. As per Section 4.3.3, it is possible to deploy DetNet this way. And obviously there can be resource bottlenecks below IP, on devices that are not routers... So does "hop" here refer to IP router hops or also to devices not processing IP (or IP/MPLS)?
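
  The ambiguity can be illustrated with a small sketch, assuming a path that mixes DetNet-aware nodes with a DetNet-unaware IP router. The `Node` class, the node names, and the `reserve_along_path` helper are hypothetical and only serve to show that "every hop" and "every DetNet-aware node" are different sets.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    detnet_aware: bool
    reserved_bps: float = 0.0


def reserve_along_path(path: list[Node], flow_bps: float) -> list[str]:
    """Reserve on DetNet-aware nodes; return the hops left without a reservation."""
    unprotected = []
    for node in path:
        if node.detnet_aware:
            node.reserved_bps += flow_bps
        else:
            # A DetNet-unaware hop cannot reserve anything for the flow.
            unprotected.append(node.name)
    return unprotected


path = [
    Node("detnet-node-1", True),
    Node("plain-ip-router", False),
    Node("detnet-node-2", True),
]
gaps = reserve_along_path(path, flow_bps=10e6)
print(gaps)
```

  In this sketch the flow crosses three hops but holds reservations on only two of them, which is the case the quoted sentence does not seem to cover.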

* Page 27:

   Standard queuing and transmission selection algorithms allow a
   central controller to compute the latency contribution of each
   transit node to the end-to-end latency, ...

  The text does not explain why a _central_ controller is needed for this computation. Why would a distributed control plane not be able to realize this computation? Isn't this implementation-specific?
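
  As a rough sketch of why this looks implementation-specific: if the end-to-end latency is the sum of per-node contributions, the same value can be obtained either by one entity with a global view or by accumulating the value node by node along the path. The function names and the nanosecond values below are made up for illustration, and the additive model is an assumption.

```python
def central_latency(contributions_ns: list[int]) -> int:
    """A controller with a global view sums all per-node contributions."""
    return sum(contributions_ns)


def distributed_latency(contributions_ns: list[int]) -> int:
    """Each node adds its own contribution to a value passed along the path."""
    accumulated = 0
    for own_contribution in contributions_ns:
        # This step needs only local knowledge at the current node.
        accumulated += own_contribution
    return accumulated


per_node_ns = [1200, 800, 1500]  # made-up per-node latency contributions
print(central_latency(per_node_ns), distributed_latency(per_node_ns))
```

  Both variants produce the same bound under this model, so the quoted sentence could arguably just say "allow the computation of".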

* Page 32

  To somebody who is not deeply familiar with DetNet, it is impossible to parse the description of the examples in Section 4.7.3. For instance, "VID + multicast MAC address" is not introduced. I think this example must be expanded with additional context and explanation to be useful to readers.

* Page 34

   There are three classes of information that a central controller or
   distributed control plane needs to know that can only be obtained
   from the end systems and/or nodes in the network.

  Wouldn't it be sufficient to state "Provisioning of DetNet requires knowledge about ..."? Does it matter in this context whether the provisioning is done by a central controller or a distributed control plane? For instance, could the same paragraph also apply to a network that uses _multiple_ central controllers, or hybrid combinations of central controllers and distributed control planes? In general, an architecture document should be agnostic to implementation aspects unless there is a specific need. In this specific case, I fail to see a need to discuss the realization of the control plane of a network.

Editorial nits:

* Page 9:

   The low-level mechanisms described in Section 4.5 provide the
   necessary regulation of transmissions by an end system or
   intermediate node to provide congestion protection.  The allocation
   of the bandwidth and buffers for a DetNet flow requires provisioning
   A DetNet node may have other resources requiring allocation and/or
   scheduling, that might otherwise be over-subscribed and trigger the
   rejection of a reservation.

  Probably a full stop is missing after "provisioning".

* Page 11: "... along separate (disjoint non-SRLG) paths ..."

  I find this confusing. I would understand e.g. "along separate (SRLG-disjoint) paths".

* Page 34:

   When using a peer-
   to-peer control plane, some of this information may be required by a
   system's neighbors in the network.

  Would "acquired" be a better term?

* Page 34:

   o  The identity of the system's neighbors, and the characteristics of
      the link(s) between the systems, including the length (in
      nanoseconds) of the link(s).

  "Latency" or "delay" would probably be better terms if the value is measured in nanoseconds.

* Page 35:

   DetNet is provides a Quality of Service (QoS), and as such, does not
   directly raise any new privacy considerations.

  Broken sentence ("DetNet is provides").

* Please expand acronyms on first use (e.g., OTN).