Summary: Has a DISCUSS. Has enough positions to pass once DISCUSS positions are resolved.
1) My general concern is that, while I don't necessarily want to block the proposed format, I would like to understand further before publication why this approach was chosen. Similar to Ben's discuss, I don't understand why the format was chosen so differently. You could just use the format (plus a new length option) as defined for UDP and just never have any retransmission or reordering but be more flexible on the lower layer transport to use. However, if you actually prefer a new format (to save space), then that sounds like a new version to me, while the draft says: "CoAP is defined in [RFC7252] with a version number of 1. At this time, there is no known reason to support version numbers different from 1." However, in this case it could even have made sense to define a new format/version that could be used for both underlying protocols and either have a length option or a message type and IP option. Further I also don't understand why on the other hand the TCP CoAP framing is re-used for websockets because websockets already provides message framing and a length field. Also in line with Ben's discuss, the use of the Block option for CoAP/TCP is not very clear to me. The draft says: "a UDP-to-TCP gateway may simply not have the context to convert a message with a Block Option into the equivalent exchange without any use of a Block Option (it would need to convert the entire blockwise exchange from start to end into a single exchange)" However, given that the CoAP/TCP and CoAP/UDP formats are so different, it's anyway a more complex conversion than just sticking another transport underneath. The argument for HOL blocking due to e.g. upgrades is also not clear to me because you should probably better just use a different TCP connection for that as it really seems to be a different use case. For me this draft looks like you are defining basically a new protocol version and not just CoAP over TCP.
Again, I don't necessarily want to block this but I would like to understand why the proposed approach was chosen.
2) Comments from the tsv-art review need to be addressed as well (Thanks to Yoshi Nishida for the review!). Here is the review text for your convenience: "Summary: This document is well-written. It is almost ready to be published as a PS draft once the following points are addressed. 1: It is not clear how the protocol reacts the errors from transport layers (e.g. connection failure). The protocol will just inform apps of the events and the app will decide what to do or the protocol itself will do something? 2: There will be situations where the app layer is freezing while the transport layer is still working. Since transport layers cannot detect this type of failures, there should be some mechanisms for it somewhere in the protocol or in the app layer. The doc needs to address this point. For example, what will happen when a PONG message is not returned for a certain amount of time? 3: Since this draft defines new SZX value, I think the doc needs to update RFC7959. This point should be clarified more in the doc."
3) And in line with Yoshi's comment, I don't think this part in section 3.3 is well specified; especially I don't understand how these two things fit together: "To avoid unnecessary latency, a Connection Initiator MAY send additional messages without waiting to receive the Connection Acceptor's CSM; ..." and "Endpoints MUST treat a missing or invalid CSM as a connection error and abort the connection (see Section 5.6)." Also how long should I wait until I abort the connection?
I strongly agree with Adam's point about default port for coaps+tcp URI scheme. Also the following comments from IANA should be looked at: While we have an approval from the well-known URI expert, we're still waiting for a response from the expert for ALPN Protocol IDs. Also, Graham Klyne, who's traveling, sent the response below to our request for a URI scheme review. When we register this URI scheme, can/should we add the note Graham proposes to the registry, and call it an "IESG Note"? thanks, Amanda Baber Lead IANA Services Specialist == I am concerned that these scheme registrations (with multiple schemes for the same resource accessed using a different protocol) present an "antipattern" that was controversial when a similar proposal was raised about 18 months ago; e.g. see this earlier comment from Roy Fielding: https://mailarchive.ietf.org/arch/msg/uri-review/ZXTfNQ7PDxHBSccrqrrbGH5N-Ko , specifically this: [[ A URI scheme should define what it names and how that naming maps to the URI syntax. There is nothing wrong with using separate schemes for different transports if those transports are essential parts of the name (e.g., if something named Fred at TCP:80 is different from something named Fred at UDP:89898). [...] In short, I think you need to better document what each URI scheme means from the perspective of a server and then what the client is expected to do with such a URI. ]] I was hoping that the URI-review list would pick up on this and provide some further discussion. I've seen a couple of private messages that seem to express similar concerns. In summary, I would have responded on the URI review list as an individual if I had been in a position to do. But if this comes back to me as a registration request that has passed WG last call, then I see no grounds for refusal, even though I think the design is misguided (or at least not adequately explained in the registration templates). 
In this situation, I might feel inclined to request adding an "IESG Note" to the registration along the following lines (if this is deemed acceptable for a request that has passed an IETF last-call review): [[ The CoAP protocol registers different URI schemes for accessing CoAP Resources via different protocol. This runs counter to the principle that a URI identifies a resource, and that multiple URIs for identifying the same resources should be avoided. URIs should be used to hide rather than expose the purely technical mechanisms used when accessing a resource. ]]
Update2: I've cleared my DISCUSS position, since version 09 removed the use of the scheme to select the transport. I would have liked to see a bit more precision from Adam's suggested design in the section on transport selection, but I don't consider that discuss-worthy.
Update: I removed point 1 from my discuss. But I do still think it would be useful to add a paragraph about how CoAP reliability features are only hop-by-hop.
Substantive:
3.2: I agree with Adam that this length scheme seems very complex for the return.
3.3: Since the initiator can start sending messages before receiving a CSM from the responder, how long should the initiator wait for a CSM before bailing?
3.4: Can you offer any guidance about how often to send keep-alives? I note that these keepalives are not necessarily bi-directional. Aren't there some NAT/FW cases where bi-directional traffic is needed to keep bindings from timing out? This and other places explicitly mention that in-flight messages may be lost when the transport is closed or reset. This creates uncertainty about whether such messages have been processed or not. Is that really okay?
4: After the discussion resulting from Mark's Art-Art review, I expected to see more emphasis about WebSocket being intended for browser-based clients. There are a couple of in-passing mentions of browser-clients buried in the text; I would have expected something more up front.
4.2: Is it really worth making the framing code behave differently for WebSocket than for TCP?
5.3: Do I understand correctly that once an option is established, it cannot be removed unless replaced? (Short of tearing down the connection and starting over, anyway.)
7.2: The text mentions 443 as a default port, but really seems to make 5684 the default. If 443 is really a default, then this needs discussion about why and why it's okay to squat on HTTPS. The text about whether ALPN is required is confusing.
Why not just require ALPN and move on, rather than special casing it by port choice? (There seems to be some circular logic about requiring 5685 to support clients that don't do ALPN, then saying clients MUST do ALPN unless they are using port 5685.)
7.3: I agree with Adam's DISCUSS comment. And even if people decide that the well-known bit can be specified in CORE, I think it does future users of well-known URIs for ws a disservice to make them dig through this spec to find the update to 6455. It would be better to pull that into a separate draft. That's also a material addition post IETF last call, so we should consider repeating the LC.
10.2: Is the registration policy "analogous to" that of [RFC7252] S12.2, or "identical to" it? If the answer is not "identical", then the policy should be detailed here.
Editorial:
Figures 7 and 8: "Payload (if any)" - Can we assume that if one uses either extended length format, one has a payload?
3.3: Is the guidance about what errors to return if you don't implement a server any different here than for UDP?
4.3 and 4.4 seem to primarily repeat details that are the same for WS as for TCP, even though the introduction to the WS part says that it won't do that :-)
5.3: "One CSM MUST be sent by both endpoints...": s/both/each
7.6: The "updates" in this section are confusing. I understand this to mean that the procedures for TCP and WS are identical to those for UDP except for the mentioned steps. But the language of the form of "This step from [RFC7252] is updated to:" makes it sound like this intends to actually change the language in 7252 to this new language. If the latter, then that effectively removes UDP support from 7252 as updated. This could easily be fixed by changing that to something to the effect of "When using TCP, this step changes to ..."
Appendix A: Why is this an appendix? Updates to a standards track RFC seem to warrant a more prominent position in the draft.
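For readers weighing the "very complex for the return" remark on 3.2, the length scheme under discussion can be sketched roughly as follows. This is a minimal illustration, not text from the draft: the function names are made up, and the offsets 269 and 65805 for the 16-bit and 32-bit extended fields are an assumption based on my reading of the published specification (only the offset of 13 is mentioned in these ballots).

```python
from typing import Tuple

# (Len nibble, extended-length size in bytes, offset added to the extended value).
# Offsets 269 and 65805 are assumptions taken from the published framing,
# not from the ballot text; check them against the specification.
_EXTENDED = {13: (1, 13), 14: (2, 269), 15: (4, 65805)}


def encode_length(n: int) -> Tuple[int, bytes]:
    """Return (Len nibble, extended-length bytes) for a message length of n bytes."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n <= 12:
        return n, b""  # values 0..12 are carried directly in the nibble
    for nibble, (size, offset) in _EXTENDED.items():
        if n - offset < 256 ** size:
            return nibble, (n - offset).to_bytes(size, "big")
    raise ValueError("length too large for a 32-bit extended length")


def decode_length(nibble: int, ext: bytes = b"") -> int:
    """Inverse of encode_length."""
    if 0 <= nibble <= 12:
        if ext:
            raise ValueError("no extended length expected")
        return nibble
    if nibble not in _EXTENDED:
        raise ValueError("Len is a 4-bit field")
    size, offset = _EXTENDED[nibble]
    if len(ext) != size:
        raise ValueError("wrong extended-length size")
    return int.from_bytes(ext, "big") + offset


# The one-byte extended form covers 13..268 rather than 0..255.
print(encode_length(268))  # (13, b'\xff')
print(encode_length(269))  # (14, b'\x00\x00')
```

A plain fixed-width length prefix would replace both functions with a single `to_bytes`/`from_bytes` call, which is the trade-off the reviewers are pointing at.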
Watching all discussions
Agree with the concerns raised in the DISCUSSes, looking forward to their resolution.
I share a lot of the concerns raised in the DISCUSSes and I look forward to their resolution.
I have many of the same concerns as others, but see no need to hold a DISCUSS myself.
I agree with EKR's technical comment that MTI cipher suites need to be defined.
Document: draft-ietf-core-coap-tcp-tls-08.txt
TECHNICAL
You need to specify MTI cipher suites. I don't think that the ones you specified in 7925 are very useful for TLS. Is this really really what you want?
S 3.2.
Having the lengths offset by 13 bytes is, IMO, pretty silly. I realize it avoids duplication, but it also makes the packets hard to read for not much value. As a practical matter, it expands the 1-byte length for the range 256-268, for a savings of less than .5% even on those packets and on average far less.
S 4.1.
    The WebSocket client MUST include the subprotocol name "coap" in the list of protocols, which indicates support for the protocol defined in this document. Any later, incompatible versions of CoAP or CoAP over WebSockets will use a different subprotocol name.
This doesn't make much sense, because you are willing to have incompatible protocols for TCP, where you use CSM to distinguish them, and you do the same thing with ALPN.
S 5.5.
These release semantics seem quite problematic. In particular, when people want an orderly close, they typically want the other side to process all the outstanding requests and then return them, but this doesn't seem to do that (note that just because the responses need to be *delivered* in order doesn't mean they need to be generated in order). So, for instance, say I have the following sequence of events:
    C S DB GET /a -> Request A -> Release -> FIN <- Response A
It seems like the only difference between Abort and Release is that the sender is saying "don't expect that I processed any of your messages", but in at least a lot of scenarios (e.g., where the initiator is basically just a client), this doesn't actually tell the server much about sequence because the responses aren't ordered wrt Release AFAICT.
    Release message by closing the TCP/TLS connection. Messages may be in flight when the sender decides to send a Release message. The general expectation is that these will still be processed.
This is not really useful language.
    For CoAP over reliable transports, the recipient rejects such messages by sending an Abort message and otherwise ignoring the message. No specific option has been defined for the Abort message in this case, as the details are best left to a diagnostic payload.
I don't understand this text. Abort seems to mean "I'm done", but then how am I ignoring the message?
S 6.
I found this section pretty confusing. In 7959, when M=0 you need to stay *under* the block boundary but here you say:
    In descriptive usage, a BERT Option is interpreted in the same way as the equivalent Option with SZX == 6, except that the payload is also allowed to contain a multiple of 1024 bytes (non-final BERT block) or more than 1024 bytes (final BERT block).
And your examples pretty clearly show it being >> 1024. What's the reasoning here?
    In control usage, a BERT option is interpreted in the same way as the equivalent Option with SZX == 6, except that it also indicates the capability to process BERT blocks.
But:
    Block-wise Transfer Option. If a Max-Message-Size Option is indicated with a value that is greater than 1152 (in the same or a different CSM message), the Block-wise Transfer Option also indicates support for BERT (see Section 6). Subsequently, if the Max-Message-
Is this an instruction to set the BTO to be 7? Or redundancy?
EDITORIAL
S 3.2.
    Length (Len): 4-bit unsigned integer. A value between 0 and 12 directly indicates the length of the message in bytes starting
I think you want to say "0 and 12 inclusive"
S 5.3.1.
    These are not default values for the option, as defined in Section 5.4.4 in [RFC7252]. A default value would mean that an empty Capabilities and Settings message would result in the option being set to its default value.
This is pretty confusing text.
I take it that it means that if the base values of both A and B are 0, then:
    Start     // A=0, B=0
    CSM[A=1]  // A=1, B=0
    CSM[B=2]  // A=1, B=2
Whereas if these were default values, then this would be:
    Start     // A=0, B=0
    CSM[A=1]  // A=1, B=0
    CSM[B=2]  // A=0, B=2 <- A resets to default
If that's so, perhaps you could say:
    These are not default values for the option, as defined in Section 5.4.4 in [RFC7252], because default values apply on a per-message basis and thus reset when the value is not present in a given CSM.
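The distinction drawn in the A/B example can also be expressed as a small sketch. This is only an illustration of the two readings, assuming options are modelled as a dictionary; the option names A and B come from the example, while the function names are hypothetical.

```python
# Base values for the two illustrative options from the example.
BASE = {"A": 0, "B": 0}


def apply_base_value_semantics(csms):
    """Each CSM updates only the options it carries; the rest persist."""
    state = dict(BASE)
    for csm in csms:
        state.update(csm)
    return state


def apply_default_value_semantics(csms):
    """Each CSM is read on its own; options it omits fall back to the default."""
    state = dict(BASE)
    for csm in csms:
        state = {**BASE, **csm}
    return state


csms = [{"A": 1}, {"B": 2}]
print(apply_base_value_semantics(csms))     # {'A': 1, 'B': 2}
print(apply_default_value_semantics(csms))  # {'A': 0, 'B': 2}
```

Under the first reading an empty CSM changes nothing; under the second it would reset every option, which is the behaviour the quoted draft text says does not apply.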
I have removed my DISCUSS, but want to be clear that I remain quite distressed about the design aspects of this document. I have adjusted my comments below to trim them down to only those issues that remain in -09. Beyond the comments left over from -08, I am perplexed that no concrete mechanism for UDP/TCP failover is provided, nor is any discussion of the management aspects of configuring between them, nor is any discussion of which transport protocol(s) may be considered MTI. I also wish to highlight the somewhat buried request from my original comments that I believe this document would be vastly improved by splitting it into one document that deals with TCP, and another that deals with WebSockets. They are intended for radically different environments, and a large majority of implementors will care about one but not the other. Combining into a single document just creates more work for them. General — this is a very bespoke approach to what could have been mostly solved with a single four-byte “length” header; it is complicated on the wire, and in implementation; and the format variations among CoAP over UDP, TLS, and WebSockets are going to make gateways much harder to implement and less efficient (as they will necessarily have to disassemble messages and rebuild them to change between formats). The protocol itself mentions gateways in several places, but does not discuss how they are expected to map among the various flavors of CoAP defined in this document. Some of the changes seem unnecessary, but it could be that I’m missing the motivation for them. Ideally, the introduction would work harder at explaining why CoAP over these transports is as different from CoAP over UDP as it is, focusing in particular on why the complexity of having three syntactically incompatible headers is justified by the benefits provided by such variations. 
Additionally, it’s not clear from the introduction what the motivation for using the mechanisms in this document is as compared to the techniques described in section 10 (and its subsections) of RFC 7252. With the exception of subscribing to resource state (which could be added), it seems that such an approach is significantly easier to implement and more clearly defined than what is in this document; and it appears to provide the combined benefits of all four transports discussed in this document. My concern here is that an explosion of transport options makes it less likely that a client and server can find two in common: the limit of the probability of two implementations having a transport in common as the number of transports approaches infinity is zero. Due to this likely decrease in interoperability, I’d expect to see some pretty powerful motivation in here for defining a third, fourth, fifth, and sixth way to carry CoAP when only TCP is available (I count RFC 7252 http and https as the first and second ways in this accounting). Specific comments follow. Section 3.3, paragraph 3 says that an initiator may send messages prior to receiving the remote side’s CSM, even though the message may be larger than would be allowed by that CSM. What should the recipient of an oversized message do in this case? In fact, I don’t see in here what a recipient of a message larger than it allowed for in its CSM is supposed to do in response at *any* stage of the connection. Is it an error? If so, how do you indicate it? Or is the Max-Message-Size option just a suggestion for the other side? This definitely needs clarification. (Aside — it seems odd and somewhat backwards that TCP connections are provided an affordance for fine-grained control over message sizes, while UDP communications are not.) Section 5 and its subsections define a new set of message types, presumably for use only on connection-oriented protocols, although this is only implied, and never stated. 
For example, some implementors may see CSM, Ping, and Pong as potentially useful in UDP; and, finding no prohibition in this document against using them, decide to give it a go. Is that intended? If not, I strongly suggest an explicit prohibition against using these in UDP contexts.
Section 5.3.2 says that implementations supporting block-wise transfers SHOULD indicate the Block-wise Transfer Option. I can't figure out why this is anything other than a "MUST". It seems odd that this document would define a way to communicate this, and then choose to leave the communicated options as “YES” and “YOUR GUESS IS AS GOOD AS MINE” rather than the simpler and more useful “YES” and “NO”.
I find the described operation of the Custody Option in the operation of Ping and Pong to be somewhat problematic: it allows the Pong sender to unilaterally decide to set the Custody Option, and consequently quarantine the Pong for an arbitrary amount of time while it processes other operations. This seems impossible to distinguish from a failure-due-to-timeout from the perspective of the Ping sender. Why not limit this behavior only to Ping messages that include the Custody Option?
I am similarly perplexed by the hard-coded “must do ALPN *unless* the designated port takes the magical value 5684” behavior. I don’t think I’ve ever seen a protocol that has such variation based on a hard-coded port number, and it seems unlikely to be deployed correctly (I’m imagining the frustration of: “I changed both the server and the client configuration from the default port of 5684 to 49152, and it just stopped working. Like, literally the *only* way it works is on port 5684. I've checked firewall settings everywhere and don't see any special handling for that port -- I just can't figure this out, and it's driving me crazy.”). Given the nearly universal availability of ALPN in pretty much all modern TLS libraries, it seems much cleaner to just require ALPN support and call it done.
Or *don’t* require ALPN at all and call it done. But *changing* protocol behavior based on magic port numbers seems like it’s going to cause a lot of operational heartburn.
[I have removed my comments about section 8.1, as I believe EKR is managing the TLS-related issues for this document]
Although the document clearly expects the use of gateways and proxies between these connection-oriented usages of CoAP and UDP-based CoAP, Appendix A seems to omit discussion or consideration of how this gatewaying can be performed. The following list of problems is illustrative of this larger issue, but likely not exhaustive. (I'll note that all of these issues evaporate if you move to a simpler scheme that merely frames otherwise unmodified UDP CoAP messages)
Section A.1 does not indicate what gateways are supposed to do with out-of-order notifications. The TCP side requires these to be delivered in-order; so, does this mean that gateways observing a gap in sequence numbers need to quarantine the newly received message so that it can deliver the missing one first? Or does it deliver the newly-received message and then discard the “stale” one when it arrives? I don’t think that leaving this up to implementations is particularly advisable.
Section A.3 is a bit more worrisome. I understand the desired optimization here, but where you reduce traffic in one direction, you run the risk of exploding it in the other. For example, consider a coap+tcp client connecting to a gateway that communicates with a CoAP-over-UDP server. When that client wants to check the health of its observations, it can send a Ping and receive a Pong that confirms that they are all alive and well. In order to be able to send a Pong that *means* “all your observations are alive and well,” the gateway has to verify that all the observations are alive and well.
A simple implementation of a gateway will likely check on each observed resource individually when it gets a Ping, and then send a Pong after it hears back about all of them. So, as a client, I can set up, let’s say, two dozen observations through this gateway. Then, with each Ping I send, the gateway sends two dozen checks towards the server. This kind of message amplification attack is an awesome way to DoS both the gateway and the server. I believe the document needs a treatment of how UDP/TCP gateways handle notification health checks, along with techniques for mitigating this specific attack. Section A.4 talks about the rather different ways of dealing with unsubscribing from a resource. Presumably, gateways that get a reset to a notification are expected to synthesize a new GET to deregister on behalf of the client? Or is it okay if they just pass along the reset, and expect the server to know that it means the same thing as a deregistration? Without explicit guidance here, I expect server and gateway implementors to make different choices and end up with a lack of interop. From i-d nits (this appears to be in reference to Figure 1): ** There is 1 instance of too long lines in the document, the longest one being 3 characters in excess of 72.
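To make the Section A.3 amplification concern concrete, here is a rough sketch that counts upstream health checks for the two-dozen-observation scenario described above. It is purely illustrative: the `Gateway` class, the `freshness` window, and the idea of reusing recent confirmations are hypothetical, and neither the draft nor these comments prescribe this mitigation.

```python
class Gateway:
    """Toy model of a TCP-to-UDP gateway answering Pings about observations."""

    def __init__(self, observations, freshness=None):
        # Time of the last upstream confirmation per observation (None = never).
        self.last_checked = {obs: None for obs in observations}
        # Hypothetical reuse window in seconds; None models the naive gateway.
        self.freshness = freshness
        self.upstream_checks = 0

    def on_ping(self, now):
        for obs, last in self.last_checked.items():
            stale = (
                self.freshness is None
                or last is None
                or now - last >= self.freshness
            )
            if stale:
                # Stands in for one health check sent towards the UDP server.
                self.upstream_checks += 1
                self.last_checked[obs] = now
        return "Pong"


observations = ["/obs/%d" % i for i in range(24)]
naive = Gateway(observations)
cached = Gateway(observations, freshness=30)

for t in range(10):  # the client sends one Ping per second for ten seconds
    naive.on_ping(t)
    cached.on_ping(t)

print(naive.upstream_checks)   # 240
print(cached.upstream_checks)  # 24
```

In this toy model the naive gateway turns ten Pings into 240 upstream checks, while reusing confirmations for 30 seconds caps it at 24. Whether a Pong based on a cached confirmation still carries the meaning the client expects is exactly the kind of question the document would need to answer.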