Last Call Review of draft-ietf-tls-rfc4347-bis-
review-ietf-tls-rfc4347-bis-secdir-lc-kaufman-2010-12-16-00

Request Review of draft-ietf-tls-rfc4347-bis
Requested rev. no specific revision (document currently at 06)
Type Last Call Review
Team Security Area Directorate (secdir)
Deadline 2010-12-17
Requested 2010-11-30
Review State Completed
Reviewer Charlie Kaufman
Review review-ietf-tls-rfc4347-bis-secdir-lc-kaufman-2010-12-16
Posted at http://www.ietf.org/mail-archive/web/secdir/current/msg02258.html
Last updated 2010-12-16

Review

I have reviewed this document as part of the security directorate's ongoing effort to review all IETF documents being processed by the IESG.  These comments were written primarily for the benefit of the security area directors.  Document editors and WG chairs should treat these comments just like any other last call comments.

This spec is a refresh of rfc4347, which specified DTLS v1.0 as a set of deltas from TLS v1.1. This spec defines DTLS v1.2 as a set of deltas from TLS v1.2. The deltas are mostly the same, so this spec is nearly identical to rfc4347 except that it adds some clarifications, updates the references, and changes the version number. It would be nice to have a structure where, if and when TLS v1.3 appears, there would not be a need for a DTLS v1.3 spec. Unfortunately, since there might at that time be a need for some DTLS-specific changes, there appears to be no way to do such a spec in advance.

I've never looked at DTLS in detail before, so this is a review with fresh eyes. (That means please forgive me if I raise issues that were long debated and finally closed on the mailing list.) I found what appears to be a minor flaw in the protocol (where it hangs if the wrong packet is lost), and some suspicious things in the spec.

The spec doesn't specify the changes between DTLS v1.0 and DTLS v1.2 or their implications for interoperability. This would be a section that was not needed in rfc4347. I assume the transition is smooth, picking up the version number negotiation from TLS v1.2, but it would be worth mentioning whether there are any known issues.

Section 3.2.2 says that DTLS queues up out-of-order packets for future processing. The protocol is designed so that it can alternatively drop out-of-order packets (since they will be retransmitted). It's a space/bandwidth trade-off (as noted in section 4.2.2).
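
The trade-off can be illustrated with a minimal Python sketch. It is not taken from the spec: the class name HandshakeReceiver, the buffer_future flag, and the max_buffered limit are hypothetical names chosen for illustration. Messages are identified only by their handshake sequence number.

```python
class HandshakeReceiver:
    """Delivers handshake messages in sequence order (illustrative only)."""

    def __init__(self, buffer_future=True, max_buffered=8):
        self.next_seq = 0                  # next sequence number we expect
        self.buffer_future = buffer_future  # hypothetical policy switch
        self.max_buffered = max_buffered    # hypothetical memory bound
        self.pending = {}                   # sequence number -> message body

    def receive(self, seq, body):
        """Return the list of messages that became deliverable."""
        if seq < self.next_seq:
            return []                       # duplicate of a processed message
        if seq > self.next_seq:
            if self.buffer_future and len(self.pending) < self.max_buffered:
                self.pending[seq] = body    # spend memory, save a retransmission
            return []                       # otherwise drop; the peer retransmits
        delivered = [body]
        self.next_seq += 1
        # Drain any queued messages that are now in order.
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered
```

With buffer_future=False the receiver keeps no per-message state but relies on the peer retransmitting every message that arrived early.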

The next-to-last paragraph of section 3.2.1 says that on a timeout, the client retransmits the unacknowledged handshake message and (if it was the response that was lost) the server will retransmit its response. It should be noted that the server's response must be bit-for-bit identical to the response it previously sent (since otherwise fragmentation could interleave parts of two responses). The protocol depends on the HelloVerifyRequest being short enough to fit in a single packet because it cannot reliably recover if that message is fragmented and a fragment is lost.

That retransmission strategy does not work on the last message of the protocol (the client's Finished) in the session-resuming exchange, since the client is not expecting a response. As specified, I believe the protocol is broken in the case where that packet is lost. The obvious way to fix the protocol would be to add a fourth message to the session-resuming handshake. An uglier fix that is less disruptive to the wire protocol would be for the server to interpret any properly encrypted data packet in a new epoch as evidence that the ChangeCipherSpec message was lost; the server does not need any information from the lost message. That works unless the encapsulated protocol expects the server to speak first.
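
A rough Python sketch of the second fix follows. It models only the server's view after it has sent its own Finished in a resumed session. The Record type, its authenticated flag (standing in for "decrypted and MAC-verified under the pending read keys"), and the ResumingServer class are all hypothetical; ChangeCipherSpec handling and verification of the Finished contents are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Record:
    content_type: str    # "handshake" or "application_data"
    epoch: int
    authenticated: bool  # assumption: verified under the pending read keys


class ResumingServer:
    """Server waiting for the client's final flight (illustrative only)."""

    def __init__(self, pending_epoch):
        self.pending_epoch = pending_epoch  # epoch the client will switch to
        self.established = False
        self.inferred = False               # True if completion was inferred

    def on_record(self, rec):
        if rec.epoch != self.pending_epoch or not rec.authenticated:
            return "discard"
        if self.established:
            if rec.content_type == "application_data":
                return "deliver"
            return "discard"
        if rec.content_type == "handshake":
            # Normal path: the client's Finished arrived.
            self.established = True
            return "handshake-complete"
        if rec.content_type == "application_data":
            # Proposed fallback: valid data in the new epoch implies that
            # the client's final flight was sent and lost.
            self.established = True
            self.inferred = True
            return "deliver"
        return "discard"
```

The fallback only triggers if the client sends data first, which is the limitation noted above for protocols where the server speaks first.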

That paragraph also says that servers maintain a retransmission timer and retransmit when that timer expires. It notes that retransmission does not apply to HelloVerifyRequest messages. Retransmission is not required or helpful for any of the messages, but it is also harmless.

In sections 4.1.2.1 and 4.1.2.7, it says that invalid packets should normally be silently discarded but can alternatively cause a fatal alert. I believe that it's worth noting that logging the discarded packets (or at least a count of them) is included in the definition of "silently discarding" and is often useful for diagnostic purposes.
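
As an illustration of "silent on the wire, visible to the operator", here is a minimal Python sketch. The names DiscardStats and handle_record, the reason strings, and the boolean validity flags are hypothetical; a real implementation would derive them from its own record checks.

```python
import logging
from collections import Counter

log = logging.getLogger("dtls.sketch")  # hypothetical logger name


class DiscardStats:
    """Counts discarded records per reason (illustrative only)."""

    def __init__(self):
        self.counts = Counter()

    def discard(self, reason, peer):
        self.counts[reason] += 1
        log.debug("discarded record from %s: %s", peer, reason)
        return None  # nothing is sent back to the peer


def handle_record(stats, peer, length_ok, mac_ok, payload):
    """Return the payload of a valid record, or None if it was discarded."""
    if not length_ok:
        return stats.discard("bad_length", peer)
    if not mac_ok:
        return stats.discard("bad_mac", peer)
    return payload
```

The peer observes no difference between this and a receiver that keeps no statistics, since no alert or other response is generated.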

Implementing sequence numbers correctly in the handshake protocol has some subtle requirements implied in the phrase "(at least notionally)" in the last paragraph of section 4.2.2. Section 4.2.1 says that there can be multiple round trips where a server keeps telling a client to use different cookies. The spec contains no upper bound on the number of exchanges there could be, but it also implies that each HelloVerifyRequest should have a new sequence number. Couple that with a stateless server, and the only way a correct implementation can work is for the server to accept *any* sequence number from a client for a ClientHello and use that as an initial sequence number for its responses. I don't know whether those semantics were intended. Either way, the text should probably explain what to do, or implementers are likely to do incompatible things.
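
The semantics described above can be sketched in Python as follows. This is an interpretation for illustration, not text from the spec: SECRET, make_cookie, and on_client_hello are hypothetical names, the cookie is an HMAC over the client address and parameters, and the "|" separator is an arbitrary choice. The point is that the server keeps no per-client state and simply mirrors whatever sequence number the ClientHello carried.

```python
import hashlib
import hmac

SECRET = b"example-server-secret"  # hypothetical; a real server would rotate it


def make_cookie(client_addr, client_params):
    """Stateless cookie: recomputable from the ClientHello alone."""
    msg = client_addr + b"|" + client_params
    return hmac.new(SECRET, msg, hashlib.sha256).digest()


def on_client_hello(client_addr, client_params, message_seq, cookie):
    """Return (reply_type, reply_seq, cookie_to_send).

    The reply reuses the ClientHello's sequence number, so the server
    needs no memory of earlier exchanges with this client.
    """
    expected = make_cookie(client_addr, client_params)
    if cookie and hmac.compare_digest(cookie, expected):
        return ("ServerHello", message_seq, None)
    return ("HelloVerifyRequest", message_seq, expected)
```

Under this reading, a client that has been through several cookie rounds arrives with a ClientHello numbered higher than 1, and the server still answers correctly.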

Just as the cookie exchange was added to DTLS because TLS got that benefit by running over TCP, there is another problem which this protocol does not appear to address very well. If a connection is broken uncleanly (e.g. an endpoint crashes) and then someone attempts to create a new connection between the same IP addresses and UDP ports (e.g. an endpoint reboots), there appears to be no way in this protocol to distinguish a plaintext ClientHello from a malformed encrypted packet on the old connection. Since the best practice for a server is to silently discard malformed encrypted packets, when a client reboots and tries to reconnect it is likely that the server will appear dead. It would have been helpful if in the record header the Type field distinguished an encrypted handshake message from an unencrypted handshake message in order to identify this case. In the case of a fixed pair of UDP ports communicating, it would still be tricky to recover (since this could be confused with a DoS attack), but at least the server could figure out what was probably going on. Then it could implement some strategy like killing the DTLS connection if it had not received any messages in the last X minutes. This problem doesn't come up for TLS because the initialization of a TCP connection includes a SYN message that does not appear in the middle of a connection and random ISNs that prevent accidental aliasing of connections.
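
One possible shape for the idle-timeout strategy mentioned above, as a Python sketch. Everything here is an assumption for illustration: the AssociationTable class, the looks_like_client_hello flag (a heuristic the server would have to supply, given that the record header does not make the distinction), and the use of caller-supplied timestamps in seconds.

```python
class AssociationTable:
    """Tracks when each peer last sent a valid record (illustrative only)."""

    def __init__(self, idle_limit_s):
        self.idle_limit_s = idle_limit_s  # the "X minutes", in seconds
        self.last_valid = {}              # (ip, port) -> time of last valid record

    def on_valid_record(self, peer, now):
        self.last_valid[peer] = now

    def on_undecryptable(self, peer, looks_like_client_hello, now):
        """Decide what to do with a record that fails decryption."""
        last = self.last_valid.get(peer)
        if last is None:
            # No existing association on this address and port pair.
            return "new-handshake" if looks_like_client_hello else "discard"
        if looks_like_client_hello and now - last >= self.idle_limit_s:
            # The old association has been quiet long enough; assume the
            # peer restarted and let it begin a new handshake.
            del self.last_valid[peer]
            return "new-handshake"
        return "discard"
```

A client that reboots within the idle window is still ignored until the window expires, so the choice of limit trades reconnection delay against exposure to spoofed ClientHello messages.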

Nits/Typos:

Section 4.1 next to last line: "In an other case" -> "In any other case"
Section 4.1.2.7 formatting glitch (spacing) from copying text from one document to another