Secret Key Transaction Authentication for DNS (TSIG)
Note: This ballot was opened for revision 07 and is now closed.
Warren Kumari Yes
(Deborah Brungard) No Objection
(Alissa Cooper) No Objection
Roman Danyliw No Objection
Comment (2020-03-10 for -07)
** Section 1.3. Per “In 2017, two nameservers strictly following that document (and the related [RFC4635]) were discovered to have security problems related to this feature”, consider providing a reference to the published vulnerabilities (i.e., CVE-2017-3142 and CVE-2017-3143)
** Section 6. Per “SHA-1 collisions have been demonstrated so the MD5 security considerations apply to SHA-1 in a similar manner. Although support for hmac-sha1 in TSIG is still mandatory for compatibility reasons, existing uses should be replaced with hmac-sha256 or other SHA-2 digest algorithms [FIPS180-4], [RFC3874], [RFC6234].”
-- It’s worth repeating those MD5 security considerations here
-- (from Magnus Nystrom’s SECDIR review, thanks Magnus!) it’s worth including references to the recent SHA-1 cryptanalysis provided in the SECDIR review
-- The SHA-2 family should be a normative SHOULD (or RECOMMENDED).
** Section 10. Per “For all of the message authentication code algorithms listed in this document, those producing longer values are believed to be stronger”, as noted in Magnus’s SECDIR review, this could be misconstrued to mean that the algorithm choice, not the digest length, provides the security. Recommend rephrasing (or making some statement
** Editorial
-- Section 4.3.2. Per “When verifying an incoming message, this is the message after the TSIG RR and been removed and the ARCOUNT field has been decremented.”, this sentence doesn’t parse (is missing a word).
-- Section 4.3.2. Per “A whole and complete DNS message in wire format.”, this isn’t a sentence.
Benjamin Kaduk No Objection
Comment (2020-03-18 for -07)
Thanks for putting together this update; it's good to see the security issue getting closed off in the updated spec, and progression to full Internet Standard! I do have several substantive comments (as well as some minor/nit-level ones), many of which are listed here at the top but a few of which are interspersed in the per-section comments. I considered making this a Discuss point, but it should be pretty uncontroversial and I trust that the right thing will happen even if I don't: there's a couple lingering remnants of SHA-1 being the preferred algorithm that need to be cleaned up, in Sections 5.2.2.1 and 10 (further detailed in the per-section comments). I also initially had made the following point a Discuss-level point, but decided to not do so since I don't remember any BCP-level guidance relating to cross-protocol attacks. Nevertheless, I strongly encourage the authors to consider that cryptographic best practice is to use any given key with exactly one cryptographic algorithm. The record format listed in Section 4.2 has the key name and algorithm as separately conveyed, which would allow for a given key to be used with all implemented algorithms. We should include some discussion that it's best to only use a single algorithm with any given key. We also have a 16-bit wide field for "Fudge", which (since it counts seconds) corresponds to a maximum window of something like +/- 18 hours; it's hard to believe that we really want to give people the rope to allow for that much time skew. (Yes, I understand that implementations will set something sane in practice but that doesn't necessarily mean that the protocol still has to allow it.) Our authoritative list of algorithm names (Table 1) is rather divorced from the references to be consulted for the individual hash algorithms to be used with the HMAC procedure. The ones used here are sufficiently well-known that I'm not terribly concerned about it, though.
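The "+/- 18 hours" figure for the 16-bit Fudge field can be checked with a few lines of arithmetic. The following is only a sketch; the function and constant names are hypothetical and do not come from the draft.

```python
# Largest value an unsigned 16-bit Fudge field can carry, in seconds.
MAX_FUDGE_SECONDS = 2**16 - 1


def fudge_window_hours(fudge_seconds: int) -> float:
    """Return the one-sided clock-skew window, in hours, for a Fudge value."""
    if not 0 <= fudge_seconds <= MAX_FUDGE_SECONDS:
        raise ValueError("Fudge must fit in an unsigned 16-bit field")
    return fudge_seconds / 3600


# The protocol maximum versus the 300-second value the draft recommends.
print(round(fudge_window_hours(MAX_FUDGE_SECONDS), 1))  # 18.2
print(round(fudge_window_hours(300) * 60))              # 5 (minutes)
```

The window applies on both sides of Time Signed, so the maximum tolerated skew spans roughly 36 hours in total.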
Abstract The title says "DNS" but maybe the body of the abstract should as well? Section 1.1 Some of this language feels like it might not age terribly well, e.g., "this can provide" or "[t]here was a need". addresses that need. The proposal is unsuitable for general server to server authentication for servers which speak with many other servers, since key management would become unwieldy with the number of shared keys going up quadratically. But it is suitable for many resolvers on hosts that only talk to a few recursive servers. Should zone transfers be mentioned here as well? Section 1.2 I don't understand the motivation for changing terminology from MACs to "signatures"; they're still MACs even though they're transaction-based. MAC of the query as part of the calculation. Where a response comprises multiple packets, the calculation of the MAC associated with the second and subsequent packets includes in its inputs the MAC for the preceding packet. In this way it is possible to detect any interruption in the packet sequence. I suggest mentioning the lack of mechanism to detect truncation of the packet sequence. Section 4.2 NAME The name of the key used, in domain name syntax. The name should reflect the names of the hosts and uniquely identify the key among a set of keys these two hosts may share at any given time. For example, if hosts A.site.example and B.example.net share a key, possibilities for the key name include <id>.A.site.example, <id>.B.example.net, and <id>.A.site.example.B.example.net. It should be possible for more than one key to be in simultaneous use among a set of interacting hosts. I'd suggest adding a note along the lines of "This allows for periodic key rotation per best operational practices, as well as algorithm agility as indicated by [BCP201]." The name may be used as a local index to the key involved and it is recommended that it be globally unique. Where a key is (nit?): this feels more like a "but" than an "and", to me. 
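The Section 1.2 point about chained MACs and undetected truncation can be illustrated with a deliberately simplified sketch. It assumes HMAC-SHA-256 and feeds only a length-prefixed prior MAC plus the message bytes into each MAC; the real TSIG digest also covers the TSIG timer variables, and all function names here are hypothetical.

```python
import hashlib
import hmac


def chain_macs(key: bytes, messages: list[bytes], request_mac: bytes) -> list[bytes]:
    """Compute one MAC per message, each covering the preceding MAC."""
    macs = []
    prior = request_mac
    for msg in messages:
        # Simplified input: length-prefixed prior MAC followed by the message.
        data = len(prior).to_bytes(2, "big") + prior + msg
        prior = hmac.new(key, data, hashlib.sha256).digest()
        macs.append(prior)
    return macs


def verify_chain(key: bytes, messages: list[bytes], macs: list[bytes],
                 request_mac: bytes) -> bool:
    """Recompute the chain and compare it with the received MACs."""
    expected = chain_macs(key, messages, request_mac)
    return len(expected) == len(macs) and all(
        hmac.compare_digest(a, b) for a, b in zip(expected, macs)
    )


key, req = b"k" * 32, b"\x00" * 32
msgs = [b"m1", b"m2", b"m3"]
macs = chain_macs(key, msgs, req)

# Dropping a message from the middle breaks the chain ...
print(verify_chain(key, [msgs[0], msgs[2]], [macs[0], macs[2]], req))  # False
# ... but cutting off the tail still verifies, which is the gap noted above.
print(verify_chain(key, msgs[:2], macs[:2], req))  # True
```

In this sketch, detecting truncation would need some out-of-band signal that the sequence is complete; the chain alone does not provide one.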
* MAC Size - an unsigned 16-bit integer giving the length of MAC field in octets. Truncation is indicated by a MAC size less than the size of the keyed hash produced by the algorithm specified by the Algorithm Name. nit: I would suggest "output size", as there are potentially a few different sizes involved (key size, block size, and output size, for starters, though I think the possibility of confusion here is low). * Other Len - an unsigned 16-bit integer specifying the length of the "Other Data" field in octets. * Other Data - this unsigned 48-bit integer field will be empty unless the content of the Error field is BADTIME, in which case it will contain the server's current time as the number of seconds since 00:00 on 1970-01-01 UTC, ignoring leap seconds (see Section 5.2.3). I'm slightly confused at the interplay between the explicit length field and the "empty unless" directive. Does this mean that the only allowed values in the "Other Len" are 0 and 6? Does "empty" mean "length-zero"? Section 4.3.1 Only included in the computation of a MAC for a response message (or the first message in a multi-message response), the validated request MAC MUST be included in the MAC computation. If the request MAC failed to validate, an unsigned error message MUST be returned instead. (Section 5.3.2). side note: while Section 5.3.2 specifies how to create an unsigned error message, it looks like Section 5.2 (and its subsections) lists specific RCODEs that are to be used. Section 4.3.2 When verifying an incoming message, this is the message after the TSIG RR and been removed and the ARCOUNT field has been decremented. If the message ID differs from the original message ID, the original message ID is substituted for the message ID. (This could happen, for example, when forwarding a dynamic update request.)
I trust (based on this text having survived while going for full IS) that there are no interesting record-keeping considerations with respect to knowing the message ID(s) to substitute, in the "forwarding a dynamic-update request" case, presumably since this is just the field from the TSIG RDATA and not some externally retained state. Section 4.3.3 The RR RDLEN and RDATA MAC Length are not included in the input to MAC computation since they are not guaranteed to be knowable before the MAC is generated. I appreciate that this is explicitly noted; there are some security considerations regarding the non-inclusion of the (truncated) mac length as input, though. The local truncation policy helps here, but not 100%. For each label type, there must be a defined "Canonical wire format" Just to check my understanding: label types only come into play for the two fields that are in domain name syntax, key name and algorithm name? Section 5.1 the server. This TSIG record MUST be the only TSIG RR in the message and MUST be last record in the Additional Data section. The client (Is there anything else that tries to insist on being the last record in the additional data section? I guess it doesn't really make sense to try to Update: 1035 just to note this requirement.) MUST store the MAC and the key name from the request while awaiting an answer. [This is going to end up alongside the request-ID that it has to store already, right?] Note that some older name servers will not accept requests with a nonempty additional data section. Clients SHOULD only attempt signed transactions with servers who are known to support TSIG and share some algorithm and secret key with the client -- so, this is not a problem in practice. (The context in which this "SHOULD" appears makes it feel like it's repeating an admonition from elsewhere and not the only instance of the requirement, in which case a reference might be appropriate.)
Section 5.2 If the TSIG RR cannot be understood, the server MUST regard the message as corrupt and return a FORMERR to the server. Otherwise the server is REQUIRED to return a TSIG RR in the response. As written, this could be read as an attempt to make a normative requirement of servers that do not implement this spec. Presumably it's just restating a requirement of the core protocol, though? Section 5.2.2 Using the information in the TSIG, the server should verify the MAC by doing its own calculation and comparing the result with the MAC received. If the MAC fails to verify, the server MUST generate an Is there any other way to verify the MAC? (Should this be a "MUST verify"?) Section 5.2.2.1 When space is at a premium and the strength of the full length of a MAC is not needed, it is reasonable to truncate the keyed hash and use the truncated value for authentication. HMAC SHA-1 truncated to 96 bits is an option available in several IETF protocols, including IPsec and TLS. Also Kerberos, where it was the strongest option for a while and we had to define a new encryption type to provide a better option (RFC 8009). This text seems to be implying that HMAC SHA-1 truncated to 96 bits is a recommendable option, which is ... far from clear. I'd prefer to have a note that this specific truncation was appropriate when initially specified but may not provide a security level appropriate for all cases in the modern environment, or preferably to just change the reference to (e.g.) SHA-384-192 or SHA-256-128. This is sent when the signer has truncated the keyed hash output to an allowable length, as described in [RFC2104], taking initial octets and discarding trailing octets. TSIG truncation can only (Or when an on-path attacker has performed truncation.) Section 5.2.3 (BADTIME). The server SHOULD also cache the most recent time signed value in a message generated by a key, and SHOULD return BADTIME if a message received later has an earlier time signed value.
A response (But there's no fudge factor on this check, other than the inherent limit of seconds granularity, as alluded to by the last paragraph of this section?) Section 5.3.1 A zone transfer over a DNS TCP session can include multiple DNS messages. Using TSIG on such a connection can protect the connection from hijacking and provide data integrity. The TSIG MUST be included (I assume that "hijacking TCP" means a sequence-number-guessing attack that would require the attacker to be winning the race against the legitimate sender to cause modified data to be introduced into the TCP stream? This is maybe not the best word to use for such a case.) on all DNS messages in the response. For backward compatibility, a client which receives DNS messages and verifies TSIG MUST accept up to 99 intermediary messages without a TSIG. The first message is (side note: I'm kind of curious what such compatibility is needed with. Also, this gives an attacker some flexibility into which to incorporate a collision, though given the near-real-time constraints and the strength of the HMAC construction I don't expect any practical impact.) Section 5.3.2 Request MAC (if the request MAC validated) DNS Message (response) TSIG Variables (response) The reason that the request is not included in this MAC in some cases is to make it possible for the client to verify the error. If the error is not a TSIG error the response MUST be generated as specified in Section 5.3. This makes it sound like the server excludes the request MAC from the digest if it failed to validate (something the client cannot predict), so that the client must perform trial verification of both versions in order to validate the response. Is that correct? Also, I think that the "in some cases" is not properly placed: as-is, it says that the request (not request MAC) is sometimes not included (implying that sometimes it is), which does not match up with the specification for the digest components. 
I presume that the intent is to say that in some cases the client could not verify the error, if the request itself was always included in the digest. Section 5.4.1 If an RCODE on a response is 9 (NOTAUTH), but the response TSIG validates and the TSIG key recognised by the client but different from that used on the request, then this is a Key Error. The client nits: missing words "key is recognized" and "but is different" Section 5.5 destination or the next forwarder. If no transaction security is available to the destination and the message is a query then, if the corresponding response has the AD flag (see [RFC4035]) set, the forwarder MUST clear the AD flag before adding the TSIG to the response and returning the result to the system from which it received the query. Is there anything to say about the CD bit? (It's independent crypto, so I don't expect so, but it seems worth checking.) Section 6 The only message digest algorithm specified in the first version of these specifications [RFC2845] was "HMAC-MD5" (see [RFC1321], [RFC2104]). Although a review of its security [RFC6151] concluded that "it may not be urgent to remove HMAC-MD5 from the existing protocols", with the availability of more secure alternatives the opportunity has been taken to make the implementation of this algorithm optional. It seems worth noting that the advice from RFC 6151 is already nine years old. [RFC4635] added mandatory support in TSIG for SHA-1 [FIPS180-4], [RFC3174]. SHA-1 collisions have been demonstrated so the MD5 security considerations apply to SHA-1 in a similar manner. Although I'd consider referencing (e.g.) shattered.io for the "collisions have been demonstrated" claim, though it's pretty optional. support for hmac-sha1 in TSIG is still mandatory for compatibility reasons, existing uses should be replaced with hmac-sha256 or other SHA-2 digest algorithms [FIPS180-4], [RFC3874], [RFC6234]. Is this "sha1 mandatory for compatibility" going to age well? 
If we have two implementations that can operate with sha2 variants, is it required to keep this as mandatory (vs. optional with a note about deployed reality at time of publication) for progression to Internet Standard? Optional hmac-sha224 It's not clear there's much reason to keep this around, or if we do, it could probably be "not recommended". (I assume that just dropping it entirely makes things annoying w.r.t. the IANA registry.) Section 9 Previous specifications [RFC2845] and [RFC4635] defined values for HMAC MD5 and SHA. IANA has also registered "gss-tsig" as an I'd suggest "HMAC-MD5 and HMAC-SHA-1", as the implied grouping where HMAC qualifies both hash algorithms is not terribly clear. Section 10 I suggest some discussion that the way truncation policy works, an attacker's ability to forge messages accepted as valid is limited by the amount of truncation that the recipient is willing to accept, not the amount of truncation used to generate messages by the legitimate sender. There's also some fairly standard content to put in here about allowing for an unsigned error response to a signed request, so an attacker (even off-path) can spoof such responses. (Section 5.4 indicates that the client should continue to wait if it gets such a thing, which helps a lot.) TKEY [RFC2930]. Secrets SHOULD NOT be shared by more than two entities. I suggest adding "; any such additional sharing would allow any party knowing the key to impersonate any other such party to members of the group". A fudge value that is too large may leave the server open to replay attacks. A fudge value that is too small may cause failures if machines are not time synchronized or there are unexpected network delays. The RECOMMENDED value in most situations is 300 seconds. Our experience with Kerberos in modern network environments has shown that 5 minutes is much larger than needed, though it's not clear there's a strong need to change this recommendation right now.
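The Section 10 remark that the recipient's truncation policy, not the sender's, bounds forgery resistance can be sketched as follows. This assumes HMAC-SHA-256 over an opaque byte string; the 16-octet default minimum and the function names are illustrative assumptions, not values taken from the draft.

```python
import hashlib
import hmac


def full_mac(key: bytes, data: bytes) -> bytes:
    """Untruncated HMAC-SHA-256 over the given data."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify_truncated_mac(key: bytes, data: bytes, received_mac: bytes,
                         min_len: int = 16) -> bool:
    """Accept a MAC only if it meets the receiver's own minimum length."""
    expected = full_mac(key, data)
    if not min_len <= len(received_mac) <= len(expected):
        return False  # too short for local policy, or longer than the hash
    # Truncation keeps the initial octets, so compare against that prefix.
    return hmac.compare_digest(expected[:len(received_mac)], received_mac)


key, data = b"s" * 32, b"example message"
mac = full_mac(key, data)
print(verify_truncated_mac(key, data, mac[:16]))            # True
print(verify_truncated_mac(key, data, mac[:4]))             # False
print(verify_truncated_mac(key, data, mac[:4], min_len=4))  # True
```

The last call shows the concern: a receiver that tolerates 4-octet MACs leaves an attacker only 32 bits to guess, even if the legitimate sender always transmits the full 32 octets.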
Significant progress has been made recently in cryptanalysis of hash functions of the types used here. While the results so far should not affect HMAC, the stronger SHA-1 and SHA-256 algorithms are being made mandatory as a precaution. Please revise this note to not imply that SHA-1 is considered "strong". Section 11.2 I'm not sure why RFC 2104 is listed as informative.
(Mirja Kühlewind) No Objection
Comment (2020-03-11 for -07)
I only had limited time for a quick review of this document, so I might not be aware of all the needed background and details. Still I have two quick questions on retries:
1) Sec 5.2.3: "Implementations should be aware of this possibility and be prepared to deal with it, e.g. by retransmitting the rejected request with a new TSIG once outstanding requests have completed or the time given by their time signed plus fudge value has passed." I might not be aware of all protocol details and maybe this is already specified somewhere: While unlikely, it is possible that a request might be retransmitted multiple times. As that could cause persistent congestion over time, it's always good practice to define a maximum number of retransmissions. If that is already defined somewhere, maybe adding a note/pointer would be good as well.
2) Sec 5.3.1: " This allows the client to rapidly detect when the session has been altered; at which point it can close the connection and retry." When (immediately or after some waiting time) should the client retry and how often? You further say "The client SHOULD treat this the same way as they would any other interrupted transfer (although the exact behavior is not specified here)." Where is that specified? Can you provide a pointer in the text?
3) Sec 8. Shared Secrets: Would it be appropriate to use more normative language here? There are a bunch of lower case shoulds in this section; is that on purpose?
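A bounded retry schedule of the kind asked about in points 1 and 2 could look like the sketch below. The retry limit, base delay, and cap are purely illustrative assumptions; the draft does not define any of them, and the function name is hypothetical.

```python
def retry_delays(max_retries: int = 3, base: float = 1.0,
                 cap: float = 30.0) -> list[float]:
    """Return the wait, in seconds, before each permitted retry.

    Delays double on each attempt (exponential backoff) and never exceed
    `cap`; after `max_retries` attempts the caller gives up.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


print(retry_delays())              # [1.0, 2.0, 4.0]
print(retry_delays(6, 1.0, 10.0))  # [1.0, 2.0, 4.0, 8.0, 10.0, 10.0]
```

Capping both the number of attempts and the per-attempt delay keeps a repeatedly rejected request from generating sustained load.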
(Barry Leiba) No Objection
Comment (2020-03-09 for -07)
— Section 4.2 —
* Other Len - an unsigned 16-bit integer specifying the length of the "Other Data" field in octets.
* Other Data - this unsigned 48-bit integer field will be
Does this mean that “other data” is always 48 bits? If so, does that mean that the value of “other len” is always 6? If so, then shouldn’t it say that? If not, then what don’t I understand?
— Section 5.1 —
Clients SHOULD only attempt signed transactions with servers who are known to support TSIG and share some algorithm and secret key with the client -- so, this is not a problem in practice.
Why SHOULD and not MUST?
— Section 5.3.2 —
The server SHOULD also cache the most recent time signed value in a message generated by a key
I tripped over this until I realized you mean “Time Signed value”. You capitalize it elsewhere, and it helps the parsing if it’s consistent. There are four uncapitalized instances in this section.
— Section 9 —
There is no structure required other than names for different algorithms must be unique when compared as DNS names, i.e., comparison is case insensitive.
I found this sentence to be really awkward and hard to parse. May I suggest this?:
NEW
There is no structure to the names, and algorithm names are compared as if they were DNS names (the comparison is case-insensitive).
END
I don’t think you really need to say that each name is different/unique, right?
other algorithm names are simple (i.e., single-component) names.
Nitty thing that you can completely ignore, but I would avoid the Latin abbreviation thus: “other algorithm names are simple, single-component names.”
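One reading of the Other Len / Other Data text questioned in the Section 4.2 comment is that the field is either absent (length 0) or exactly 6 octets carrying the server time on BADTIME. The sketch below encodes that reading as an assumption; the draft text quoted here does not confirm it. BADTIME is the registered TSIG error code 18, and the function name is hypothetical.

```python
BADTIME = 18  # TSIG error code for a Time Signed value outside the window


def parse_other_data(error: int, other_data: bytes):
    """Return the server time from Other Data, or None when it is empty.

    Assumed reading: Other Len is 6 on BADTIME and 0 otherwise.
    """
    if error == BADTIME:
        if len(other_data) != 6:
            raise ValueError("BADTIME requires a 48-bit (6-octet) Other Data")
        # 48-bit unsigned seconds since 1970-01-01 UTC, network byte order.
        return int.from_bytes(other_data, "big")
    if other_data:
        raise ValueError("Other Data must be empty unless Error is BADTIME")
    return None


server_time = (1584000000).to_bytes(6, "big")
print(parse_other_data(BADTIME, server_time))  # 1584000000
print(parse_other_data(0, b""))                # None
```

A stricter or looser parser is possible (for example, ignoring unexpected Other Data instead of rejecting it); which behavior is intended is exactly the question the comment raises.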
(Alexey Melnikov) No Objection
Alvaro Retana No Objection
(Adam Roach) No Objection
Éric Vyncke No Objection
Comment (2020-03-09 for -07)
Thank you for the work put into this document. It is clear and easy to read.
Please find below some non-blocking COMMENTs and NITs. An answer will be appreciated.
I hope that this helps to improve the document,
Regards,
-éric
== COMMENTS ==
There are 6 authors while the usual procedure is to limit to 5 authors. Personally, I do not care.
-- Section 1.3 --
It is a little unclear to me whether the "two nameservers" were two implementations or two actual DNS servers.
-- Section 5.2 --
Suggest providing some justification for "copied to a safe location": the DNS message was sent in the clear, so why does the TSIG part need to be copied to a safe location? Please define what is meant by "safe location".
(Mainly for my own curiosity) "cannot be understood" is also quite vague.
-- Section 5.3 --
About rejecting a request with a time signed value earlier than the last received value: I wonder what the value of this behavior is if there is no 'fudge' as well... The last paragraph of this section describes this case and pushes the error handling to the request initiator. Is there any reason why being flexible on the receiving side was not selected?
== NITS ==
-- Section 4.3.2 --
Is " A whole and complete DNS message in wire format." a complete and valid sentence?
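The two time checks discussed in the Section 5.3 comment (the Fudge window and the "not earlier than the last Time Signed seen" rule) can be sketched together. This is a simplified model using integer seconds; the function name and string return values are hypothetical, and a real server would track the last value per key.

```python
from typing import Optional


def check_time(now: int, time_signed: int, fudge: int,
               last_seen: Optional[int] = None) -> str:
    """Return "OK" or "BADTIME" for an incoming signed message."""
    # Window check: the receiver's clock must be within Fudge seconds
    # of the sender's Time Signed value.
    if abs(now - time_signed) > fudge:
        return "BADTIME"
    # Ordering check: no Fudge allowance is applied here, so a message
    # delayed behind a newer one is rejected even inside the window.
    if last_seen is not None and time_signed < last_seen:
        return "BADTIME"
    return "OK"


print(check_time(now=1000, time_signed=900, fudge=300))                   # OK
print(check_time(now=1000, time_signed=600, fudge=300))                   # BADTIME
print(check_time(now=1000, time_signed=999, fudge=300, last_seen=1000))   # BADTIME
```

The third call is the case both reviewers point at: the message is only one second older than the newest one seen and well inside the Fudge window, yet it is rejected, leaving recovery to the sender.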