Secret Key Transaction Authentication for DNS (TSIG)
draft-ietf-dnsop-rfc2845bis-09

Note: This ballot was opened for revision 07 and is now closed.

Warren Kumari Yes

(Deborah Brungard) No Objection

(Alissa Cooper) No Objection

Roman Danyliw No Objection

Comment (2020-03-10 for -07)
** Section 1.3.  Per “In 2017, two nameservers  strictly following that document (and the related [RFC4635]) were discovered to have security problems related to this feature”, consider providing a reference to the published vulnerabilities (i.e., CVE-2017-3142 and CVE-2017-3143)

** Section 6.  Per “SHA-1 collisions have been demonstrated so the MD5 security considerations apply to SHA-1 in a similar manner.  Although support for hmac-sha1 in TSIG is still mandatory for compatibility reasons, existing uses should be replaced with hmac-sha256 or other SHA-2 digest algorithms [FIPS180-4], [RFC3874], [RFC6234].”

-- It’s worth repeating those MD5 security considerations here

-- (from Magnus Nystrom’s SECDIR review, thanks Magnus!) it’s worth including references to the recent SHA-1 cryptanalysis provided in the SECDIR review

-- The SHA-2 family should be a normative SHOULD (or RECOMMENDED).

** Section 10.  Per “For all of the message authentication code algorithms listed in this document, those producing longer values are believed to be stronger”, as noted in Magnus’s SECDIR review, this could be misconstrued as the algorithm choice not the digest length provides the security.  Recommend rephrasing (or making some statement  

** Editorial
-- Section 4.3.2.  Per “When verifying an incoming message, this is the message after the TSIG RR and been removed and the ARCOUNT field has been decremented.”, this sentence doesn’t parse (is missing a word).

-- Section 4.3.2.  Per “A whole and complete DNS message in wire format.”, this isn’t a sentence.

Benjamin Kaduk No Objection

Comment (2020-03-18 for -07)
Thanks for putting together this update; it's good to see the security
issue getting closed off in the updated spec, and progression to full
Internet Standard!  I do have several substantive comments (as well as
some minor/nit-level ones), many of which are listed here at the top but
a few of which are interspersed in the per-section comments.


I considered making this a Discuss point, but it should be pretty
uncontroversial and I trust that the right thing will happen even if I
don't: there's a couple lingering remnants of SHA-1 being the
preferred algorithm that need to be cleaned up, in Sections 5.2.2.1 and
10 (further detailed in the per-section comments).

I also initially had made the following point a Discuss-level point, but
decided to not do so since I don't remember any BCP-level guidance
relating to cross-protocol attacks.  Nevertheless, I strongly encourage
the authors to consider that cryptographic best practice is to use any
given key with exactly one cryptographic algorithm.  The record format
listed in Section 4.2 has the key name and algorithm as separately
conveyed, which would allow for a given key to be used with all
implemented algorithms.  We should include some discussion that it's
best to only use a single algorithm with any given key.

We also have a 16-bit wide field for "Fudge", which (since it counts
seconds) corresponds to a maximum window of something like +/- 18 hours;
it's hard to believe that we really want to give people the rope to
allow for that much time skew.  (Yes, I understand that implementations
will set something sane in practice but that doesn't necessarily mean
that the protocol still has to allow it.)
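
As a rough check of the figure above, the arithmetic for an unsigned
16-bit field counting seconds can be sketched as follows (the variable
names are illustrative only, not taken from the draft):

```python
# Largest value an unsigned 16-bit Fudge field can carry, in seconds.
MAX_FUDGE_SECONDS = 2**16 - 1

# Converted to hours; the permitted skew is +/- this amount around
# the Time Signed value.
max_fudge_hours = MAX_FUDGE_SECONDS / 3600

print(f"{MAX_FUDGE_SECONDS} s is about {max_fudge_hours:.1f} hours")
```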


Our authoritative list of algorithm names (Table 1) is rather divorced
from the references to be consulted for the individual hash algorithms
to be used with the HMAC procedure.  The ones used here are sufficiently
well-known that I'm not terribly concerned about it, though.

Abstract

The title says "DNS" but maybe the body of the abstract should as well?

Section 1.1

Some of this language feels like it might not age terribly well, e.g.,
"this can provide" or "[t]here was a need".

   addresses that need.  The proposal is unsuitable for general server
   to server authentication for servers which speak with many other
   servers, since key management would become unwieldy with the number
   of shared keys going up quadratically.  But it is suitable for many
   resolvers on hosts that only talk to a few recursive servers.

Should zone transfers be mentioned here as well?

Section 1.2

I don't understand the motivation for changing terminology from MACs to
"signatures"; they're still MACs even though they're transaction-based.

   MAC of the query as part of the calculation.  Where a response
   comprises multiple packets, the calculation of the MAC associated
   with the second and subsequent packets includes in its inputs the MAC
   for the preceding packet.  In this way it is possible to detect any
   interruption in the packet sequence.

I suggest mentioning the lack of mechanism to detect truncation of the
packet sequence.
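
To illustrate the distinction, here is a deliberately simplified sketch
of MAC chaining.  It is not the TSIG digest computation: the TSIG
variables, timers and wire format are omitted, and the key, the packet
contents and the choice of SHA-256 are assumptions made for the example.
It only shows that a gap in the middle of the sequence breaks the chain,
while a sequence that is cut short still verifies.

```python
import hashlib
import hmac


def chain_macs(key: bytes, request_mac: bytes, messages: list[bytes]) -> list[bytes]:
    """Compute one MAC per message; each MAC also covers the previous MAC."""
    macs = []
    prior = request_mac  # the first response message is bound to the request MAC
    for message in messages:
        mac = hmac.new(key, prior + message, hashlib.sha256).digest()
        macs.append(mac)
        prior = mac
    return macs


def verify_chain(key: bytes, request_mac: bytes,
                 messages: list[bytes], macs: list[bytes]) -> bool:
    """Recompute the chain and compare it with the received MACs."""
    expected = chain_macs(key, request_mac, messages)
    return len(expected) == len(macs) and all(
        hmac.compare_digest(e, m) for e, m in zip(expected, macs)
    )


key = b"hypothetical-shared-secret"
request_mac = hmac.new(key, b"query", hashlib.sha256).digest()
messages = [b"packet-1", b"packet-2", b"packet-3"]
macs = chain_macs(key, request_mac, messages)

intact = verify_chain(key, request_mac, messages, macs)
# Dropping the middle packet breaks the chain, so it is detected.
gap = verify_chain(key, request_mac, [messages[0], messages[2]], [macs[0], macs[2]])
# Dropping the final packet leaves a shorter chain that still verifies.
cut_short = verify_chain(key, request_mac, messages[:2], macs[:2])
```

In this toy model the verifier needs some out-of-band signal (for zone
transfers, presumably the closing SOA) to know the sequence is complete.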

Section 4.2

   NAME  The name of the key used, in domain name syntax.  The name
         should reflect the names of the hosts and uniquely identify the
         key among a set of keys these two hosts may share at any given
         time.  For example, if hosts A.site.example and B.example.net
         share a key, possibilities for the key name include
         <id>.A.site.example, <id>.B.example.net, and
         <id>.A.site.example.B.example.net.  It should be possible for
         more than one key to be in simultaneous use among a set of
         interacting hosts.

I'd suggest adding a note along the lines of "This allows for periodic
key rotation per best operational practices, as well as algorithm
agility as indicated by [BCP201]."

         The name may be used as a local index to the key involved and
         it is recommended that it be globally unique.  Where a key is

(nit?): this feels more like a "but" than an "and", to me.

         *  MAC Size - an unsigned 16-bit integer giving the length of
            MAC field in octets.  Truncation is indicated by a MAC size
            less than the size of the keyed hash produced by the
            algorithm specified by the Algorithm Name.

nit: I would suggest "output size", as there are potentially a few
different sizes involved (key size, block size, and output size, for
starters, though I think the possibility of confusion here is low).

         *  Other Len - an unsigned 16-bit integer specifying the length
            of the "Other Data" field in octets.

         *  Other Data - this unsigned 48-bit integer field will be
            empty unless the content of the Error field is BADTIME, in
            which case it will contain the server's current time as the
            number of seconds since 00:00 on 1970-01-01 UTC, ignoring
            leap seconds (see Section 5.2.3).

I'm slightly confused at the interplay between the explicit length field
and the "empty unless" directive.  Does this mean that the only allowed
values in the "Other Len" are 0 and 6?  Does "empty" mean "length-zero"?
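
For concreteness, one possible reading is sketched below, under the
assumption that "empty" means length zero, so that Other Len is either
0 or 6.  The function name and parameters are hypothetical, and the
sketch covers only these two fields, not the rest of the TSIG RDATA.

```python
import struct


def encode_other_fields(error_is_badtime: bool, server_time: int) -> bytes:
    """Return Other Len (16 bits) followed by Other Data (0 or 6 octets)."""
    if error_is_badtime:
        # 48-bit unsigned seconds since the epoch, in network byte order.
        other_data = server_time.to_bytes(6, "big")
    else:
        # Assumed reading: "empty" means a zero-length field.
        other_data = b""
    return struct.pack("!H", len(other_data)) + other_data


no_error = encode_other_fields(False, 0)
badtime = encode_other_fields(True, 1)
```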

Section 4.3.1

   Only included in the computation of a MAC for a response message (or
   the first message in a multi-message response), the validated request
   MAC MUST be included in the MAC computation.  If the request MAC
   failed to validate, an unsigned error message MUST be returned
   instead.  (Section 5.3.2).

side note: while Section 5.3.2 specifies how to create an unsigned error
message, it looks like Section 5.2 (and subsections) lists specific
RCODEs that are to be used.

Section 4.3.2

   When verifying an incoming message, this is the message after the
   TSIG RR and been removed and the ARCOUNT field has been decremented.
   If the message ID differs from the original message ID, the original
   message ID is substituted for the message ID.  (This could happen,
   for example, when forwarding a dynamic update request.)

I trust (based on this text having survived while going for full IS)
that there are no interesting record-keeping considerations with respect
to knowing the message ID(s) to substitute, in the "forwarding a
dynamic-update request" case, presumably since this is just the field
from the TSIG RDATA and not some externally retained state.

Section 4.3.3

   The RR RDLEN and RDATA MAC Length are not included in the input to
   MAC computation since they are not guaranteed to be knowable before
   the MAC is generated.

I appreciate that this is explicitly noted; there are some security
considerations regarding the non-inclusion of the (truncated) mac length
as input, though.  The local truncation policy helps here, but not 100%.

   For each label type, there must be a defined "Canonical wire format"

Just to check my understanding: label types only come into play for the
two fields that are in domain name syntax, key name and algorithm name?

Section 5.1

   the server.  This TSIG record MUST be the only TSIG RR in the message
   and MUST be last record in the Additional Data section.  The client

(Is there anything else that tries to insist on being the last record in
the additional data section?  I guess it doesn't really make sense to
try to Update: 1035 just to note this requirement.)

   MUST store the MAC and the key name from the request while awaiting
   an answer.

[This is going to end up alongside the request-ID that it has to store
already, right?]

   Note that some older name servers will not accept requests with a
   nonempty additional data section.  Clients SHOULD only attempt signed
   transactions with servers who are known to support TSIG and share
   some algorithm and secret key with the client -- so, this is not a
   problem in practice.

(The context in which this "SHOULD" appears makes it feel like it's
repeating an admonition from elsewhere and not the only instance of the
requirement, in which case a reference might be appropriate.)

Section 5.2

   If the TSIG RR cannot be understood, the server MUST regard the
   message as corrupt and return a FORMERR to the server.  Otherwise the
   server is REQUIRED to return a TSIG RR in the response.

As written, this could be read as an attempt to make a normative
requirement of servers that do not implement this spec.  Presumably it's
just restating a requirement of the core protocol, though?

Section 5.2.2

   Using the information in the TSIG, the server should verify the MAC
   by doing its own calculation and comparing the result with the MAC
   received.  If the MAC fails to verify, the server MUST generate an

Is there any other way to verify the MAC?  (Should this be a "MUST
verify"?)

Section 5.2.2.1

   When space is at a premium and the strength of the full length of a
   MAC is not needed, it is reasonable to truncate the keyed hash and
   use the truncated value for authentication.  HMAC SHA-1 truncated to
   96 bits is an option available in several IETF protocols, including
   IPsec and TLS.

Also Kerberos, where it was the strongest option for a while and we had
to define a new encryption type to provide a better option (RFC 8009).

This text seems to be implying that HMAC SHA-1 truncated to 96 bits is a
recommendable option, which is ... far from clear.  I'd prefer to have a
note that this specific truncation was appropriate when initially
specified but may not provide a security level appropriate for all cases
in the modern environment, or preferably to just change the reference
to (e.g.) SHA-384-192 or SHA-256-128.
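
As a minimal sketch of what such a truncated MAC looks like, the
following keeps the initial octets of an HMAC-SHA-256 output and
discards the rest, giving SHA-256-128.  The function name, key and data
are hypothetical, and the input is not a real TSIG digest input.

```python
import hashlib
import hmac


def truncated_hmac_sha256(key: bytes, data: bytes, mac_octets: int = 16) -> bytes:
    """Return the first mac_octets octets of HMAC-SHA-256(key, data)."""
    full = hmac.new(key, data, hashlib.sha256).digest()
    if not 0 < mac_octets <= len(full):
        raise ValueError("requested MAC length is out of range")
    # Truncation keeps the leading octets and drops the trailing ones.
    return full[:mac_octets]


key = b"hypothetical-shared-secret"
data = b"example input"
mac_128 = truncated_hmac_sha256(key, data)
```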

       This is sent when the signer has truncated the keyed hash output
       to an allowable length, as described in [RFC2104], taking initial
       octets and discarding trailing octets.  TSIG truncation can only

(Or when an on-path attacker has performed truncation.)

Section 5.2.3

   (BADTIME).  The server SHOULD also cache the most recent time signed
   value in a message generated by a key, and SHOULD return BADTIME if a
   message received later has an earlier time signed value.  A response

(But there's no fudge factor on this check, other than the inherent
limit of seconds granularity, as alluded to by the last paragraph of
this section?)
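
A small sketch of the two checks as I read them, with hypothetical
names and return values: the first check allows for Fudge, the second
(the cached most-recent Time Signed) does not, so two requests signed
one second apart and delivered out of order would trip it.

```python
from typing import Optional


def check_time(now: int, time_signed: int, fudge: int,
               last_time_signed: Optional[int] = None) -> str:
    """Return "OK" or "BADTIME" for a received Time Signed value."""
    # Check 1: server time must be within Time Signed +/- Fudge.
    if abs(now - time_signed) > fudge:
        return "BADTIME"
    # Check 2: no fudge here; any earlier Time Signed than the cached
    # one for this key is rejected.
    if last_time_signed is not None and time_signed < last_time_signed:
        return "BADTIME"
    return "OK"


within_window = check_time(now=1000, time_signed=900, fudge=300)
outside_window = check_time(now=1000, time_signed=600, fudge=300)
reordered = check_time(now=1000, time_signed=999, fudge=300, last_time_signed=1000)
```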

Section 5.3.1

   A zone transfer over a DNS TCP session can include multiple DNS
   messages.  Using TSIG on such a connection can protect the connection
   from hijacking and provide data integrity.  The TSIG MUST be included

(I assume that "hijacking TCP" means a sequence-number-guessing attack
that would require the attacker to be winning the race against the
legitimate sender to cause modified data to be introduced into the TCP
stream?  This is maybe not the best word to use for such a case.)

   on all DNS messages in the response.  For backward compatibility, a
   client which receives DNS messages and verifies TSIG MUST accept up
   to 99 intermediary messages without a TSIG.  The first message is

(side note: I'm kind of curious what such compatibility is needed with.
Also, this gives an attacker some flexibility into which to incorporate
a collision, though given the near-real-time constraints and the
strength of the HMAC construction I don't expect any practical impact.)

Section 5.3.2

      Request MAC (if the request MAC validated)
      DNS Message (response)
      TSIG Variables (response)

   The reason that the request is not included in this MAC in some cases
   is to make it possible for the client to verify the error.  If the
   error is not a TSIG error the response MUST be generated as specified
   in Section 5.3.

This makes it sound like the server excludes the request MAC from the
digest if it failed to validate (something the client cannot predict),
so that the client must perform trial verification of both versions in
order to validate the response.  Is that correct?

Also, I think that the "in some cases" is not properly placed: as-is, it
says that the request (not request MAC) is sometimes not included
(implying that sometimes it is), which does not match up with the
specification for the digest components.  I presume that the intent is
to say that in some cases the client could not verify the error, if the
request itself was always included in the digest.

Section 5.4.1

   If an RCODE on a response is 9 (NOTAUTH), but the response TSIG
   validates and the TSIG key recognised by the client but different
   from that used on the request, then this is a Key Error.  The client

nits: missing words "key is recognized" and "but is different"

Section 5.5

   destination or the next forwarder.  If no transaction security is
   available to the destination and the message is a query then, if the
   corresponding response has the AD flag (see [RFC4035]) set, the
   forwarder MUST clear the AD flag before adding the TSIG to the
   response and returning the result to the system from which it
   received the query.

Is there anything to say about the CD bit?  (It's independent crypto, so
I don't expect so, but it seems worth checking.)

Section 6

   The only message digest algorithm specified in the first version of
   these specifications [RFC2845] was "HMAC-MD5" (see [RFC1321],
   [RFC2104]).  Although a review of its security [RFC6151] concluded
   that "it may not be urgent to remove HMAC-MD5 from the existing
   protocols", with the availability of more secure alternatives the
   opportunity has been taken to make the implementation of this
   algorithm optional.

It seems worth noting that the advice from RFC 6151 is already nine
years old.

   [RFC4635] added mandatory support in TSIG for SHA-1 [FIPS180-4],
   [RFC3174].  SHA-1 collisions have been demonstrated so the MD5
   security considerations apply to SHA-1 in a similar manner.  Although

I'd consider referencing (e.g.) shattered.io for the "collisions have
been demonstrated" claim, though it's pretty optional.

   support for hmac-sha1 in TSIG is still mandatory for compatibility
   reasons, existing uses should be replaced with hmac-sha256 or other
   SHA-2 digest algorithms [FIPS180-4], [RFC3874], [RFC6234].

Is this "sha1 mandatory for compatibility" going to age well?  If we
have two implementations that can operate with sha2 variants, is it
required to keep this as mandatory (vs. optional with a note about
deployed reality at time of publication) for progression to Internet
Standard?

                   Optional    hmac-sha224

It's not clear there's much reason to keep this around, or if we do, it
could probably be "not recommended".  (I assume that just dropping it
entirely makes things annoying w.r.t. the IANA registry.)

Section 9

   Previous specifications [RFC2845] and [RFC4635] defined values for
   HMAC MD5 and SHA.  IANA has also registered "gss-tsig" as an

I'd suggest "HMAC-MD5 and HMAC-SHA-1", as the implied grouping where
HMAC qualifies both hash algorithms is not terribly clear.

Section 10

I suggest some discussion that the way truncation policy works, an
attacker's ability to forge messages accepted as valid is limited by the
amount of truncation that the recipient is willing to accept, not the
amount of truncation used to generate messages by the legitimate sender.
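
To make the point concrete, here is a hedged sketch of a verifier with
a local minimum-length policy.  The min_octets parameter and all names
are hypothetical and do not reproduce the draft's exact rules.  The
sender below never truncates, yet a 16-octet MAC is still accepted
because the recipient's policy allows it.

```python
import hashlib
import hmac


def verify_truncated(key: bytes, data: bytes,
                     received_mac: bytes, min_octets: int) -> bool:
    """Accept any MAC prefix at least min_octets long that matches."""
    full = hmac.new(key, data, hashlib.sha256).digest()
    if not (min_octets <= len(received_mac) <= len(full)):
        return False
    return hmac.compare_digest(full[:len(received_mac)], received_mac)


key = b"hypothetical-shared-secret"
data = b"example input"
sender_mac = hmac.new(key, data, hashlib.sha256).digest()  # full 32 octets

full_ok = verify_truncated(key, data, sender_mac, min_octets=16)
# A forger only has to match what the recipient will accept: 16 octets.
short_ok = verify_truncated(key, data, sender_mac[:16], min_octets=16)
too_short = verify_truncated(key, data, sender_mac[:8], min_octets=16)
```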

There's also some fairly standard content to put in here about allowing
for an unsigned error response to a signed request, so an attacker (even
off-path) can spoof such responses.  (Section 5.4 indicates that the
client should continue to wait if it gets such a thing, which helps a
lot.)

   TKEY [RFC2930].  Secrets SHOULD NOT be shared by more than two
   entities.

I suggest adding "; any such additional sharing would allow any party
knowing the key to impersonate any other such party to members of the
group".

   A fudge value that is too large may leave the server open to replay
   attacks.  A fudge value that is too small may cause failures if
   machines are not time synchronized or there are unexpected network
   delays.  The RECOMMENDED value in most situations is 300 seconds.

Our experience with Kerberos in modern network environments has shown
that 5 minutes is much larger than needed, though it's not clear there's
a strong need to change this recommendation right now.

   Significant progress has been made recently in cryptanalysis of hash
   functions of the types used here.  While the results so far should
   not affect HMAC, the stronger SHA-1 and SHA-256 algorithms are being
   made mandatory as a precaution.

Please revise this note to not imply that SHA-1 is considered "strong".

Section 11.2

I'm not sure why RFC 2104 is listed as informative.

(Mirja Kühlewind) No Objection

Comment (2020-03-11 for -07)
I only had limited time for a quick review of this document, so I might not be aware of all the needed background and details. Still I have two quick questions on retries:

1) Sec 5.2.3:
"Implementations should be aware
   of this possibility and be prepared to deal with it, e.g. by
   retransmitting the rejected request with a new TSIG once outstanding
   requests have completed or the time given by their time signed plus
   fudge value has passed."
I might not be aware of all protocol details and maybe this is already specified somewhere: while unlikely, it is possible that a request might be retransmitted multiple times. As that could cause persistent congestion over time, it's always good practice to define a maximum number of retransmissions. If that is already defined somewhere, maybe adding a note/pointer would be good as well.

2) Sec 5.3.1:
"   This allows the client to rapidly detect when the session has been
   altered; at which point it can close the connection and retry."
When (immediately or after some waiting time) should the client retry and how often?
You further say 
"The client SHOULD treat this the
   same way as they would any other interrupted transfer (although the
   exact behavior is not specified here)."
Where is that specified? Can you provide a pointer in the text?

3) Sec 8.  Shared Secrets: Would it be appropriate to use more normative language here? There are a bunch of lower case shoulds in this section; is that on purpose?

(Barry Leiba) No Objection

Comment (2020-03-09 for -07)
— Section 4.2 —

         *  Other Len - an unsigned 16-bit integer specifying the length
            of the "Other Data" field in octets.
         *  Other Data - this unsigned 48-bit integer field will be

Does this mean that “other data” is always 48 bits?  If so, does that mean that the value of “other len” is always 6?  If so, then shouldn’t it say that?  If not, then what don’t I understand?

— Section 5.1 —

   Clients SHOULD only attempt signed
   transactions with servers who are known to support TSIG and share
   some algorithm and secret key with the client -- so, this is not a
   problem in practice.

Why SHOULD and not MUST?

— Section 5.3.2 —

   The server SHOULD also cache the most recent time signed
   value in a message generated by a key

I tripped over this until I realized you mean “Time Signed value”.  You capitalize it elsewhere, and it helps the parsing if it’s consistent. There are four uncapitalized instances in this section.

— Section 9 —

   There is no structure
   required other than names for different algorithms must be unique
   when compared as DNS names, i.e., comparison is case insensitive.

I found this sentence to be really awkward and hard to parse.  May I suggest this?:

NEW
There is no structure to the names, and algorithm names are compared as if they were DNS names (the comparison is case-insensitive).
END

I don’t think you really need to say that each name is different/unique, right?
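
As an illustration of the comparison rule in the suggested text, a minimal sketch follows. It assumes ASCII-only algorithm names in presentation format and treats a trailing root dot as insignificant; the function name is hypothetical.

```python
def same_algorithm_name(a: str, b: str) -> bool:
    """Compare two algorithm names the way DNS names are compared."""
    def normalize(name: str) -> bytes:
        # Drop a single trailing root dot, then fold ASCII case.
        if name.endswith("."):
            name = name[:-1]
        return name.encode("ascii").lower()

    return normalize(a) == normalize(b)


mixed_case = same_algorithm_name("HMAC-SHA256", "hmac-sha256.")
different = same_algorithm_name("hmac-sha256", "hmac-sha384")
```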

   other algorithm
   names are simple (i.e., single-component) names.

Nitty thing that you can completely ignore, but I would avoid the Latin abbreviation thus: “other algorithm names are simple, single-component names.”

(Alexey Melnikov) No Objection

Alvaro Retana No Objection

(Adam Roach) No Objection

Éric Vyncke No Objection

Comment (2020-03-09 for -07)
Thank you for the work put into this document. It is clear and easy to read.

Please find below some non-blocking COMMENTs and NITs. An answer will be appreciated.

I hope that this helps to improve the document,

Regards,

-éric

== COMMENTS ==

There are 6 authors while the usual procedure is to limit to 5 authors. Personally, I do not care.

-- Section 1.3 --
It is a little unclear to me whether the "two nameservers" were two implementations or two actual DNS servers.

-- Section 5.2 --
Suggest providing some justification for "copied to a safe location": the DNS message was sent in the clear, so why does the TSIG part need to be copied to a safe location? Please define what is meant by "safe location". (Mainly for my own curiosity.)

"cannot be understood" is also quite vague.

-- Section 5.3 --
About rejecting a request with a time signed value earlier than the last received value: I wonder what the value of this behavior is if there is no 'fudge' as well... The last paragraph of this section describes this case and pushes the error handling to the request initiator. Is there any reason why being flexible on the receiving side was not selected?


== NITS ==

-- Section 4.3.2 --
Is " A whole and complete DNS message in wire format." a complete and valid sentence?