Last Call Review of draft-ietf-dnsop-dns-error-reporting-06
review-ietf-dnsop-dns-error-reporting-06-dnsdir-lc-tale-2023-11-05-00

Request Review of draft-ietf-dnsop-dns-error-reporting
Requested revision No specific revision (document currently at 08)
Type Last Call Review
Team DNS Directorate (dnsdir)
Deadline 2023-10-30
Requested 2023-10-16
Authors Roy Arends, Matt Larson
I-D last updated 2023-11-05
Completed reviews Dnsdir Telechat review of -07 by James Gannon
Intdir Telechat review of -07 by Carlos Pignataro
Tsvart Telechat review of -07 by Gorry Fairhurst
Dnsdir Early review of -04 by James Gannon
Secdir Early review of -04 by Yaron Sheffer
Tsvart Last Call review of -06 by Gorry Fairhurst
Dnsdir Last Call review of -06 by David C Lawrence
Secdir Last Call review of -06 by Yaron Sheffer
Genart Last Call review of -06 by Peter E. Yee
Assignment Reviewer David C Lawrence
State Completed
Request Last Call review on draft-ietf-dnsop-dns-error-reporting by DNS Directorate Assigned
Posted at https://mailarchive.ietf.org/arch/msg/dnsdir/xi0ek6kQMGydCgHdcMAhwHNhE_k
Reviewed revision 06 (document currently at 08)
Result Ready w/issues
Completed 2023-11-05
Hi Roy and Matt, this is my review on behalf of the DNS Directorate.

There are only a couple of minor substantive points that I think
really should be addressed before publication, and then my usual wordy
musings on a number of other nits.

Minor substantive points:

Section 4 doesn't really describe which responses should get the
Report-Channel option pushed.  Obviously it requires EDNS0, and I'd
guess I don't want to include it for unsigned responses. I'm also
guessing I'd not send it if DO were clear, and that I probably would
send it for any other DNSSEC-related query including for DNSSEC meta
records.

As an implementer, I'd prefer not to have to guess at those things and
would rather be told when I should be including it, presumably in
Section 6.2.
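
To make the guessing concrete, here is a minimal sketch of the rule
I'd end up implementing absent guidance.  The function and parameter
names are hypothetical, and the conditions are only my guesses from
above, not anything the draft states.

```python
def should_add_report_channel(has_edns0: bool, do_bit: bool,
                              zone_is_signed: bool) -> bool:
    """Guess at when an authoritative server attaches Report-Channel.

    Assumptions (reviewer's guesses, not taken from the draft):
      - the query must carry an EDNS0 OPT record
      - the DO bit must be set
      - the answering zone must be DNSSEC-signed
    """
    return has_edns0 and do_bit and zone_is_signed


# A signed zone answering an EDNS0 query with DO set gets the option.
signed_do_query = should_add_report_channel(True, True, True)
# The same query with DO clear does not.
signed_no_do_query = should_add_report_channel(True, False, True)
```

If the draft intends something different, for example always
including the option when EDNS0 is present, then one sentence saying
so would replace all three guesses.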

Also, for Section 5, is 0 an okay OPTION-LENGTH or must it be minimum
1 with an AGENT-DOMAIN of \0?  Similarly, 6.1 has "returned an empty
agent domain", which the Overview describes more expansively as
"empty, or is the null label".  Section 6.1 should be clearer on what
constitutes an "empty" agent domain.

Bag o' Nits:

Hyphenate adjectival phrase:
s/stale DNSSEC signed zone/stale DNSSEC-signed zone/.

For the Terminology section, perhaps start with something like, "DNS
Terminology used in this document is from [BCP219], with these
additions:" Because, for example, the first item, "reporting
resolver", uses "validating recursive resolver" to define it.  (I'll
note, however, that BCP 219 defines "validating resolver" but not
"validating recursive resolver".  I'm not sure that "recursive" is a
strictly necessary qualifier here.) (Also, I don't remember what the
IETF style guidance is on reference by BCP# or by RFC#, and you did
include BCP219 as RFC8499 later on.)

s/wireformat/wire format/g per usage in BCP 219.

"The reporting resolver builds this QNAME by concatenating the _er
label" has _er unquoted here, but quoted later in the same sentence.
They should be consistent; I favour the quotes.  But also, the actual
algorithm is specified in 6.1.1, so restating it here without even
referencing that 6.1.1 is the authoritative definition is not ideal
and perhaps should just be "... builds this QNAME by the algorithm in
Section 6.1.1."

Relatedly, though 4.2 spells it out, as I was reading this paragraph I
wanted to see an example right away.  Because you said "label", as an
old DNS geek I could infer you meant "prepend _er with a dot", but the
thought did first flash through my mind about whether the dot was
there.

Maybe I'm odd that way, but scrolling down and back to satisfy the
visualization is slightly more cognitive work.  "See the example in
Section 4.2" could be something like: "This results in a name like
_er.1.broken.test.7._er.a01.agent-domain.example., as constructed in
the example of Section 4.2."
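
For what it's worth, this is how I read the construction, as a
minimal sketch.  The function name and argument order are mine, and
the label order is inferred from the example name above rather than
copied from 6.1.1, so treat it as an assumption to be checked against
the authoritative algorithm.

```python
def build_report_qname(qname: str, qtype: int, info_code: int,
                       agent_domain: str) -> str:
    """Build a report query name in presentation format.

    Assumed label order, inferred from the example name:
      _er . QTYPE . original QNAME . EDE info-code . _er . agent domain
    """
    labels = ["_er", str(qtype)]
    labels += [label for label in qname.rstrip(".").split(".") if label]
    labels += [str(info_code), "_er"]
    labels += [label for label in agent_domain.rstrip(".").split(".") if label]
    return ".".join(labels) + "."


# QTYPE 1 (A) for broken.test. failing with extended DNS error 7.
report = build_report_qname("broken.test.", 1, 7, "a01.agent-domain.example.")
print(report)  # _er.1.broken.test.7._er.a01.agent-domain.example.
```

Every "_er" here is an ordinary label joined with dots, which is the
detail I briefly wondered about.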

"The report query will ultimately arrive at the monitoring agent" is
kind of odd to me, not because it isn't true but because it seems like
a basic property of the DNS that we've been relying on for 40 years.
The first three sentences could be shortened up a bit to something
like, "The monitoring agent's reply to the report query MUST be cached
by the reporting resolver."  MUST describes it as essential, and the
remaining sentences describe why. 

I'd like to see some recommendations for a suitable TTL on the
monitoring agent's reply.  Also, why no guidance on the TXT rdata?
Even something like "it could be as short as a null record to minimize
cache overhead, or could contain additional information that the
authority wishes to communicate to the resolver."

For "This QNAME indicates extended DNS error 7", append ", Signature
Expired, " as a useful appositive for the reader.
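
The reverse direction, as the monitoring agent might decode such a
name, can be sketched the same way.  The function name and the tiny
lookup tables are hypothetical (only the codes used in the example are
listed), and the sketch assumes the agent domain itself contains no
"_er" label, so the rightmost "_er" is taken as the separator.

```python
EDE_NAMES = {7: "Signature Expired"}  # only the code from the example
QTYPE_NAMES = {1: "A"}                # only the type from the example


def parse_report_qname(report_qname: str) -> dict:
    """Split a report query name back into its assumed components."""
    labels = report_qname.rstrip(".").split(".")
    if len(labels) < 4 or labels[0] != "_er":
        raise ValueError("not a report query name")
    # Rightmost "_er" label separates the report from the agent domain.
    sep = len(labels) - 1 - labels[::-1].index("_er")
    if sep < 3:
        raise ValueError("missing second _er label")
    qtype = int(labels[1])
    info_code = int(labels[sep - 1])
    return {
        "qtype": QTYPE_NAMES.get(qtype, str(qtype)),
        "qname": ".".join(labels[2:sep - 1]) + ".",
        "info_code": info_code,
        "error": EDE_NAMES.get(info_code, "unknown"),
        "agent_domain": ".".join(labels[sep + 1:]) + ".",
    }


parsed = parse_report_qname(
    "_er.1.broken.test.7._er.a01.agent-domain.example.")
print(parsed["qname"], parsed["qtype"], parsed["info_code"], parsed["error"])
# broken.test. A 7 Signature Expired
```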

The last paragraph of section 4.2 seems overlong. The first
parenthetical is an unnecessary restatement of Terminology. "The agent
can determine" sentence basically restates the last sentence of the
prior paragraph. Finally "The monitoring agent can contact the
operators" sentence, while it does explicitly recognize that the
operator can be a different entity, reads a little odd for what I
would expect to be the most common case of monitor and operator being
the same entity.  It also isn't clear that it adds anything that isn't
intrinsically obvious, that the report could then be the trigger for
fixing the problem.  It looks to me like the whole last paragraph could
be dropped with no important loss of meaning.

"The DNS class is not specified in the error report."  Huh, so, yeah,
that's interesting.  I honestly hadn't even thought about class until
this moment.  Maybe "SHOULD assume IN" is a worthwhile addition
somewhere, since that's nearly the entirety of DNS traffic, and the
only class for which DNSSEC is currently defined?  (I mean, if you
want to start signing CHAOS TXT and reporting on failures, we've got a
lot of other work to do.)

"The reporting resolver MUST NOT use DNS error reporting to report a
failure in resolving the report query." This feels ambiguous to me,
because even as an old DNSSEC geek I would, in the vernacular,
describe a failure to validate as a failure to resolve.  Short example
phrases of what sort of thing you don't want to see happen would be
good.

6.1.1, "When the specified agent domain is empty, or a null label
(despite being not allowed in this specification), the report query
will have "_er" as a top-level domain as a result and not the original
query."  Is this sentence useful?  I can see it as a bit of a
rationale for why to have a second _er, but the first sentence already
provides sufficient rationale, and an empty/root agent domain already
means the rest of this section is irrelevant.

6.3 I'd put a RECOMMENDED TTL range here, based on how often the agent
wants to keep hearing about failures for the same problem, and taking
into consideration just how many resolvers could be reporting on how
many q-tuples.  Even just an order of magnitude would help.  (Me?  In
the absence of any guidance I'd probably go with 15 minutes, but could
see anything from 5 to 60 being pretty reasonable.)
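
As a back-of-the-envelope illustration of why magnitude matters, here
is a rough upper bound on report traffic.  It assumes every resolver
re-sends one report per failing q-tuple each time the cached reply
expires; the function name and the resolver count are made up for the
example.

```python
def report_queries_per_hour(resolvers: int, failing_tuples: int,
                            ttl_seconds: int) -> float:
    """Upper bound: one report per resolver per tuple per TTL expiry."""
    return resolvers * failing_tuples * 3600 / ttl_seconds


# Hypothetical load: 10,000 resolvers all seeing one failing q-tuple.
for minutes in (5, 15, 60):
    rate = report_queries_per_hour(10_000, 1, minutes * 60)
    print(f"TTL {minutes:>2} min: up to {rate:,.0f} reports/hour")
# TTL  5 min: up to 120,000 reports/hour
# TTL 15 min: up to 40,000 reports/hour
# TTL 60 min: up to 10,000 reports/hour
```

Real traffic would be lower, since not every resolver queries the
failing name continuously, but the TTL is the one knob the agent
controls.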

s/Well known addresses/Well-known addresses/

Thanks for all your hard work on this draft!

--tale