
Last Call Review of draft-ietf-dnsop-caching-resolution-failures-06
review-ietf-dnsop-caching-resolution-failures-06-genart-lc-pardue-2023-08-11-00

Request Review of draft-ietf-dnsop-caching-resolution-failures
Requested revision: No specific revision (document currently at 08)
Type: Last Call Review
Team: General Area Review Team (Gen-ART) (genart)
Deadline: 2023-08-17
Requested: 2023-08-03
Authors: Duane Wessels, William Carroll, Matthew Thomas
I-D last updated: 2023-08-11
Completed reviews:
  Genart Last Call review of -06 by Lucas Pardue
  Dnsdir Last Call review of -06 by Peter van Dijk
  Artart Last Call review of -06 by Barry Leiba
  Dnsdir Telechat review of -07 by Peter van Dijk
  Intdir Telechat review of -07 by Carlos Pignataro
  Dnsdir Last Call review of -03 by Peter van Dijk
Assignment
Reviewer: Lucas Pardue
State: Completed
Request: Last Call review on draft-ietf-dnsop-caching-resolution-failures by General Area Review Team (Gen-ART), assigned
Posted at: https://mailarchive.ietf.org/arch/msg/gen-art/43z0mXI5dfGRYRTyD3lPD4HjiUE
Reviewed revision: 06 (document currently at 08)
Result: Ready w/issues
Completed: 2023-08-11
I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair.  Please treat these comments just
like any other last call comments.

For more information, please see the FAQ at

<https://wiki.ietf.org/en/group/gen/GenArtFAQ>.

Document: draft-ietf-dnsop-caching-resolution-failures-??
Reviewer: Lucas Pardue
Review Date: 2023-08-11
IETF LC End Date: 2023-08-17
IESG Telechat date: Not scheduled for a telechat

Summary: The document is well written, with clear motivation statements and
normative text that addresses the indicated problems.

Major issues: None

Minor issues:

* Section 3.1 describes retries and places the normative requirement "A
resolver MUST NOT retry a given query to a server address over a given
transport protocol more than ...". However, the definition of "transport
protocol" is not 100% clear to me, and the terms "transport" and "transport
layer protocol" seem to be used interchangeably throughout the document. Perhaps
this is clearer to those in the DNS area, but as a transport area person, DNS
over TCP and DNS over TLS both use the same transport protocol. Section 2.3
would seem to imply that DNS over TCP and DNS over TLS are treated as different.

I think it would help to better define exactly what "a given transport
protocol" in section 3.1 means. Perhaps that definition already exists
somewhere that can be cited and imported into the terminology section.
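To make the ambiguity concrete, here is a minimal Python sketch of a resolver-side retry counter keyed by (query, server address, transport). Everything in it is hypothetical: the `Transport` enum, `RetryKey`, `RetryTracker`, the `layer4_only` switch, and the `max_retries` value are illustrative names, and the draft's actual retry limit (elided in the quote above) is deliberately not restated. The point is only that the two readings of "transport protocol" yield different retry budgets.

```python
from dataclasses import dataclass, field
from enum import Enum


class Transport(Enum):
    # Hypothetical enumeration of DNS transports a resolver might use.
    UDP = "udp"
    TCP = "tcp"
    TLS = "tls"  # DNS over TLS, which itself runs over TCP


@dataclass(frozen=True)
class RetryKey:
    # One retry budget per (query, server address, transport) tuple.
    qname: str
    qtype: str
    server: str
    transport: Transport


def retry_key(qname: str, qtype: str, server: str,
              transport: Transport, layer4_only: bool) -> RetryKey:
    # layer4_only=True reads "transport protocol" as the layer-4 protocol,
    # so DNS over TLS shares a budget with plain DNS over TCP.
    # layer4_only=False treats them as distinct, as Section 2.3 seems to imply.
    if layer4_only and transport is Transport.TLS:
        transport = Transport.TCP
    return RetryKey(qname.lower(), qtype, server, transport)


@dataclass
class RetryTracker:
    max_retries: int  # hypothetical limit, not the draft's normative value
    counts: dict = field(default_factory=dict)

    def may_retry(self, key: RetryKey) -> bool:
        return self.counts.get(key, 0) < self.max_retries

    def record_retry(self, key: RetryKey) -> None:
        self.counts[key] = self.counts.get(key, 0) + 1
```

With `layer4_only=True`, a retry recorded over TCP exhausts the budget for a TLS retry of the same query to the same address; with `layer4_only=False`, the two are counted separately. A precise definition in the draft would settle which behavior is required.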

Nits/editorial comments:

* In Section 1, the references to "section 5" and "section 7" do not make it
clear whether these are internal or external references.

* I appreciated the text in sections 1.1 and 1.2, dealing with motivation and
related use cases respectively. However, as a generalist reviewer, the most
useful part of Section 1.1 was the first sentence. The remainder of the text in
1.1 feels like case studies, that while interesting manifestations, are not
pure motivation. As a purely editorial suggestion that you can take or leave,
consider changing the last paragraph of Section 1 to something like:

"Operators of DNS services have known for some time that recursive resolvers
become more aggressive when they experience resolution failures; see Appendix A
for a collection of anecdotes, experiments, and incidents that support this claim.
This document updates [RFC2308] to require negative caching of DNS resolution
failures, which can help to mitigate the operational problems failures might
generate. Examples of resolution failures are provided in Section 2. Related
work is described in Appendix B."

Then move the text from Sections 1.1 and 1.2 into Appendix A and Appendix B,
respectively.

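* TOC - the headings "Conditions That Lead To DNS Resolution Failures" and
"Requirements for Caching Resolution Failures" capitalize prepositions
differently ("To" vs. "for"). Presumably the same title-case convention is
intended, so making them consistent might help.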

* Section 3.2 - regarding the 1 second minimum requirement, the text that
follows says "Resolvers MAY cache different types of resolution failures for
different (i.e, longer) amounts of time." and then later "Consistent with
[RFC2308], resolution failures MUST NOT be cached for longer than 5 minutes.".
These statements are all logically consistent but could be made simpler with
some editorial work. For example, something like:

"Resolvers MUST cache resolution failures for at least 1 second. Resolvers MAY
cache failures for a longer time, up to a maximum of 5 minutes (per the
requirements of [RFC2308]). Resolvers MAY cache different types of failures
using different time periods within this range."
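The suggested wording reduces to a clamp: whatever duration a resolver prefers for a given failure type is bounded below by 1 second and above by 5 minutes. A minimal Python sketch follows; the failure-type names and durations in the hypothetical `PREFERRED_FAILURE_TTL` table are illustrative only and do not come from the draft.

```python
# Bounds taken from the suggested text for Section 3.2.
MIN_FAILURE_TTL = 1    # seconds: failures MUST be cached for at least 1 second
MAX_FAILURE_TTL = 300  # seconds: 5 minutes, consistent with RFC 2308

# Hypothetical per-type preferences; a resolver MAY use different
# durations for different failure types, as long as they stay in range.
PREFERRED_FAILURE_TTL = {
    "timeout": 5,
    "servfail": 30,
    "dnssec_bogus": 900,  # deliberately out of range to show clamping
}


def clamp_failure_ttl(preferred_seconds: float) -> float:
    """Bound a preferred cache duration to the [1 s, 300 s] range."""
    return max(MIN_FAILURE_TTL, min(preferred_seconds, MAX_FAILURE_TTL))


def failure_cache_ttl(failure_type: str) -> float:
    # Unknown failure types fall back to the 1-second minimum.
    preferred = PREFERRED_FAILURE_TTL.get(failure_type, MIN_FAILURE_TTL)
    return clamp_failure_ttl(preferred)
```

Keeping both bounds in a single function mirrors the single suggested sentence: the per-type table expresses the MAY, and the clamp enforces both the 1-second MUST and the RFC 2308 ceiling.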