Internet-Draft Considerations-Large-Auth-DNS-Ops November 2021
Moura, et al. Expires 12 May 2022
Workgroup:
DNSOP Working Group
Internet-Draft:
draft-moura-dnsop-negative-cache-loop-00
Published:
Intended Status:
Standards Track
Expires:
Authors:
G.C.M. Moura
SIDN Labs/TU Delft
W. Hardaker
USC/Information Sciences Institute
J. Heidemann
USC/Information Sciences Institute
S. Castro
IE Domain Registry

Negative Caching of Looping NS records

Abstract

This document updates guidance on detecting DNS loops in recursive resolver algorithms. It requires recursive resolvers to detect loops and to negatively cache them.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 12 May 2022.

1. Introduction

Loops are a well-known configuration error in DNS zones. CNAME loops were first documented in [RFC1034], and can occur when two domains point to each other. For example:

.org zone file:

    example.org CNAME example.com

.com zone file:

    example.com CNAME example.org

[RFC1536] states that "a set of servers might form a loop wherein A refers to B and B refers to A". Although [RFC1536] does not explicitly define other types of loops, loops can also be formed with NS records, as shown in the example below:

.org zone file:

    example.org NS  ns1.example.com

    example.org NS  ns2.example.com

.com zone file:

    example.com NS  ns1.example.org

    example.com NS  ns2.example.org

In both the CNAME and NS loop cases, recursive resolvers will not be able to resolve these domain names, or any child domains underneath these zones.
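The NS example above can be read as a dependency graph: resolving a zone requires resolving the names of its name servers, which may in turn live in another zone. The following is a minimal sketch of detecting such a cycle with a depth-first search. The table `NS_RECORDS` and the functions `enclosing_zone` and `find_ns_loop` are hypothetical names introduced for illustration only. The sketch assumes that no glue (address) records are available for out-of-zone name servers and that in-zone name servers are reachable through glue; it is not taken from any resolver implementation.

```python
from typing import Dict, List, Optional

# Hypothetical delegation data mirroring the example zone files above:
# each zone maps to the host names listed in its NS records.
NS_RECORDS: Dict[str, List[str]] = {
    "example.org": ["ns1.example.com", "ns2.example.com"],
    "example.com": ["ns1.example.org", "ns2.example.org"],
}


def enclosing_zone(name: str, zones: Dict[str, List[str]]) -> Optional[str]:
    """Return the longest known zone that is a suffix of `name`, if any."""
    labels = name.rstrip(".").lower().split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zones:
            return candidate
    return None


def find_ns_loop(zone: str, zones: Dict[str, List[str]]) -> Optional[List[str]]:
    """Depth-first search over 'zone depends on zone' edges.

    Returns the cycle as a list of zones (first zone repeated at the end),
    or None when no cycle is reachable from `zone`.
    """
    path: List[str] = []   # zones on the current DFS path, in order
    on_path = set()        # same zones, for O(1) membership tests
    done = set()           # zones fully explored without finding a cycle

    def visit(z: str) -> Optional[List[str]]:
        if z in on_path:
            return path[path.index(z):] + [z]
        if z in done:
            return None
        path.append(z)
        on_path.add(z)
        for ns_name in zones.get(z, []):
            dep = enclosing_zone(ns_name, zones)
            # Assumption: in-zone name servers are reachable via glue,
            # so a zone never counts as depending on itself here.
            if dep is None or dep == z:
                continue
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(z)
        done.add(z)
        return None

    return visit(zone)


if __name__ == "__main__":
    print(find_ns_loop("example.org", NS_RECORDS))
```

For the example zones, the search starting at example.org reports the cycle example.org, example.com, example.org. A CNAME loop can be checked the same way by following CNAME targets instead of NS targets.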

1.1. Requirements notation

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

3. Current Problem

Recent research [Moura21b] has shown how NS configuration loops can lead to significant increases in traffic: New Zealand's .nz country-code top-level domain (ccTLD) experienced a 50% traffic surge when two domains were misconfigured with NS loops. Another, anonymous European ccTLD saw its traffic grow 10-fold when two subdomains were also misconfigured with NS loops. [Moura21b] also reproduced the experiments under multiple controlled scenarios.

If existing RFCs already provide solutions for looping misconfigurations (Section 2), why does recent research [Moura21b] still show that these loops exist in the wild and lead to such traffic surges?

3.1. Root Causes of Traffic Surge

[Moura21b] documents two main sources of amplification in the presence of NS loops:

  • Looping recursive resolvers: these are resolvers that send non-stop queries to authoritative servers after receiving a single client query (Figure 1) targeting a domain with an NS loop. Such recursive resolvers do not conform to the guidance in RFC1034 and RFC1035, both of which set limits on the number of queries a resolver should send when resolving a name.
  • Looping clients, stub-resolvers, and forwarders: another situation occurs when parts of the DNS infrastructure, behind a recursive resolver, send non-stop queries in the presence of NS loops. These queries ultimately reach their upstream recursive resolvers, which then send queries to authoritative servers (and which themselves may further amplify the query stream).

To illustrate this, consider Figure 1. The current RFCs provide solutions that prevent recursive resolvers from looping. Assume a client sends a query to its stub resolver, which forwards it to one of its locally configured recursive resolvers (Re1 and Re2). Assuming Re2 receives the query, it then carries out the recursive resolution. The current solutions limit the number of queries that Re2 will send to authoritative servers (AT2) when resolving the domain, so the recursive resolver itself does not loop. The recursive resolver should then answer the client with a SERVFAIL response code.

However, this does not prevent clients, stubs, or DNS forwarders (such as Re1, which forwards to Re3) from repeatedly asking the same query. If, for example, Re2 sends up to 20 queries when resolving a domain name, every new incoming client query can trigger up to 20 new queries. This was exactly the problem the researchers found in Google Public DNS' implementation.

        +-----+  +-----+  +-----+  +-----+
        | AT1 |  | AT2 |  | AT3 |  | AT4 |
        +-----+  +-----+  +-----+  +-----+
           ^        ^        ^        ^
           |        |        |        |
           |     +-----+     |        |
           +-----| Re3 |-----+        |
                 +-----+              |
                    ^                 |
                    |                 |
                 +-----+  +-----+     |
                 | Re1 |  | Re2 |-----+
                 +-----+  +-----+
                    ^        ^
                    |        |
                    | +------+
                    +-| stub |
                      +------+
                         ^
                         |
                     +--------+
                     | client |
                     +--------+
Figure 1: Relationship between clients, stub, recursive resolvers (Re) and authoritative name servers (ATn)
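The amplification described above is simple multiplication, and a short worked example may make the scale clearer. The sketch below is illustrative only: the function name `authoritative_queries` and the figure of 1000 repeated client queries are assumptions, while the limit of 20 queries per resolution is the example value used in the text.

```python
def authoritative_queries(client_queries: int,
                          per_resolution_limit: int,
                          negative_cache: bool) -> int:
    """Upper bound on queries a resolver sends to authoritative servers
    for one looping name, within a single negative-cache lifetime."""
    if client_queries == 0:
        return 0
    if negative_cache:
        # Only the first client query triggers a resolution attempt;
        # later ones are answered from the negative cache.
        return per_resolution_limit
    # Without a negative cache, every client query restarts resolution.
    return client_queries * per_resolution_limit


if __name__ == "__main__":
    # Hypothetical load: 1000 repeated client queries for the looping name.
    print(authoritative_queries(1000, 20, negative_cache=False))
    print(authoritative_queries(1000, 20, negative_cache=True))
```

Under these assumptions, the per-resolution limit bounds the work done for one client query but not the total: the upstream load grows linearly with the number of repeated client queries unless the failure is remembered.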

4. New requirement

In addition to following the recommendations of RFC1034, RFC1035, and RFC2181 for handling loops, recursive resolvers MUST detect loops and MUST cache them (negative caching) [RFC2308]. Recursive resolvers need to refrain from forwarding queries from clients, stubs, and forwarders for misconfigured domain names when a negative answer can be served from their cache.

How long these loops are cached is an implementation choice; however, recursive resolvers MUST answer from their cache for at least 15 minutes, given that most looping NS/CNAME record situations will require human intervention.
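One possible shape for such a negative cache is sketched below. It is a minimal illustration under stated assumptions, not a description of any deployed resolver: the class `LoopNegativeCache`, its methods, and the constant `MIN_LOOP_TTL` are hypothetical names. The sketch enforces the 15-minute floor and, since names underneath a looping zone cannot be resolved either, treats a query for any child name as a cache hit.

```python
import time
from typing import Callable, Dict

MIN_LOOP_TTL = 15 * 60  # seconds; the 15-minute minimum required above


def _normalize(name: str) -> str:
    """Lower-case a domain name and drop any trailing dot."""
    return name.rstrip(".").lower()


class LoopNegativeCache:
    """Remembers zones found to have looping NS/CNAME records."""

    def __init__(self, ttl: float = MIN_LOOP_TTL,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # A shorter configured TTL is raised to the required minimum.
        self._ttl = max(ttl, MIN_LOOP_TTL)
        self._clock = clock
        self._expires: Dict[str, float] = {}

    def record_loop(self, zone: str) -> None:
        """Mark `zone` as looping from now until the TTL elapses."""
        self._expires[_normalize(zone)] = self._clock() + self._ttl

    def is_looping(self, qname: str) -> bool:
        """True if `qname` or any ancestor has an unexpired loop entry."""
        labels = _normalize(qname).split(".")
        now = self._clock()
        for i in range(len(labels)):
            zone = ".".join(labels[i:])
            expiry = self._expires.get(zone)
            if expiry is None:
                continue
            if now < expiry:
                return True
            del self._expires[zone]  # entry expired; allow a fresh attempt
        return False


if __name__ == "__main__":
    cache = LoopNegativeCache()
    cache.record_loop("example.org")
    print(cache.is_looping("www.example.org"))
```

In this sketch, a resolver would consult `is_looping` before starting resolution and answer SERVFAIL directly on a hit, and would call `record_loop` when its loop detection fires. Using a monotonic clock avoids entries expiring early or late when the wall clock is adjusted.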

7. Privacy Considerations

This document does not add any practical new privacy issues. A possible benefit is that longer TTLs require less traffic to be sent, which may help preserve a user's privacy by reducing the number of requests transmitted in both the client-to-resolver and resolver-to-authoritative cases.

8. IANA considerations

This document has no IANA actions.

9. References

9.1. Normative References

[RFC1034]
Mockapetris, P., "Domain names - concepts and facilities", STD 13, RFC 1034, DOI 10.17487/RFC1034, <https://www.rfc-editor.org/info/rfc1034>.
[RFC1035]
Mockapetris, P., "Domain names - implementation and specification", STD 13, RFC 1035, DOI 10.17487/RFC1035, <https://www.rfc-editor.org/info/rfc1035>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC2308]
Andrews, M., "Negative Caching of DNS Queries (DNS NCACHE)", RFC 2308, DOI 10.17487/RFC2308, <https://www.rfc-editor.org/info/rfc2308>.

9.2. Informative References

[Moura21b]
Moura, G.C.M., Castro, S., Heidemann, J., and W. Hardaker, "TsuNAME - exploiting misconfiguration and vulnerability to DDoS DNS", ACM 2021 Internet Measurement Conference, DOI 10.1145/3487552.3487824, <https://www.isi.edu/%7ejohnh/PAPERS/Moura21b.pdf>.
[RFC1536]
Kumar, A., Postel, J., Neuman, C., Danzig, P., and S. Miller, "Common DNS Implementation Errors and Suggested Fixes", RFC 1536, DOI 10.17487/RFC1536, <https://www.rfc-editor.org/info/rfc1536>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.

Appendix B. Current implementations

The requirements in this document have been implemented and deployed by:

  • Google Public DNS
  • Cisco OpenDNS

Authors' Addresses

Giovane C. M. Moura
SIDN Labs/TU Delft
Meander 501
6825 MD Arnhem
Netherlands
Wes Hardaker
USC/Information Sciences Institute
PO Box 382
Davis, 95617-0382
United States of America
John Heidemann
USC/Information Sciences Institute
4676 Admiralty Way
Marina Del Rey, 90292-6695
United States of America
Sebastian Castro
IE Domain Registry
2 Harbour Square, Dun Laoghaire
Dublin
A96 D6R0
Ireland