DNS Extensions Working Group                               W. Wijngaards
Internet-Draft                                                NLnet Labs
Intended status: Informational                           August 25, 2008
Expires: February 26, 2009


                       Resolver side mitigations
          draft-wijngaards-dnsext-resolver-side-mitigation-00

Status of This Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on February 26, 2009.

Abstract

   Describes a set of mitigations that stop the known Kaminsky
   variations, for which only resolver side deployment is necessary.













Wijngaards              Expires February 26, 2009               [Page 1]


Internet-Draft            Resolver mitigations               August 2008


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3

   2.  Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . .  3

   3.  Mitigations  . . . . . . . . . . . . . . . . . . . . . . . . .  4
     3.1.  Add Entropy  . . . . . . . . . . . . . . . . . . . . . . .  4
     3.2.  Use Care with the Cache  . . . . . . . . . . . . . . . . .  5
     3.3.  Obtain Authoritative Data  . . . . . . . . . . . . . . . .  6
     3.4.  Detection  . . . . . . . . . . . . . . . . . . . . . . . .  7

   4.  Variants to Protect against  . . . . . . . . . . . . . . . . .  8

   5.  Security Considerations  . . . . . . . . . . . . . . . . . . . 10

   6.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 11

   7.  Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 11

   8.  Informative References . . . . . . . . . . . . . . . . . . . . 11






























Wijngaards              Expires February 26, 2009               [Page 2]


Internet-Draft            Resolver mitigations               August 2008


1.  Introduction

   [WW: These are the counter measures for the Kaminsky attack scenarios
   that I envision for the Unbound resolver (http://unbound.net).  These
   are counter measures that require resolver side deployment only.
   Depending on working group input this document could remain an
   Unbound specific information document or can be made more generic,
   and move towards a BCP.]

   This document describes the mitigations that a resolver can deploy on
   its own in the meantime, while a more comprehensive (read: DNSSEC)
   solution is being rolled out.  For counter measures that require
   changes to authoritative and recursive servers everywhere, DNSSEC
   provides the most protection, followed by Nonce-based approaches
   (e.g.  EDNS PING), followed by transport protocol games.  Because
   Unbound implements DNSSEC validation already, and DNSSEC provides the
   most protection (e.g. against new unknown variations and also against
   full man-in-the-middle attacks), this is a good long term choice.

   The solutions covered in this document hope to cover all of the
   variations in the recent Kaminsky-style attacks.  However, it seems
   likely that other variations besides the ones described in this
   document are going to be discovered.  For that reason a number of
   generic protections are included, chief amongst those is the use of
   extra entropy.

   Since this document focuses on Unbound it is worth noting these are
   not all implemented, some of these are under consideration.  Unbound
   should support the mitigations considered 'best' by the community.
   This means without weird, ill-considered, mitigations of its own.
   Hence this document.

   It is assumed the reader is aware, and implementing, the [forgery-
   resilience] recommendations.

   In Section 2 the criteria are listed.  In Section 3 the various
   measures that can be used to mitigate threats are described.  Section
   4 enumerates Kaminsky-style attack variations, and shows what
   measures provide protection against each one of them.  Section 5
   discusses consequences caused by the mitigations.

2.  Criteria

   The first and foremost criterium is that these are resolver side
   solutions, thus only the resolver needs to be redeployed, or the
   software updated, for this to work.  The reason behind this is that a
   short term deployment is possible.  The idea is to provide some
   (partial) protection on the short term.  On the long term it is



Wijngaards              Expires February 26, 2009               [Page 3]


Internet-Draft            Resolver mitigations               August 2008


   possible to redeploy both authority and recursors, and the solution
   space is greatly increased (e.g. options range from EDNS PING, using
   TCP or SCTP, to DNSSEC deployment).

   Many solutions in this document could also be used in stub resolvers.
   Stub resolvers are not mentioned specifically further on, the main
   focus is on the caching recursive server.

   The solutions have to follow the DNS protocol.

   The solutions have to be non disruptive, and non anti-social.
   Specifically, they must not put the costs of the solution with 3rd
   parties.  For example, large scale fallback to TCP both uses a
   limited resource (TCP connections to authority servers), and disrupts
   deployment behind many middle boxes.

   Solutions without an 'attack mode' are preferred.  An 'attack mode'
   is a different state of behaviour that the resolver enters into after
   something anomalous is detected.  It may be for only a subset of
   operations or only a limited time.  One reason to avoid such modal
   design is that paranoia dictates that maximal protection should
   always be used.  A second reason is that if a protection measure
   cannot be used always, it is likely to be disruptive (see above).
   Such an 'attack mode' complicates implementation, testing and
   especially security analysis.

3.  Mitigations

   Below, the resolver side mitigations are described.

3.1.  Add Entropy

   The mitigations in this section increase the transaction entropy
   above the 16 bits in the ID number.  This is pretty close to the
   [forgery-resilience] text, differences are in the rtt banding text
   and 0x20 consideration.

   o  port randomisation

      As many as possible, using only 1000 or 2000 ports (as some
      commercial DNS products do) is not enough.  A range of 59000 port
      numbers (15.8 bits) can be usefully achieved.  This causes
      operational problems (NAT boxes using predictable port numbers),
      portability problems (bugs, features not available), and volume
      problems (using port number uses limited resource).

   o  0x20.




Wijngaards              Expires February 26, 2009               [Page 4]


Internet-Draft            Resolver mitigations               August 2008


      Breaks queries to some authorities, but more than 99.9% works.  It
      is like a proposal that needs authority server deployment where
      the authority servers are already deployed to a large extent.
      [0x20 proposal draft].

   o  rtt banding

      RTT banding refers to the method of picking a random nameserver
      for the query out of the set of nameservers that are within a RTT
      band (say at most 200 msec slower) from the fastest nameserver.

      New attack opportunities can be created by sending a new fake
      question to be resolved by the resolver.  Therefore the actual
      size of the roundtrip time window is not as important as the
      additional entropy gained by selecting randomly from a set of
      servers.

   o  IPv4 - IPv6

      When both IPv4 and IPv6 are available, the protocol can be chosen
      randomly together with rtt banding to provide more entropy.

   o  source address randomisation

      If the resolver has multiple public IP addresses these can be used
      to randomise with.

   If all the above entropy settings are in use, it is estimated that
   Unbound can provide about 44 bits of entropy (16 ID, 15.8 port bits,
   about 8 0x20 bits, about 2 rtt banding + protocol bits and about 2
   source address bits).  Without user configuration or queries amenable
   to 0x20, 34 bits of entropy are likely, or even 18 if a NAT box kills
   the port randomisation.  Entropy thus provides only limited
   protection.

3.2.  Use Care with the Cache

   o  rfc2181 adherence

      This means that RRsets are ranked in trustworthiness depending on
      whether they come from the answer section, or from another part of
      the message.  The authoritative answers are preferred.  [RFC2181]

      In addition, do not give data obtained from authority or
      additional sections in answer sections to clients.

   o  CNAME chain.




Wijngaards              Expires February 26, 2009               [Page 5]


Internet-Draft            Resolver mitigations               August 2008


      Only use first entry in answer section.  Perform new lookups for
      remainder.

   o  DNAME chain.

      Only use the first entry DNAME and its synthesized CNAME from the
      answer section.  Perform new lookups for remainder.

   o  no DNAME from cache

      Do not pick a DNAME RR out of the cache for a query for which that
      DNAME RR was not returned.  Thus, a DNAME is only used for query
      names for which answers have been received from the authority
      server.

      When the DNAME is signed with DNSSEC, it is allowed to synthesize
      new CNAMEs from it to answer new queries with it.  This is because
      the zone owner whose zone is redirected is signing away his own
      zone.

3.3.  Obtain Authoritative Data

   o  Authority query for NS after referral

      The idea is to obtain authoritative data for the NS RRset instead
      of using data tacked along on another message.  Care must be taken
      to avoid DoSing parent nameservers, and not break resolution in
      common cases where the NS RRsets in parent and child differ.

      On a referral, the data from the referral may be used to continue
      answering the current query, but it is not stored in the cache.
      If the question equals the referred zone name and has qtype NS,
      then the NS rrset from the referral does get stored in the cache.

      If the question is not that already, a new lookup is performed for
      the referred zone name with qtype NS.  The results from that
      lookup are cached normally.  The lookup has to start at a parent
      of the referred zone, so that a new referral is obtained.

      The upshot is that RFC2181 adherence pins the NS rrset data in the
      cache because it is seen in the answer section, and tacked on data
      from other messages is ignored until the TTL expires.  It should
      be noted that most infrastructure TTLs for NS records are very
      large.

      It does not break existing disjoint RRsets, or servers that do not
      answer for qtype NS at all, or servers that are offline, because
      the referral is cached when making the qtype NS query.  This is



Wijngaards              Expires February 26, 2009               [Page 6]


Internet-Draft            Resolver mitigations               August 2008


      why the qtype NS query has to be made in such a way that it
      elicits a fresh referral from the parent server.  This gives a
      once per TTL opportunity for spoofing the referral.

      The NS RRset answered from the child side of the zone cut
      overrides the NS RRset picked up from the referral.  This causes
      the same data to be used as today, where the authority section NS
      set sent along by the child server overrides the NS set seen from
      the referral.

      Additional queries are sent for this solution.  This increases
      resolver and authority server load and bandwith usage.

   o  Authority queries for nameserver addresses, A and AAAA.

      Same idea, like NS query above.  You ask for A or AAAA records
      directly at the authoritative server.  It is not necessary to
      elicit the referral again, the query can be directed at the best
      server.

      Additional queries are sent for this solution.  This increases
      resolver and authority server load and bandwith usage.

   A bonus when using the above methods to obtain authoritative data is
   that when using DNSSEC, the data can be validated, and thus spoofed
   infrastructure data can be detected and handled appropriately.  This
   protects DNSSEC, where the referral contains unsigned NS, A and AAAA
   records from spoofed infrastructure data.  Of course, DNSSEC is
   designed to protect end-user data anyway, whether or not the referral
   data was poisoned.  It simply adds the opportunity to add another
   layer of defense.

3.4.  Detection

   o  trouble counter

      This is a simple detection method.  It counts all packets that
      were not asked for.  The only thing noted about the packet is that
      it is a query reply (QR bit) and was not asked for.

      This may show false positives due to UDP packet duplicates,
      delayed responses (delayed for longer than the implementation
      cares to keep track of what it asks for).  The idea is that false
      positives are probably a low amount.  Conversely, some unasked for
      packets may not be noticed because the implementation may not be
      listening to particular ports, or whatever implementation choices.

      When a particular threshold is met, the cache is wiped clean.



Wijngaards              Expires February 26, 2009               [Page 7]


Internet-Draft            Resolver mitigations               August 2008


      The threshold is set so that denial of service does not become all
      that much easier, and that false positives do not (often) result
      in cache wipes.  A threshold in the range of 10 million is
      proposed.  This many packets itself is already a sizable denial of
      service attack, and also, the amount of data sent gets close to
      the cache size of the resolver to keep amplification towards the
      authority servers low.

      Since this mitigation is meant to protect against hiherto unknown
      variations, it does not help to examine the packets any further
      than the QR bit (and the fact that they were not used for regular
      processing).

      The result of this is that the probability that there is a
      poisoned item present in the cache is capped at some maximum.  The
      exact value depends on the entropy per message and the threshold.

4.  Variants to Protect against

   In the descriptions below a short title is given to quickly summarize
   the exploit.  The query 'q:' is what the attacker sends as fake
   question to the resolver to answer.  The answer, authority 'auth:'
   and additional 'add:' sections list the content that the spoofer
   provides.  The mitigation strategy, and sometimes discussion, is
   provided in the 'protected:' line.

   The real target is example.com or www.example.com or ns1.example.com,
   which is the real nameserver for example.com here.  The domain
   evil.example.net is under control of the attacker and
   192.0.2.66(evil) is an IP address under control of the attacker.  The
   label 'bad123' is used in place of a label that the attacker varies
   every attempt to obtain new spoofing windows.


   Glue with new DNS server
   q: bad123.example.com.
   answer: bad123.example.com. A whatever
   auth: example.com. NS evil.example.com.
   add: evil.example.com. A 192.0.2.66(evil)
   protected: 2181 adherence plus NS record pinned by NS query.
   Also name error or no data answers could be used, instead of
   this answer section.

   Glue for DNS server
   q: bad123.example.com.
   answer: bad123.example.com. A whatever
   auth: example.com. NS ns1.example.com. (normal entry)
   add: ns1.example.com. A 192.0.2.66(evil)



Wijngaards              Expires February 26, 2009               [Page 8]


Internet-Draft            Resolver mitigations               August 2008


   protected: 2181 adherence plus NS record pinned by NS query,
   plus A record pinned by glue query.
   Also name error or no data answers could be used, instead of
   this answer section.

   Glue for Web server
   q: bad123.example.com.
   answer: bad123.example.com. A whatever
   auth: example.com. NS www.example.com.
   add: www.example.com. A 192.0.2.66(evil)
   protected: 2181 adherence plus NS record pinned by NS query.

   Glue smaller
   q: bad123.example.com.
   answer: bad123.example.com. A 192.0.2.66(evil)
   auth: example.com. NS bad123.example.com.
   protected: 2181 adherence plus NS record pinned by NS query.

   NS change
   answer: bad123.example.com. A whatever
   auth: example.com. NS evil.example.net.
   protected: 2181 adherence plus NS record pinned by NS query.

   NS server migration
   answer: bad123.example.com. A whatever
   auth: example.com. NS ns1.example.com. (normal entry)
   auth: example.com. NS ns2.example.com.evil.example.net.
         (evil, looks like typo in server migration)
   protected: 2181 adherence plus NS record pinned by NS query.

   CNAME
   q: bad123.example.com.
   answer: bad123.example.com. CNAME www.example.com.
   answer: www.example.com. A 192.0.2.66(evil)
   protected: CNAME chain cutoff.

   DNAME one message
   q: www.bad123.example.com.
   answer: bad123.example.com. DNAME example.com.
   answer: www.bad123.example.com. CNAME www.example.com.
   answer: www.example.com. A 192.0.2.66(evil)
   protected: DNAME chain cutoff.

   DNAME whole zone
   q: bad123.example.com.
   answer: example.com. DNAME evil.example.net.
   answer: bad123.example.com. CNAME bad123.evil.example.net.
   answer: bad123.evil.example.net. A whatever



Wijngaards              Expires February 26, 2009               [Page 9]


Internet-Draft            Resolver mitigations               August 2008


   protected: no DNAME from cache.

   New Delegation - rigged
   q: bad123.www.example.com.
   answer: (empty)
   auth: www.example.com. NS www.example.com.
   add: www.example.com. A 192.0.2.66(evil)
   protected: the NS queries that ask referral confirmation
   together with glue queries.

   New Delegation - looks normal
   q: bad123.www.example.com.
   answer: (empty)
   auth: www.example.com. NS ns1.evil.example.net.
   auth: www.example.com. NS ns2.evil.example.net.
   protected: the NS queries that ask referral confirmation
   together with glue queries.

   New Delegation - for glue
   q: bad123.example.com.
   answer: (empty)
   auth: bad123.example.com. NS ns1.example.com.
   additional:  ns1.example.com. A 192.0.2.66(evil)
   protected: rfc2181 adherence.

   Another hiherto unknown variation
   These are a lot of variations and it is very likely that other
   people can come up with better, different ideas.
   protected: by entropy measures, by the count-and-wipe measure.
   Long term solutions (PING, TCP, DNSSEC) also aim to protect
   against these much more thoroughly.

5.  Security Considerations

   All of the mitigations aim to provide more security.  But, several of
   these mitigations have adverse effects on performance and bandwith.

   The CNAME, DNAME, NS and nameserver address mitigations all require
   that additional lookups be performed.  The CNAME and DNAME target
   lookups cause the answer to the client to be delayed.  The NS set and
   nameserver address lookups cause a higher load on both authority and
   resolver servers.

   The detection mechanism is susceptible to denial of service attacks.
   A small, calculated, amount of additional DoS leverage is provided.
   This changes some spoof attacks into a denial of service.

   The NS set and nameserver address lookups cause the NS, A and AAAA



Wijngaards              Expires February 26, 2009              [Page 10]


Internet-Draft            Resolver mitigations               August 2008


   rrsets to be pinned in the cache until the TTL expires.  This
   provides cache overwriting protection, but at the cost of not picking
   up updates to these RRsets in the course of normal resolution.
   Changes to these RRsets are then no longer seen on the next query,
   but only after the TTL times out.  This adversely affects the
   coherency of the DNS server infrastructure, as it becomes more likely
   that resolvers operate using out of date nameserver data.

6.  IANA Considerations

   None.

7.  Acknowledgments

   Thanks to Nicholas Weaver (ICSI Berkeley) and Olaf Kolkman (NLnet
   Labs).

8.  Informative References

   [RFC2181]  Elz, R. and R. Bush, "Clarifications to the DNS
              Specification", RFC 2181, July 1997.

Author's Address

   Wouter Wijngaards
   NLnet Labs
   Kruislaan 419
   Amsterdam  1098 VA
   The Netherlands

   Phone: +31-20-888-4551
   EMail: wouter@nlnetlabs.nl



















Wijngaards              Expires February 26, 2009              [Page 11]


Internet-Draft            Resolver mitigations               August 2008


Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.












Wijngaards              Expires February 26, 2009              [Page 12]