A privacy analysis on DoH deployment
draft-mglt-abcd-doh-privacy-analysis-00

Document Type Active Internet-Draft (individual)
Last updated 2019-11-04
Stream (None)
Intended RFC status (None)
Formats plain text xml pdf htmlized bibtex
Stream Stream state (No stream defined)
Consensus Boilerplate Unknown
RFC Editor Note (None)
IESG IESG state I-D Exists
Telechat date
Responsible AD (None)
Send notices to (None)
dnsop                                                         D. Migault
Internet-Draft                                                  Ericsson
Intended status: Informational                         November 04, 2019
Expires: May 7, 2020

                  A privacy analysis on DoH deployment
                draft-mglt-abcd-doh-privacy-analysis-00

Abstract

   This document provides an analysis on DoH impact on privacy

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 7, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Migault                    Expires May 7, 2020                  [Page 1]
Internet-Draft    A privacy analysis on DoH deployment     November 2019

Table of Contents

   1.  Requirements Notation . . . . . . . . . . . . . . . . . . . .   2
   2.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   3.  DNS traffic and privacy . . . . . . . . . . . . . . . . . . .   3
   4.  Privacy impact of DoH . . . . . . . . . . . . . . . . . . . .   4
     4.1.  DNS systems polices:  lost of control versus independence   6
   5.  Privacy impact related to the choice of the  DNS resolver . .   7
   6.  Privacy impact of concentration . . . . . . . . . . . . . . .   8
     6.1.  Acknowledgment  . . . . . . . . . . . . . . . . . . . . .  10
   7.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  10
     7.1.  Normative References  . . . . . . . . . . . . . . . . . .  10
     7.2.  Informative References  . . . . . . . . . . . . . . . . .  11
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  11

1.  Requirements Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described BCP 14
   [RFC2119] [RFC8174] when, and only when, they appear in all capitals,
   as shown here.

2.  Introduction

   DNS Queries over HTTPS (DoH) [RFC8484] differs from the traditional
   DNS [RFC1035] in that DNS exchanges between the DNS client and the
   resolver are now encrypted and that DNS traffic is not signaled as
   DNS traffic (with port 53) but instead uses (port 443).

   Such approach could enhance end user's privacy by preventing any on-
   path party to infer any DNS related information from the observed
   traffic.  However, such enhancement may also have counter effects
   such as the loose of control of the DNS traffic by the end user
   itself.

   This draft aims at providing an analysis on the impact of the
   deployment of DoH on the current internet.

   Section Section 3 details privacy sensitive information carried by
   the DNS traffic and evaluate how specific this information is
   specific to DNS or could be inferred from other traffic such as the
   web traffic depending on Internet concentration.

   Section Section 4 exposes the privacy implication of possible usage
   of DoH and more precisely the ability to circumvent or enforce the
   end user policies.

Migault                    Expires May 7, 2020                  [Page 2]
Internet-Draft    A privacy analysis on DoH deployment     November 2019

   While encrypting the DNS traffic enables the section of a DNS
   resolver, section Section 5 exposes the privacy implications
   associated to the selection of a resolver and show that choosing a
   resolver outside the boundaries of an ISP provides in fact limited
   protection toward that ISP.

   Finally, section Section 6 shows that despite the advantages that
   concentration could provide by obfuscating the IP address, the
   overall picture of concentration shows that it represents a threat to
   the end user's privacy.

3.  DNS traffic and privacy

   DNS data are public data available to everyone.  As a result, the
   value associated to a DNS exchange are mostly carried by the DNS
   request that answers to "What this specific end users is interested
   in?" or "Which end users contact this site?" rather the DNS
   information provided by the response.  Such information is carried by
   associating the destination IP address (of the IP header) as well as
   the DNS query field.  There are good and bad reasons for monitoring
   the sites as well as end user that connects to them.  Typically, a
   network administrator may prevent the end user to connect to
   malicious web sites as well as monitor the sites the end user is
   connecting to.  However, it is out of the control of any protocol to
   impact the usage of this information.

   In most cases, the DNS exchanges are followed by a web connection.
   If the web session were not encrypted, observation of the web traffic
   would provide the same information as those carried in the DNS
   traffic.  This information would be richer and more accurate, as web
   traffic really reflects the web sites the end user is accessing.
   However, the amount of web traffic is huge compared to the DNS
   traffic and the DNS traffic was clearly distinct from the HTTP
   traffic with different port from HTTP(S) with a distinct termination
   point (DNS resolver).

   With an increasing number of encrypted web traffic, analysis of the
   HTTP traffic is not anymore possible as it is being protected by TLS.
   However, HTTPS traffic still reveals the destination IP address and
   the domain name within a TLS field designated as SNI.  As mentioned
   earlier, analysis of the HTTPS traffic, due to the volumes invoked
   remains a challenge in itself.  However, the encryption of the SNI as
   well as the fact that one IP address provided by cloud provider can
   be shared by multiple web sites clearly limit the meaning of the
   information provided by a supposed analysis of the HTTPS traffic.  As
   a result, in addition to be more convenient, the information
   associated revealed by the DNS traffic may not be inferred from other
   traffic.

Migault                    Expires May 7, 2020                  [Page 3]
Internet-Draft    A privacy analysis on DoH deployment     November 2019

   As a result, the information carried by the DNS traffic has the
   following characteristics:

   o  The DNS traffic is a good representation of the web traffic of one
      end user.

   o  When not carried over HTTP, the DNS traffic is by construction
      logically separate from the web traffic.

   o  The DNS traffic is terminated in one point, while web traffic is
      generally terminated at multiple destinations

   The privacy sensitive information carried by the DNS traffic are the
   IP addresses that "identify" the end user and the content of the DNS
   query, that reflects the activity of the end user.  This information
   is limited to the administrative domain the DNS traffic is steered to
   when the DNS traffic is not encrypted.  When the DNS traffic is
   encrypted, this information is limited to the two end points, that is
   the end user and the DNS resolver.

   The same activity can be inferred from the encrypted web traffic
   unless ESNI together with a high concentration of web sites behind a
   limited number of IP addresses.  In that sense web site concentration
   and ESNI adds boundaries to the information associated to the DNS
   traffic, which could enhance the privacy against on-path monitoring.
   However, concentration of the web traffic transfers the information
   from the internet providers to large cloud providers.
   Section Section 6 details furthermore how concentration represents a
   direct threat to privacy.

   As a result privacy sensitive information carried by the DNS traffic
   is shared between the DNS client, the DNS resolver via DNS traffic.
   Similar information is provided by the web traffic that is shared
   between the HTTP client as well as the internet service provider and
   major cloud providers.  The balance between these two depends on the
   level of concentration.

4.  Privacy impact of DoH

   The use of DoH to perform DNS exchanges has the following impacts on
   the DNS traffic:

   o  DNS traffic is encrypted

   o  DNS traffic is no different from the encrypted web traffic

   As mentioned in section Section 3, since DNS traffic is encrypted,
   the privacy sensitive information of the DNS is exchanged between the

Migault                    Expires May 7, 2020                  [Page 4]
Internet-Draft    A privacy analysis on DoH deployment     November 2019

   DNS client and the DNS resolver.  As per the Internet threat model of
   [RFC3552], it is expected that "the end-systems engaging in a
   protocol exchange have not themselves been compromised.  Protecting
   against an attack when one of the end-systems has been compromised is
   extraordinarily difficult.".  The purpose of the protection is to
   protect against an attacker that may have a complete control of the
   network.  With that threat model in mind, encryption protects the DNS
   exchanged via DNS exchange between the DNS client and the DNS
   resolver and as such improves the end user's privacy.  In particular
   it protects against pervasive monitoring attacks [RFC7258].

   However, as mentioned in [RFC6973] privacy analysis needs to question
   the assumption of [RFC3552] on end-systems "since systems are often
   compromised for the purpose of obtaining personal data".  In
   addition, privacy also includes the ability of the end user to
   control and protect its information.

   The ability to enforce policies for the DNS traffic has been
   performed until today by having the DNS client centralized in the
   system of the end user.  The configuration at the operating system
   level ensures that all applications were aligned with the end user
   policy.  A typical policy typically includes the domains that needs
   to be resolved, the interface to be used, the DNS resolver to
   contact...

   DoH changes this paradigm in the way that an application can
   circumvent the policy set by the end user, without the end user being
   aware of it.  Firstly, the encryption is performed by the application
   and as such does not provide any visibility to the operating system.
   Second, the use of HTTPS makes DNS traffic indistinguishable from the
   web traffic.  To that extend, DoT would signal the system that some
   encrypted DNS traffic is being handled by the application.  The end
   user may accept or refuse such traffic depending on its policy.  DoH
   does not provides such capabilities.

   Another way to see this issue is to consider that the communication
   between the DNS client and the DNS resolver is a communication that
   is secured between the two application end-point.  The end resolver
   policy enforcement is performed on-path inside the end user system,
   but encryption prevents it to be enforced.

   In a nutshell, DoH encrypts and makes DNS traffic undetectable.  This
   provides the ability for an application to circumvent the policies
   defined by the system and can be seen as a loose of control.  The
   alignment with the policies of the system is enforced by explicit
   policies from the application and trust the application enforces the
   claimed policies.

Migault                    Expires May 7, 2020                  [Page 5]
Internet-Draft    A privacy analysis on DoH deployment     November 2019

   The impact on privacy needs to balance the DNS policies provided by
   the system versus those provided by the application and more
   explicitly which of these policies better protects the end user.

4.1.  DNS systems polices: lost of control versus independence

   The DNS system policies may or may not reflects the end user's
   preferences, however, these are part of the configuration parameters
   of the system and the end user can at least be aware of the policies
   of its system.

   There are cases were the DNS policies in the system expresses the end
   user's policies.  This includes typically the choice of a specific
   DNS resolver, the subscription to parental control.  For such end
   user, the ability that an application circumvents the policies of the
   system represents a threat to their ability to control their DNS
   traffic.

   Similarly, there are cases were the DNS policies are not explicitly
   specified by the end user, but there is a agreement of the end user
   to have these policies.  This typically includes corporate users that
   have agreed to comply with the corporate policies with potentially
   some web sites cannot be accessed.  For these end users, the ability
   that an application can circumvent the DNS policies of the company
   exposes the end user to risks he may not want to take.

   For the two latest category of users the ability for each application
   to have specific DNS policies present the following drawbacks:

   1) A per-application control results in defining at multiple places
   the DNS policies.  This at least can create some confusions to the
   end user, makes configuration prone to errors and eventually
   debugging harder.

   2) While some applications may have clear and explicit DNS policies,
   that the end user could in principle check or configure against the
   policies he is enforcing, these policies are subject to change over
   time and without notice, typically during updates.  While constantly
   checking the policies is not something we can rely on, the end user
   or company may delay the applications to be updated which adds an
   additional risk to the end user privacy.

   There are cases where the DNS policies are imposed to the end user
   against its will and without agreement from his side.  Motivations
   for such policies could be to enforce surveillance of the end user.
   In such situation the ability to circumvent the DNS policies by an
   application improves the end user's privacy.  It is also safer that

Migault                    Expires May 7, 2020                  [Page 6]
Internet-Draft    A privacy analysis on DoH deployment     November 2019

   DNS policies are enforced by the application as the application will
   be in these situation the trusted system of the end user.

   As a result, that an application can enforce there own policies
   improves or reduce the control of the DNS traffic of the end user
   depends on what the trust system of the end user is.  If the trust
   system of the end user is the application, this ability clearly
   improves, otherwise, this may represent a threat.  In the later case,
   applications should follow the configuration of the system.

5.  Privacy impact related to the choice of the DNS resolver

   As mentioned in section Section 3 DoH provides end-to-end encryption
   and as such provides the ability for the end user to chose a specific
   DNS resolver and share the DNS data only with that resolver.  One
   motivation to chose a specific DNS resolver is to move to a DNS
   resolver that considers the end user's privacy with more attention.
   This includes, among other things, not profiling the end users, not
   selling user's information, or in some places not tracking specific
   end users.  In that sense, the ability for a end user to chose a DNS
   resolver represents major improvement.  When the DNS resolvers are
   not on-path, and the end user changes from one DNS resolver to the
   other, encryption does not provide additional protection.  In fact,
   encryption is clearly aiming at protecting against an attacker that
   would be on-path.

   On the other hand, as mentioned in section Section 3, web traffic,
   unless using more advance IP routing such as with TOR, also leaks
   similar information.  Though gathering the information from the web
   traffic instead of the DNS traffic raises the bar, it represents a
   major improvement only if the bar remains sufficiently high.
   Unfortunately, the bar is nonexistent for user tracking, and remains
   weak to generalize the tracking to all users.  As such the decision
   is more on the network side to decide the value associated to the DNS
   traffic or legal requirements to put the necessary infrastructure in
   place for it.  It does not seem to be entirely in the end user's
   hand.  As result, the encryption provides limited protection against
   on-path parties - such as an ISP.  Unless combined wit TOR, moving to
   a DNS resolver that is not managed by the ISP does not hide much
   information to the ISP.

   The remaining of this section considers that DNS information is not
   inferred from the web traffic and analyses how moving from a DNS
   resolver hosted in the ISP network to an DNS resolver outside the ISP
   network impacts the end user's privacy.  The perspective is
   considering the information shared with the DNS resolver, and
   motivations such as bad privacy protection, better latency, DNSSEC
   resolution, parental filtering are out of scope of this analysis.

Migault                    Expires May 7, 2020                  [Page 7]
Internet-Draft    A privacy analysis on DoH deployment     November 2019

   Note that even if one is connected to an ISP and the DNS resolver is
   provided by the ISP, it could be interpreted that encryption is not
   necessary.  This is not the case, especially as wireless
   communication might be unprotected or provide the ability to man-in
   -the middle.  As such we assume in this section that the channel
   between the end user and the DNS resolver is encrypted.

   In general a DNS resolver can be seen as an anonymizer.  It receives
   a DNS request from a specific end user, resolves the request under
   the resolver's identity and sends the response back to the end user.
   Local ISPs are believed to be fairly close to the end user and as
   such the IP address of the resolver can be used as a fairly good
   approximation of the localisation of the end user without revealing
   information about its identity.

   When the end user is using a DNS resolver that is not located into
   the ISP network, the end user is clearly providing this information
   to another entity that used not to have this information and that
   cannot infer it from observing the traffic.  Similarly to the ISP,
   the level of information depends on what is already shared with that
   entity.  If the end user were not sharing any information with that
   entity, the end user may provide sufficient information to get
   profiled by an additional entity.  If that DNS resolver already got a
   significant amount of data on that user, that data may fill the
   little remaining privacy but could have a much smaller impact.  For
   example, in a highly concentrated Internet with one cloud provider
   for all services, the end user traffic would use one - or a few -
   destination IP addresses.  That cloud provider would have access to
   all the history of the end user, while the ISP would have little
   information from the IP addresses or the encrypted SNI.  In this
   specific situation the end user may chose that cloud provider for it
   DNS resolutions, to minimize the information leakage.  Such scenario
   is currently believed to be purely hypothetical.

   As a result, using a public resolver rather than a local resolver can
   be seen as sharing the web history of the end user.  The balance
   between sharing partially that history versus completely transferring
   this information depends on the level of concentration of the
   Internet and the ability of the resolver not to further share that
   information.  Using a public resolver also means that the user has to
   trust the public resolver handling the information according to the
   user's wishes.

6.  Privacy impact of concentration

   Section Section 3 pointed out that concentration was one factor that
   could contribute in enforcing boundaries between the information
   carried through the DNS traffic and the information provided by

Migault                    Expires May 7, 2020                  [Page 8]
Internet-Draft    A privacy analysis on DoH deployment     November 2019

   observing the web traffic.  This section analyses how concentration
   impacts privacy.

   At first sight the ability that concentration has to 'hide' multiple
   web sites behind a single IP address has to be balanced by the fact
   that a significant amount of your traffic is going into one place -
   or at least a single actor.  In other words, concentration represent
   a direct threat to privacy with all you data being provided to one
   person.  The cost associated to hide the signification of the IP
   address is too high, and even a higher trust in one cloud provider
   rather than your ISP could hardly justify such an approach.

   Firstly, in order to hide the signification of the destination IP
   address, mechanisms such as TOR should be used instead.  Secondly,
   trust may change over time, but provided data can hardly be retired.
   As such privacy should be designed in a way that does not depends on
   one or few players.  Ideally, the data should be sufficiently spread
   among the various players so that none of them could exploit them.
   This can only be enforced by a *large number of player*. Typically as
   long as the fraction of data shared with one player is sufficient to
   start being analysed, privacy is at risk.

   To balance that risk, it matters to reduce the amount of data shared
   as well as to minimize the level of information associated to that
   data.  Typically suppose that a cloud provider proposes both a DNS
   resolution service as well as hosts the web server www.example.com.
   Performing the DNS resolution over that cloud provider for
   www.example.com will provide limited additional information as the
   DNS resolution will follow an web connection.

   Note that concentration here includes the access to the data.  In
   other words, a cloud provider hosting various web servers without
   possible access to that data will not fall into the concentration
   concept described in this section.

   While the concentration represents a threat to the privacy, the
   remaining of this section analysis the impact of a cloud provider
   providing both a DNS resolution service and hosting service and
   exposes how this could contribute in balkanizing the internet, or
   more precisely capturing end users into close to wall gardened
   networks.

   Typically, one cloud provider hosting a DNS resolver is likely to
   redirect the end user traffic within its data center rather than to
   the data center of a competitor.  Note that such choice may be
   appropriated according to the localisation of the DNS resolver.  The
   problem may arise when the end user would benefit of a better
   connectivity by accessing the web site instantiated in the cloud of

Migault                    Expires May 7, 2020                  [Page 9]
Internet-Draft    A privacy analysis on DoH deployment     November 2019

   an other cloud provider.  In this case, the choice of the DNS
   resolver may be motivated by its own interest rather than the
   interest of the end user thus capturing the end user.  Furthermore,
   the former optimization of the data center of the DNS resolver might
   lead in capturing the end user.  Here capturing would mean the cloud
   provider is keeping the end user - as much as it can - within its
   borders.  Such capture represents a major threat to privacy as the
   end user is literally kept into one entity, independently of its
   willingness.

   The ability to capture an end user is problematic as it might become
   a mean to bring the end user into a different jurisdiction as its
   local jurisdiction.  This may represent a direct threat to its
   private information as some jurisdictions provide little protection
   regarding to privacy.  The comparison of local jurisdiction versus
   other jurisdictions is not the topic of the document.  We do not
   ignore that certain jurisdictions represent a permanent threat to
   privacy.  However, those jurisdictions put apart, it might also to
   notice that the local jurisdiction is probably the one best
   understood by the end user, and that bringing its data into other
   jurisdiction may goes against its believes.  Similarly, some aspects
   of jurisdictions may also reflect the choice of societies, like the
   protection of the weakest of their members [IWF].

6.1.  Acknowledgment

   We would like to thank the feed backs we received from Bengt Sahlin,
   Christian Schaefer and Mirja Kuhlewind.

7.  References

7.1.  Normative References

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, DOI 10.17487/RFC1035,
              November 1987, <https://www.rfc-editor.org/info/rfc1035>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

Migault                    Expires May 7, 2020                 [Page 10]
Internet-Draft    A privacy analysis on DoH deployment     November 2019

   [RFC8484]  Hoffman, P. and P. McManus, "DNS Queries over HTTPS
              (DoH)", RFC 8484, DOI 10.17487/RFC8484, October 2018,
              <https://www.rfc-editor.org/info/rfc8484>.

7.2.  Informative References

   [IWF]      "DNS over HTTPS Why we're saying DoH could be
              catastrophic", n.d., <https://www.iwf.org.uk/news/dns-
              over-https-why-we%E2%80%99re-saying-doh-could-be-
              catastrophic>.

   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
              Text on Security Considerations", BCP 72, RFC 3552,
              DOI 10.17487/RFC3552, July 2003,
              <https://www.rfc-editor.org/info/rfc3552>.

   [RFC6973]  Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
              Morris, J., Hansen, M., and R. Smith, "Privacy
              Considerations for Internet Protocols", RFC 6973,
              DOI 10.17487/RFC6973, July 2013,
              <https://www.rfc-editor.org/info/rfc6973>.

   [RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an
              Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May
              2014, <https://www.rfc-editor.org/info/rfc7258>.

Author's Address

   Daniel Migault
   Ericsson
   8275 Trans Canada Route
   Saint Laurent, QC  4S 0B6
   Canada

   EMail: mglt.ietf@gmail.com

Migault                    Expires May 7, 2020                 [Page 11]