Network Working Group                                       Y. Pettersen
Internet-Draft                                        Opera Software ASA
Updates: 2965 (if approved)                                March 7, 2010
Intended status: Experimental
Expires: September 8, 2010


   The TLD Subdomain Structure Protocol and its use for Cookie domain
                               validation
                  draft-pettersen-subtld-structure-06

Abstract

   This document defines a protocol and specification format that can be
   used by a client to discover how a Top Level Domain (TLD) is
   organized in terms of what subdomains are used to place closely
   related but independent domains, e.g. commercial domains in country
   code TLDs (ccTLD) like .uk are placed in the .co.uk subTLD domain.
   This information is then used to limit which domains an Internet
   service can set cookies for, strengthening the rules already defined
   by the cookie specifications.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.



Pettersen               Expires September 8, 2010               [Page 1]


Internet-Draft          SubTLD Structure Protocol             March 2010


   This Internet-Draft will expire on September 8, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.


1.  Introduction

   The Domain Name System [RFC1034] used to name Internet hosts allows a
   wide range of hierarchical names to be used to indicate what a given
   host is.  Some are implemented by the owners of a domain, such as
   subdomains created for certain tasks or functions.  Others, called
   subTLDs or registry-like domains, are created by the Top Level Domain
   registry owner to indicate what kind of service the domain is (e.g.
   commercial, educational, government) or its geographic location
   (e.g. city or state).

   While this system makes it relatively easy for TLD administrators to
   organize online services, and for the user to locate and recognize
   relevant services, this flexibility causes various security and
   privacy related problems when services located at different hosts are
   allowed to share data through functionality administrated by the
   client, e.g.  HTTP state management cookies [RFC2965] [NETSC].  Most
   information sharing mechanisms make the process of sharing easy,
   perhaps too easy, since in many cases there is no mechanism to ensure
   that the servers receiving the information really want it, and it is
   often difficult to determine the source of the information being
   shared.  To some extent [RFC2965] addresses some of these concerns
   for cookies, in that clients that support [RFC2965] style cookies
   send the target domain for the cookie along with the cookie so that
   the recipient can verify that the cookie has the correct domain.
   Unfortunately, [RFC2965] is not widely deployed in clients, or on
   servers.  The recipient(s) can make inappropriate information sharing
   more difficult by requiring the information to contain data
   identifying the source and assuring the integrity of the data, e.g.
   by use of cryptographic technologies.  These techniques tend,
   however, to be computationally costly.  There are two problem areas:

   o  Incorrect sharing of information between non-associated services
      e.g. example1.com and example2.com or example1.co.uk and
      example2.co.uk.  That is, the information may be distributed to
      all services within a given Top Level Domain.

   o  Undesirable information sharing within a single service.  This is,
      in particular, a problem for services that sell hosting services
      to many different customers, such as webhotels, where the service
      itself has little or no control over the customers' actions.

   While both these problems are in some ways similar, they call for
   different solutions.  This specification will only propose a solution
   for the first problem area.  The second problem area must be handled
   separately.  This specification will first define a TLD Subdomain
   Structure Protocol that can be used to discover the actual structure
   of a Top Level Domain, e.g. that the TLD has several subTLDs co.tld,
   ac.tld, org.tld.  It will then show how this information can be used
   to determine when information sharing through cookies is not
   desirable.


2.  The TLD Subdomain Structure Protocol

   The TLD Subdomain Structure Protocol is an HTTP service, managed by
   either the application vendor, the TLD owners, or some other trusted
   organization, and located at a URI location that, when queried,
   returns information about a TLD's domain structure.  The client can
   then use this information to decide what actions are permitted for
   the protocol data the client is processing.  Procedure for use:

   o  The client retrieves the domain list for the Top Level Domain
      "tld" from the vendor specified URI
      https://tld-structure.example.com/tld/domainlist .  Multiple
      alternative URIs may be specified as fallbacks.

   o  The Content-Type of the returned list MUST be application/
      subdomain-structure.

   o  The retrieved specification SHOULD be cached by the client for at
      least 30 days.

   o  The TLD owner SHOULD update the list at least 90 days before a new
      sub-domain becomes active.

   o  If no specification can be retrieved the user agent MAY fall back
      to alternative methods, depending on the profile.
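   The retrieval procedure above can be sketched as follows.  This is
   only an illustration under stated assumptions: the base URI is the
   placeholder used in the text, the helper names (domainlist_uri,
   is_fresh, fetch_domainlist) are hypothetical, and treating 30 days as
   the point at which a cached copy is revalidated is one possible
   reading of the caching recommendation, not a requirement.

```python
import time
import urllib.request

# Placeholder service location taken from the text; a real deployment
# would use the URI chosen by the vendor or TLD owner.
BASE_URI = "https://tld-structure.example.com"
MEDIA_TYPE = "application/subdomain-structure"
MIN_CACHE_SECONDS = 30 * 24 * 3600  # assumed cache lifetime: 30 days


def domainlist_uri(tld, base=BASE_URI):
    """Build the request URI for the domain list of one TLD."""
    return f"{base}/{tld.lower()}/domainlist"


def is_fresh(fetched_at, now=None):
    """True while a cached copy is younger than the cache lifetime."""
    now = time.time() if now is None else now
    return now - fetched_at < MIN_CACHE_SECONDS


def fetch_domainlist(tld, bases=(BASE_URI,), timeout=10):
    """Try each base URI in turn; return the list body, or None."""
    for base in bases:
        try:
            uri = domainlist_uri(tld, base)
            with urllib.request.urlopen(uri, timeout=timeout) as resp:
                # Reject responses served with the wrong media type.
                if resp.headers.get_content_type() == MEDIA_TYPE:
                    return resp.read()
        except OSError:
            continue  # network or HTTP error: try the next fallback
    return None  # caller may fall back to alternative methods
```

   A caller would check is_fresh() on its cached copy first and call
   fetch_domainlist() only when the copy is missing or stale.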

2.1.  Securing the domain information

   Individuals with malicious intent may wish to modify the domain list
   served by the service location to either classify a domain
   incorrectly as a subTLD or to hide a subTLD's classification.
   Besides obviously securing the hosting locations, this also means
   that the content served will have to be secured.  There are several
   ways to do this:

   1.  Digitally sign the specification, using one of the available
       message signature methods, e.g.  S/MIME [RFC2311].  This will
       secure the content during storage both at the client and the
       server, as well as during transit.  The drawback is that the
       client must implement decoding and verification of the message
       format which it may not already support, which may be problematic
       for clients having limited resources.

   2.  Use an encrypted connection, such as HTTP over TLS [RFC2818],
       which is supported by many clients already.  Unfortunately, this
       method does not protect the content when stored by the client.

   3.  Use XML Signatures [RFC3275] to create a signature over the
       specification.  This method is currently not defined.

   This specification recommends using HTTP over TLS, and the client
   MUST use the non-anonymous cipher suites, to secure the transport of
   the specification.  The client MUST ensure that the hostname in the
   certificate matches the hostname used in the request.
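   As an illustration of the transport requirements above, a client
   written in Python might configure its TLS context as sketched below.
   The function name make_tls_context is hypothetical, and the cipher
   string is an assumption about how anonymous cipher suites can be
   excluded with an OpenSSL-based library.

```python
import ssl


def make_tls_context():
    """Build a client context that authenticates the server."""
    ctx = ssl.create_default_context()
    # Both settings are already the defaults of create_default_context();
    # they are repeated here to make the requirements explicit.
    ctx.check_hostname = True            # certificate must match hostname
    ctx.verify_mode = ssl.CERT_REQUIRED  # certificate chain must verify
    # Exclude cipher suites without authentication or encryption.
    ctx.set_ciphers("DEFAULT:!aNULL:!eNULL")
    return ctx


context = make_tls_context()
```

   The resulting context can be passed as the context argument of
   urllib.request.urlopen() when retrieving the domain list.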

2.2.  Domainlist format

   The domain list file can contain a list of subdomains that are
   considered top level domains, as well as a special list of names that
   are not top level domains.  None of the domain lists need specify the
   TLD name, since that is implicit from the request URI.  The domain
   names listed MUST be encoded in punycode, according to [RFC3490].
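   A short sketch of the encoding requirement: Python's built-in "idna"
   codec implements the ToASCII operation of [RFC3490].  The helper name
   to_ascii_label is hypothetical, and lowercasing the result is an
   added assumption so that ASCII-only labels compare consistently.

```python
def to_ascii_label(label):
    """Convert one domain label to its Punycode (ACE) form."""
    return label.encode("idna").decode("ascii").lower()


# "b\u00fccher" is the label "bucher" written with u-umlaut.
print(to_ascii_label("b\u00fccher"))  # prints xn--bcher-kva
print(to_ascii_label("co"))           # ASCII labels pass through: co
```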




2.2.1.  Domainlist schema

   The domain list is an XML file that conforms to the following schema,
   given in RELAX NG compact syntax:

       default namespace = "http://xmlns.opera.com/tlds"

       start =
           element tld {
             attribute levels { xsd:nonNegativeInteger | "all" }?,
             attribute name { xsd:NCName }?,
             (domain | registry)*
           }
       registry =
           element registry {
             attribute levels { xsd:nonNegativeInteger }?,
             attribute name { xsd:NCName },
             attribute all { string "true" | string "false" }?,
             (domain | registry)*
           }
       domain =
           element domain {
             attribute name { xsd:NCName }
           }

   The domainlist file contains a single <tld> block, which may contain
   multiple registry and domain blocks, and a registry block may also
   contain multiple registry and domain blocks.

   Both domain and registry tags MUST contain a name attribute
   identifying the domain or registry.  The tld block MAY have a name
   attribute, but this name MUST be ignored by clients, which must
   instead use the name of the TLD used to request the file.

   All names MUST be punycode encoded to make it possible for clients
   not supporting IDNA to use the document.

   The tld and registry blocks MAY contain an attribute, levels,
   specifying how many levels below the current domain are
   registry-like.  The default is none (zero), meaning that labels
   directly below the current domain are ordinary domains and not
   registry-like.  If the levels attribute is 1 (one), all next-level
   labels within the registry/tld are by default registry-like and not
   normal domains.  If the levels attribute is the case-insensitive
   token "all", then all subdomains below the current domain are
   registry-like.

   A registry block with attribute "all" set to "true" means that all
   domains in that registry (at the current level) are registries,
   unless there are overrides.  Such a block may contain additional
   registry or domain blocks, which then apply to domains x.example.tld
   if the "all" block is inside the "example" block of "tld".  This
   allows specification of wildcard structures, where the structure of
   lower-level domains is similar for all domains.

   Implementations MUST ignore attributes and syntax they do not
   recognize.
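   The following sketch shows one way a client could read a domain list
   into a plain data structure while ignoring attributes and elements it
   does not recognize.  The function names and the dictionary layout are
   hypothetical; the "future" attribute and "comment" element in the
   sample input are invented solely to demonstrate that unknown syntax
   is skipped.

```python
import xml.etree.ElementTree as ET

NS = "{http://xmlns.opera.com/tlds}"


def parse_levels(value):
    """Return an int, or the string "all" for the case-insensitive token."""
    if value is None:
        return 0  # default: next-level labels are ordinary domains
    return "all" if value.lower() == "all" else int(value)


def parse_block(elem):
    """Convert a tld, registry or domain element into a plain dict."""
    kind = elem.tag[len(NS):] if elem.tag.startswith(NS) else elem.tag
    block = {"kind": kind, "name": elem.get("name")}
    if kind in ("tld", "registry"):
        block["levels"] = parse_levels(elem.get("levels"))
        block["all"] = elem.get("all") == "true"
        # Only known child elements are kept; anything else is ignored.
        block["children"] = [parse_block(child) for child in elem
                             if child.tag in (NS + "domain", NS + "registry")]
    return block


doc = b"""<tld xmlns="http://xmlns.opera.com/tlds" levels="ALL" future="x">
  <registry name="co" levels="2"><domain name="example" /></registry>
  <comment>ignored</comment>
</tld>"""
tree = parse_block(ET.fromstring(doc))
```

   Only the attributes named in the schema are read, so unknown
   attributes never reach the resulting structure.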

2.2.2.  Domainlist interpretation

   For each registry or domain block within a tld or registry block, the
   effective domain name the block applies to is the name of the block
   followed by a dot and the effective domain name of the containing
   block.

   For the tld block the effective domain name is the name of the TLD
   the client is evaluating; for a registry block named "example"
   directly inside the tld block, the effective name becomes
   example.tld.

   <?xml version="1.0" encoding="UTF-8"?>
   <tld xmlns="http://xmlns.opera.com/tlds" name="tld" levels="1">
       <registry name="co" levels="0">
         <registry name="state" />
       </registry>
       <registry name="province">
         <registry all="true" levels="1">
           <domain name="school" />
         </registry>
       </registry>
       <registry name="example" levels="1" />
       <domain name="parliament" />
   </tld>

   In the above example, the specification is for the TLD "tld".  By
   default any second level domain "x.tld" is a registry-like domain,
   although parliament.tld is not a registry-like domain.

   In the example TLD, however, the co.tld registry has a sub-registry
   "state.co.tld", while all other domains in the co.tld domain are
   ordinary domains.

   Additionally, for every label "domain", school.domain.province.tld
   is an ordinary domain, while any other domain under
   domain.province.tld is registry-like.

   Also, the registry example.tld has defined all domains y.example.tld
   as registry-like, with no exceptions.
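   The interpretation rules can be sketched as a small classifier.
   This is one possible reading, not a normative algorithm.  It assumes
   that an explicitly named block takes precedence over a wildcard
   ("all") registry, that the levels count is consumed one label at a
   time when no explicit block matches, and that every name below an
   ordinary domain is itself ordinary.  All function names are
   hypothetical, and the embedded document is the example above written
   as well-formed XML.

```python
import math
import xml.etree.ElementTree as ET

EXAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tld xmlns="http://xmlns.opera.com/tlds" name="tld" levels="1">
  <registry name="co" levels="0">
    <registry name="state" />
  </registry>
  <registry name="province">
    <registry all="true" levels="1">
      <domain name="school" />
    </registry>
  </registry>
  <registry name="example" levels="1" />
  <domain name="parliament" />
</tld>"""


def local(tag):
    """Strip the XML namespace from an element tag."""
    return tag.rsplit("}", 1)[-1]


def levels(elem):
    """Read the levels attribute: an integer, or infinity for "all"."""
    value = elem.get("levels", "0")
    return math.inf if value.lower() == "all" else int(value)


def find_child(elem, label):
    """Prefer a block named `label`; fall back to a wildcard registry."""
    wildcard = None
    for child in elem:
        if local(child.tag) not in ("domain", "registry"):
            continue  # ignore unrecognized syntax
        if child.get("name") == label:
            return child
        if local(child.tag) == "registry" and child.get("all") == "true":
            wildcard = child
    return wildcard


def is_registry_like(domain, root):
    """Classify `domain`, a Punycode name inside the TLD that `root` describes."""
    labels = domain.lower().rstrip(".").split(".")[:-1]  # drop the TLD label
    node, remaining = root, levels(root)
    for label in reversed(labels):  # walk from the TLD downwards
        child = find_child(node, label) if node is not None else None
        if child is not None and local(child.tag) == "domain":
            return False  # explicitly an ordinary domain
        if child is not None:
            node, remaining = child, levels(child)  # explicit registry
        elif remaining > 0:
            node, remaining = None, remaining - 1  # registry-like by default
        else:
            return False  # ordinary domain by default
    return True


root = ET.fromstring(EXAMPLE)
for name in ("x.tld", "parliament.tld", "state.co.tld", "other.co.tld",
             "school.x.province.tld", "other.x.province.tld"):
    print(name, is_registry_like(name, root))
```

   Under these assumptions the classifier reproduces the statements
   made about the example: x.tld and state.co.tld are registry-like,
   while parliament.tld and school.x.province.tld are ordinary domains.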



3.  A TLD Subdomain Structure Protocol profile for HTTP Cookies

   HTTP state management cookies are one area where it is important,
   both for security and privacy reasons, to ensure that unauthorized
   services cannot set cookies for another service.  Inappropriate
   cookies can affect the functionality of a service, and may also be
   used to track users across services in an undesirable fashion.
   Neither the original Netscape cookie specification [NETSC] nor
   [RFC2965] is adequate in many cases.

   The [NETSC] rules require only that the target domain have one
   internal dot (e.g. example.com) if the TLD belongs to a list of
   generic TLDs (gTLDs), while for all other TLDs the domain must
   contain two internal dots (e.g. example.co.uk).  The latter rule was
   never properly implemented, in particular due to the many flat ccTLD
   domain structures that are in use.

   [RFC2965] set the requirement that cookies can only be set for the
   server's parent domain.

   Unfortunately, both policies still leave open the possibility of
   setting cookies for a subTLD by setting the cookie from a host name
   example.subtld.tld to the domain subtld.tld, which is by itself
   legal, but not desirable because that means that the cookie can be
   sent to numerous websites, either revealing sensitive information or
   interfering with those other websites without authorization.  As can
   be seen, these rules do not work satisfactorily, especially when
   applied to ccTLDs, which may have a flat domain structure similar to
   the one used by the generic .com TLD, a hierarchical subTLD structure
   like the one used by the .uk ccTLD (e.g. .co.uk), or a combination of
   both.  But there are also gTLDs, such as .name, for which cookies
   should not be allowed for the second level domains, as these are
   generally family names shared between many different users, not
   service names.  A partially effective method for distinguishing
   service names from subTLDs by using DNS has been defined in
   [DNSCOOKIE].  However, this method is not immune to TLD registries
   that use subTLDs as directories, or to services that do not define
   an IP address for the domain name.  Using the TLD Subdomain Structure
   Protocol to retrieve a list of all subTLDs in a given TLD will solve
   both those problems.

3.1.  Procedure for using the TLD Subdomain Structure Protocol for
      cookies

   When receiving a cookie the client must first perform all the checks
   required by the relevant specification.  Upon completion of these
   checks the client then performs the following additional verification
   checks if the cookie is being set for the server's parent or
   grandparent domain (or higher):

   1.  If the domain structure of the TLD is not known already, or the
       structure information has expired, the client should retrieve or
       validate the structure specification from the server hosting the
       specification, according to Section 2.  If retrieval is
       unsuccessful, and no copy of the specification is known, the
       client MAY use alternative methods to decide the domain's status,
       e.g. those described in [DNSCOOKIE], or other heuristics.

   2.  Evaluate the specification as specified in Section 2.  If the
       target domain is part of the subTLD structure the cookie MUST be
       discarded.

   3.  If the target domain is not a subTLD, the cookie is accepted.
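   The additional check can be sketched as below.  The function names
   are hypothetical, the simple suffix comparison merely stands in for
   the full checks of the relevant cookie specification, and the set of
   subTLDs is an invented stand-in for an evaluated domain list.

```python
def domain_match(host, domain):
    """Simplified domain match: equal, or host is inside domain."""
    return host == domain or host.endswith("." + domain)


def accept_cookie(request_host, cookie_domain, is_registry_like):
    """Apply the subTLD check after the base cookie checks.

    `is_registry_like` is a callable that returns True when a domain
    is part of the subTLD structure of its TLD.
    """
    host = request_host.lower().rstrip(".")
    domain = cookie_domain.lower().strip(".")
    if not domain_match(host, domain):
        return False  # stand-in for the base specification's checks
    if host == domain:
        return True   # cookie for the host itself: no extra check
    return not is_registry_like(domain)


# Hypothetical result of evaluating a domain list for a TLD.
SUBTLDS = {"uk", "co.uk", "ac.uk"}
check = SUBTLDS.__contains__

print(accept_cookie("www.example.co.uk", ".co.uk", check))
print(accept_cookie("www.example.co.uk", "example.co.uk", check))
```

   The first call is rejected because co.uk is a subTLD in the assumed
   list; the second is accepted because example.co.uk is not.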

3.2.  Unverifiable transactions

   Use of HTTP Cookies, combined with HTTP requests to resources that
   are located in domains other than the one the user actually wants to
   visit, has caused widespread privacy concerns.  The reason is that
   multiple websites can link to the same independent website, e.g. an
   advertiser, who may then use cookies to build a profile of the
   visitor that can be used to select advertisements that are of
   interest to the user.

   [RFC2965] specified that if the name of the host of an included
   resource does not domain match the domain reach (defined as the
   parent domain of the host) of the URL of the document the user
   started loading, loading the resource is considered an unverifiable
   transaction, and in such third party transactions cookies should not
   be sent or accepted.  The latter point is not widely implemented,
   except when selected by especially interested users.  This means that
   server1.example.com and server2.example.com can share cookies, and
   either can be referenced automatically (e.g. by including an image)
   by the other without being considered an unverifiable transaction,
   while requests to server3.example2.com would be considered an
   unverifiable transaction.  However, like the normal domain matching
   rule for cookies, this rule opens up some holes.  If the host
   example.co.uk requests a resource from server4.example3.co.uk, the
   request to the example3.co.uk server would not be considered an
   unverifiable transaction because example.co.uk's reach is co.uk,
   which domain matches server4.example3.co.uk, a conclusion which is
   obviously incorrect to a human with some knowledge of the .uk domain
   structure.

   To avoid such misclassifications clients SHOULD apply the procedure
   specified in Section 3.1 to the reach domain used to decide if a
   request is unverifiable.  If the reach domain is a subTLD, the reach
   of the original host must be changed to become the same as the name
   of the host itself, and requests that do not domain match the
   original host's name must be considered unverifiable transactions.
   That is, the reach for example.co.uk becomes example.co.uk, not
   co.uk, and example3.co.uk will therefore not domain match the
   resulting reach.
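   The adjusted reach calculation can be sketched as follows.  The
   function names are hypothetical, the special handling of "local" in
   [RFC2965] is omitted for brevity, and the subTLD set is an invented
   stand-in for an evaluated domain list.

```python
def reach(host, is_registry_like):
    """Reach of `host`, corrected when the parent domain is a subTLD."""
    _, _, parent = host.partition(".")
    if "." not in parent:
        return host  # the parent needs an embedded dot to be a reach
    # If the parent is registry-like, the host is its own reach.
    return host if is_registry_like(parent) else parent


def is_unverifiable(origin_host, resource_host, is_registry_like):
    """True if loading resource_host from origin_host is third party."""
    r = reach(origin_host, is_registry_like)
    return not (resource_host == r or resource_host.endswith("." + r))


SUBTLDS = {"co.uk"}  # hypothetical evaluated domain list
check = SUBTLDS.__contains__

print(reach("example.co.uk", check))
print(is_unverifiable("example.co.uk", "server4.example3.co.uk", check))
print(is_unverifiable("server1.example.com", "server2.example.com", check))
```

   With the assumed list, the reach of example.co.uk is the host itself,
   so the request to server4.example3.co.uk is classified as
   unverifiable, while the two example.com servers still share a reach.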


4.  Examples

   The following examples demonstrate how the TLD Subdomain Structure
   Protocol can be used to decide cookie domain permissions.

4.1.  Example 1

   <?xml version="1.0" encoding="UTF-8"?>
   <tld xmlns="http://xmlns.opera.com/tlds" name="tld" levels="1" >
       <domain name="example" />
   </tld>


   This specification means that all second-level names x.tld are
   subTLDs, except "example.tld" for which cookies are allowed.  Cookies
   are also implicitly allowed for any y.x.tld domains.

4.2.  Example 2

   <?xml version="1.0" encoding="UTF-8"?>
   <tld xmlns="http://xmlns.opera.com/tlds" name="tld" >
       <registry name="example1" levels="1" />
       <registry name="example2" levels="1" />
   </tld>


   This specification means that example1.tld and example2.tld and any
   domains foo.example1.tld and bar.example2.tld are registry-like
   domains for which cookies are not allowed.  For any other domains
   cookies are allowed.

4.3.  Example 3

   <?xml version="1.0" encoding="UTF-8"?>
   <tld xmlns="http://xmlns.opera.com/tlds" name="tld" >
       <registry name="example1" levels="1" />
       <registry name="example2" levels="1" >
          <domain name="example3" />
       </registry>
   </tld>


   This example has the same meaning as Example 2, but with the
   exception that the domain example3.example2.tld is a regular domain
   for which cookies are allowed.


5.  IANA Considerations

   This specification requires that responses be served with a specific
   media type.  Below is the registration information for this media
   type.

5.1.  Registration of the application/subdomain-structure Media Type

   Type name : application

   Subtype name: subdomain-structure

   Required parameters: none

   Optional parameters: none

   Encoding considerations: The content of this media type is always
   transmitted in binary form.

   Security considerations: See Section 6

   Interoperability considerations: none

   Published specification: This document

   Additional information:

   Magic number(s): none

   File extension(s):

   Macintosh file type code(s):

   Person & email address to contact for further information: Yngve N.
   Pettersen

   Email: yngve@opera.com

   Intended usage: common

   Restrictions on usage: none

   Author/Change controller: Yngve N. Pettersen

   Email: yngve@opera.com



6.  Security Considerations

   Retrieval of the specifications is vulnerable to denial of service
   attacks or loss of network connection.  Hosting the specifications at
   a single location can increase this vulnerability, although the
   exposure can be reduced by using mirrors with the same name, but
   hosted at different network locations.  This protocol is as
   vulnerable to DNS security problems as any other HTTP [RFC2616] based
   service.  Requiring the specifications to be digitally signed or
   transmitted over an authenticated TLS connection reduces this
   vulnerability.

   Section 3 of this document describes using the domain list defined in
   Section 2 as a method of increasing security.  The effectiveness of
   the domain list for this purpose, and the resulting security for the
   client, depend both on the integrity of the list and on its
   correctness.  The integrity of the list depends on how securely it is
   stored at the server, and how securely it is transmitted.  This
   specification recommends downloading the domain list using HTTP over
   TLS, which makes the transmission as secure as the message
   authentication mechanism used (encryption is not required), and the
   servers should be configured to use the strongest available key
   lengths and authentication mechanisms.  An alternative approach would
   be to digitally sign the files.

   The correctness of the list depends on how well the TLD registry
   defined it.  A list that does not include some subTLDs may expose the
   client to potential privacy and security problems, but not any worse
   than the situation would be without this protocol and profile, while
   a subdomain incorrectly classified as a subTLD can lead to denial of
   service for the affected services.  Both of the problems can be
   prevented by careful construction and auditing of the lists, both by
   the TLD registry and by interested third parties.


7.  Acknowledgements

   Anne van Kesteren assisted with defining the XML format in
   Section 2.2.1.


8.  References

8.1.  Normative References

   [NETSC]    "Persistent Client State HTTP Cookies",
              <http://wp.netscape.com/newsref/std/cookie_spec.html>.




   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
              STD 13, RFC 1034, November 1987.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2311]  Dusse, S., Hoffman, P., Ramsdell, B., Lundblade, L., and
              L. Repka, "S/MIME Version 2 Message Specification",
              RFC 2311, March 1998.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC2818]  Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000.

   [RFC2965]  Kristol, D. and L. Montulli, "HTTP State Management
              Mechanism", RFC 2965, October 2000.

   [RFC3275]  Eastlake, D., Reagle, J., and D. Solo, "(Extensible Markup
              Language) XML-Signature Syntax and Processing", RFC 3275,
              March 2002.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

8.2.  Non-normative references

   [DNSCOOKIE]
              Pettersen, Y., "Enhanced validation of domains for HTTP
              State Management Cookies using DNS. Work in progress.",
              November 2008,
              <draft-pettersen-dns-cookie-validate-05.txt>.

   [PUBSUFFIX]
              "The Homepage of the Public Suffix List, a list of
              registry-like domains (subTLDs) gathered by volunteers.",
              <http://publicsuffix.org/>.


Appendix A.  Collection of information for the TLD structure
             specification

   This document does not define how the information encoded in the TLD
   Structure Specification is gathered.

   There are several methods available for collecting the information
   encoded in the TLD Structure Specification, the two main ones being:

      Data provided by the TLD registry owner through a machine readable
      repository at well-known locations.

      Data gathered by one or more application vendors based on publicly
      available information, such as the Mozilla Project's Public Suffix
      List [PUBSUFFIX].


Appendix B.  Alternative solutions

   A possible alternative to the format specified in Section 2 is to
   encode the information directly in the DNS records for the
   registry-like domain, using a DNS extension.

   Accessing this type of information requires that the client or its
   environment be able to directly access the DNS network, which is not
   possible in many environments, e.g. firewalled systems.  In addition,
   not all runtime environments can provide this information, which may
   make it necessary to embed a DNS client directly in the client.

   For some applications it may be necessary, due to system limitations,
   to access this information through an online Web Service in order to
   provide the necessary information for each hostname or domain
   visited.  A Web service may, however, introduce unnecessary privacy
   problems, as well as delays each time a new domain is tested.


Appendix C.  Open issues

   o  Download location URI for the original domain lists

   o  Should digital signatures be used on the files, instead of using
      TLS?


Author's Address

   Yngve N. Pettersen
   Opera Software ASA
   Waldemar Thranes gate 98
   N-0175 OSLO,
   Norway

   Email: yngve@opera.com




