Network Working Group                                         V. Krasnov
Internet-Draft                                          Cloudflare, Inc.
Intended status: Informational                                  Y. Weiss
Expires: September 6, 2018                     Akamai Technologies, Inc.
                                                           March 5, 2018


                  Compression Dictionaries for HTTP/2
             draft-vkrasnov-h2-compression-dictionaries-03

Abstract

   This document specifies new HTTP/2 frame types and new HTTP/2
   settings values that enable the use of previously transferred data as
   compression dictionaries, significantly improving overall compression
   ratio for a given connection.

   In addition, this document proposes to define a set of industry
   standard, static, dictionaries to be used with any Lempel-Ziv based
   compression for the common textual MIME types prevalent on the web.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 6, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect



Krasnov & Weiss         Expires September 6, 2018               [Page 1]


Internet-Draft     Compression Dictionaries for HTTP/2        March 2018


   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Conventions and Terminology . . . . . . . . . . . . . . .   3
   2.  Preliminaries . . . . . . . . . . . . . . . . . . . . . . . .   3
     2.1.  Security Considerations . . . . . . . . . . . . . . . . .   3
     2.2.  Content Coding  . . . . . . . . . . . . . . . . . . . . .   3
     2.3.  Compression Contexts  . . . . . . . . . . . . . . . . . .   4
     2.4.  Server Push Interaction . . . . . . . . . . . . . . . . .   4
     2.5.  HTTP/QUIC . . . . . . . . . . . . . . . . . . . . . . . .   4
   3.  HTTP/2 Extension  . . . . . . . . . . . . . . . . . . . . . .   4
     3.1.  Extension Settings  . . . . . . . . . . . . . . . . . . .   4
     3.2.  Extension Frames  . . . . . . . . . . . . . . . . . . . .   5
       3.2.1.  The SET_COMPRESSION_CONTEXT frame . . . . . . . . . .   5
       3.2.2.  The SET_DICTIONARY Frame  . . . . . . . . . . . . . .   5
       3.2.3.  The USE_DICTIONARY Frame  . . . . . . . . . . . . . .   7
     3.3.  Static Dictionaries . . . . . . . . . . . . . . . . . . .   7
   4.  Dictionary State  . . . . . . . . . . . . . . . . . . . . . .   8
     4.1.  Attack scenarios and mitigations  . . . . . . . . . . . .  10
       4.1.1.  Cross-origin secret leak  . . . . . . . . . . . . . .  10
       4.1.2.  Same-origin secret leak . . . . . . . . . . . . . . .  11
   5.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  11
     5.1.  Normative References  . . . . . . . . . . . . . . . . . .  12
     5.2.  Informative References  . . . . . . . . . . . . . . . . .  12
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  12

1.  Introduction

   The HTTP/2 [RFC7540] protocol encourages the use of many small assets
   for CSS/JS/HTML, due to its multiplexed nature.  Prior to HTTP/2,
   asset inlining was encouraged, resulting in fewer, larger assets per
   website.

   The HTTP/2 protocol also allows for transmitted data to be compressed
   with a lossless compression format.  The format used is specified in
   the "Content-Encoding" (see [RFC2616], section 14.11) header field.
   For example, "Content-Encoding: br" means the data was compressed
   using the Brotli format.

   The nature of the compression algorithms, such as DEFLATE [RFC1951]
   and Brotli [RFC7932], used with HTTP in practice, require a certain
   "window" of data to perform backward matching.  Therefore, larger
   files have much better compression ratio.  To improve compression for



Krasnov & Weiss         Expires September 6, 2018               [Page 2]


Internet-Draft     Compression Dictionaries for HTTP/2        March 2018


   smaller files, these algorithms allow to use a chunk of arbitrary
   data as a "Custom Dictionary" and function as the initial sliding
   window.

   Note: While that is not longer true for the latest stable version of
   Brotli, there's work underway to re-enable use of arbitrary
   compression dictionaries.

   Compression is a compute-heavy operation, where investing additional
   compute power results in diminishing returns (in terms of compression
   ratio/CPU cycles).  The "Custom Dictionary" technique is known to
   improve compression ratio significantly, with little additional
   computational cost.  It is also supported by most Lempel-Ziv based
   compression formats.

   This document introduces a mechanism for using previously transmitted
   data over HTTP/2 as a dictionary to be used with an underlying
   compression algorithm.

1.1.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in RFC
   2119 [RFC2119].

2.  Preliminaries

2.1.  Security Considerations

   The use of compression over an encrypted connection could be used by
   malicious actors to potentially leak sensitive information.  We will
   collaborate with industry experts to identify any additional attack
   vectors introduced by this draft, and include a set of best practices
   to both servers and clients that would implement it.

   A list of attack vectors and potential mitigations is described later
   in this document.

2.2.  Content Coding

   A server that wishes to apply protocol level compression on a stream
   or use a stream as a dictionary SHOULD not apply non-identity
   content-coding (see [RFC7231], section 3.1.2.1) to that stream.







Krasnov & Weiss         Expires September 6, 2018               [Page 3]


Internet-Draft     Compression Dictionaries for HTTP/2        March 2018


2.3.  Compression Contexts

   In the scope of this document, a compression context is a set of non-
   overlaping streams, that SHALL only be used as compression
   dictionaries for streams within the same compression context.  While
   it is the responsibility of the server to implement best-practice
   techniques to mitigate cross-compression side channel attacks,
   compression contexts let the client mitigate some of the risks of
   cross-compression side channel attacks, by explicitly stating which
   requests can be cross-compressed with which requests.

   For example a client may choose to disable compression for cross-site
   requests by assigning them to different compression contexts.

2.4.  Server Push Interaction

   Pushed streams may be cross-stream compressed or used as
   dictionaries, same as a regular stream.  In some scenarios it may
   benefit the server to push a dummy resource to prime a dictionary.

2.5.  HTTP/QUIC

   Due to the nature of this draft, it is expected that a strict order
   is maintained between the definition and consumption of dictionaries.
   The nature of QUIC is such that frames and streams might not
   delivered in the order they are sent, therefore, a head-of-line
   blocking may occur when implementing compression dictionaries in
   HTTP/QUIC.  This is similar to the tradeoff present in the HPACK/QUIC
   mapping.

3.  HTTP/2 Extension

3.1.  Extension Settings

   The extension introduces a new SETTINGS value.

   SETTINGS_COMPRESSION(0xTBA):  For greater compression, and to prevent
      setting identifier depletion, the 32-bit value for this setting is
      defined as follows:

   +---------------+---------+-----------+-----------+
   | SDVersion (8) | Fmt (8) | DSize (8) | NDict (8) |
   +---------------+---------+-----------+-----------+

   NDict:  Indicates the number of dictionaries the client is willing to
      maintain.  The default value is 0, the maximal value is 255.





Krasnov & Weiss         Expires September 6, 2018               [Page 4]


Internet-Draft     Compression Dictionaries for HTTP/2        March 2018


   DSize:  Log2 of the maximal size of each dictionary.  The default
      value is 0, the maximal value is 255.  For example value of 17
      indicates each dictionary MUST be smaller or equal to 2^17
      (131,072 octets).

   Fmt:  Compression format to use, as a bitmask. 1st bit indicates
      brotli, 2nd bit indicates zlib.  Other bits are reserved for
      future compression methods.  A value of 0 indicates no support for
      cross-stream compression.

   SDVersion:  If greater than 0, indicates the version of static
      dictionaries to use.  Maximal value is 255, the default value is
      0, which indicates no static dictionaries are used.

3.2.  Extension Frames

3.2.1.  The SET_COMPRESSION_CONTEXT frame

   The SET_COMPRESSION_CONTEXT frame (type=0xTBA).

   +-------------+
   | Context (8) |
   +-------------+

   The SET_COMPRESSION_CONTEXT frame can be sent by the client on any
   stream in the idle state.  The frame indicates the compression
   context ID for the given stream.  Frames with an assigned context
   SHALL NOT be compressed using dictionaries from a different context.
   Frames with an assigned context SHALL NOT be used as a dictionary for
   streams with from a different context.

   The SET_COMPRESSION_CONTEXT frame contains the following fields:

   Context:  an 8-bit context ID that indicates the compression context
      for the stream.  If the frame is ommited, then the context value
      is assumed to be 0.  The allowed context values are 0 through 255.
      A special context ID of 255 indicates the stream can only be
      compressed using the static dictionaries.

3.2.2.  The SET_DICTIONARY Frame

   The SET_DICTIONARY frame (type=0xTBA) contains one to many
   Dictionary-Entry.

   +---------------+---------------+
   |   Dictionary-Entry (+)    ...
   +---------------+---------------+




Krasnov & Weiss         Expires September 6, 2018               [Page 5]


Internet-Draft     Compression Dictionaries for HTTP/2        March 2018


   A Dictionary-Entry field is encoded as follows:

   +-------------------------------+
   |       Dictionary-ID (8)       |
   +---+---------------------------+
   | P |        Size (7+)          |
   +---+---------------------------+
   | E?| D?|  Truncate? (6+)       |
   +---+---------------------------+
   |           Offset? (8+)        |
   +-------------------------------+


   The SET_DICTIONARY frame can be sent from the server to the client,
   on any client initiated stream in the open or half-closed (remote)
   states, or on any server initiated stream in the reserved (local)
   state.  The SET_DICTIONARY frame MUST precede any DATA frames on that
   stream.  The SET_DICTIONARY frame SHOULD be followed by sufficient
   DATA frames to build the dictionaries.  If a RST frame was received
   for the stream before sufficient DATA was sent, the dictionaries are
   reset.

   The Dictionary-Entry contains the following fields:

   Dictionary-ID:  an 8-bit ID, indicates the dictionary.  MUST be lower
      than the value agreed by the SETTINGS_COMPRESSION setting.

   Size:  Indicates how many octets of the stream will be used for the
      dictionary.  Size is represented as an integer with 7-bit prefix
      (see [RFC7541], Section 5.1).  If P is set, the actual number of
      octets to use is 2 to the power of Size.  If the computed value is
      greater than the length of the decompressed DATA, use all the
      available DATA.

   Truncate:  An optional field, represented as an integer with 6-bit
      prefix.  Present when the APPEND flag is set.  Truncate indicates
      the number of octets to keep of the existing dictionary, before
      appending the new data to it.  If E is set, then Truncate is
      ignored, and new data is appended at the end.  If Truncate is
      zero, then the dictionary is replaced, as if APPEND was unset.  If
      the optional field D is set, then the first Truncate octets of the
      previous dictionary are used, otherwise the last Truncate octets
      are used.

   Offset:  An optional field, represented as an integer with 8-bit
      prefix.  Present when the OFFSET flag is set.  Offset indicates
      that the first Offset octets of the stream are ignored when
      building the dictionary.



Krasnov & Weiss         Expires September 6, 2018               [Page 6]


Internet-Draft     Compression Dictionaries for HTTP/2        March 2018


   The flags defined for the SET_DICTIONARY frame apply to each
   Dictionary-Entry in the frame.  The SET_DICTIONARY frame defines the
   following flags:

   APPEND (0x1):  Indicates that the data is to be appended to the
      existing dictionary with the given ID, as opposed to replacing it
      with the new data.  Also indicates that fields E, D and Truncate
      are present.

   OFFSET (0x2):  Indicates the presence of the Offset field.

3.2.3.  The USE_DICTIONARY Frame

   The USE_DICTIONARY frame (type=0xTBA).

   +-------------+
   | Dict ID (8) |
   +-------------+

   The USE_DICTIONARY frame indicates that the current stream is
   compressed with the indicated dictionary.  The USE_DICTIONARY frame
   MUST be sent prior to any DATA frame on a given stream.
   SET_DICTIONARY and USE_DICTIONARY frames MAY be sent on the same
   stream.  Only one USE_DICTIONARY frame MAY be sent for a stream.

   The USE_DICTIONARY frame contains the following fields:

   Dict ID:  an 8-bit ID that indicates which dictionary to use.  The
      dictionary MUST be previously defined by a SET_DICTIONARY frame,
      or by a static dictionary.

3.3.  Static Dictionaries

   This document proposes to generate a set of up to 8 standard
   dictionaries to be optionally bundled with supporting
   implementations.  Each dictionary should be 32,768 or 65,536 octets
   long.

   Each static dictionary will be identified by an integer ID in the
   range {0..7}.

   If either endpoint supports the use of static dictionaries, it will
   indicate this by setting the SDVersion value of SETTINGS_COMPRESSION
   to greater than 0.  The number will indicate the highest version of
   the dictionaries known.

   The actual version used will be the lowest of the two values set by
   the endpoints.



Krasnov & Weiss         Expires September 6, 2018               [Page 7]


Internet-Draft     Compression Dictionaries for HTTP/2        March 2018


   If the client and the server agree on the use of static dictionaries,
   then both will initialize the first 8 dictionaries (IDs 0 through 7),
   with the contents of the static dictionaries.  The static
   dictionaries belong to context 0.

   If the value of the field NDict is lower than 8, then up to NDict
   dictionaries will be initialized.

4.  Dictionary State

   Both the server and the client MUST process the SET_DICTIONARY and
   USE_DICTIONARY frames in the order they are sent/received, with the
   exception when both are sent over the same stream.  In that case
   USE_DICTIONARY is processed prior to the SET_DICTIONARY frames.

   Doing otherwise will result in an illegal state of the dictionaries.
   This is similar to the way HEADER frames are processed in order to
   maintain legal HPACK state on the server and the client.

   A possible dictionary implementation can be describes as follows:

   struct {
       u8  id;
       u8  ctx;
       u64 size;
       u8  dict[size];
   } D;

   The collection of dictionaries could then be described as:

   D dictionaries[NDict];

   Initially all the dictionaries are unitialized:

   for (i = 0; i < NDict; i++) {
       dictionaries[i] = {id = i, ctx = 0, size = 0, dict = {}};
   }

   Client side USE_DICTIONARY frame behaviour pseudo code:

   dictionary = dictionaries[frame.Dictionary-ID]

   if (dictionary.ctx != 0 && dictionary.ctx != stream.ctx)
       return PROTOCOL_ERROR

   stream.decompressed_data = decompress(stream.dict, stream.data)

   Client side SET_DICTIONARY frame behaviour pseudo code:



Krasnov & Weiss         Expires September 6, 2018               [Page 8]


Internet-Draft     Compression Dictionaries for HTTP/2        March 2018


   foreach entry = frame.Dictionary-Entry {
       dictionary = dictionaries[entry.DICT_ID]

       if (entry.size == 0) {
           dictionary.size = 0
           dictionary.ctx = 0
           dictionary.dict = {}
           continue
       }

       if (dictionary.ctx != 0 && dictionary.ctx != stream.ctx) {
           return PROTOCOL_ERROR
       }

       dictionary.ctx = stream.ctx

       if (entry.P == 1) {
           size = 1 << entry.Size
       } else {
           size = entry.Size
       }

       if (frame.APPEND) {
           if (entry.E == 1) {
               truncate = dictionary.size
           } else {
               truncate = entry.Truncate
           }
       } else {
           truncate = 0
       }

       if (frame.OFFSET) {
           offset = entry.Offset
       } else {
           offset = 0
       }

       new_dict_data = stream.decompressed_data[offset:offset + size]
       if (entry.D == 1) {
           old_dict_data = head(dictionary.dict, truncate)
       } else {
           old_dict_data = tail(dictionary.dict, truncate)
       }

       dict_data = append(old_dict_data, new_dict_data)

       dictionary.dict = tail(dict_data, 1 << settings.DSize)



Krasnov & Weiss         Expires September 6, 2018               [Page 9]


Internet-Draft     Compression Dictionaries for HTTP/2        March 2018


       dictionary.size = len(dictionary.dict)
   }

   The server behaviour mirrors the client behaviour, but it is up to
   the server to choose the best dictionary.

4.1.  Attack scenarios and mitigations

   A single HTTP/2 connection is likely to be shared among multiple
   origins (over which it is authoritative) and among different
   navigation contexts to the same origin.  When such sharing happens,
   and if compression contexts are shared between those instances, an
   attacker can use a BREACH-style attack in order to exfiltrate secrets
   from the context.  Such secrets may include:

   o  Cookies set using Javascript (and in-particular "httponly" cookies
      set from anonymous functions in external JS, which is not
      accessible to scripts otherwise)

   o  CSRF tokens

   o  CSP nonces

   o  Application level secrets (e.g. financial information, stored
      credit cards numbers, codes, etc.)

   The mechanism for such data theft can happen if the attacker can: *
   Download multiple similar payloads to the target page modulo the
   actual secret, while trying out multiple permutations of the secret.
   * Observe the on-the-wire transfer size using Resource Timing's
   "transferSize" property.

   The rest of this section will describe different scenarios where
   those conditions are met as well as potential mitigations for them.

4.1.1.  Cross-origin secret leak

   An HTTP/2 session can be used to deliver resources from multiple
   origins over which the session has proved to be authoritative,
   through connection reuse (see [RFC7540] section 9.1.1 for more
   details).  As a result, sharing compression contexts between such
   origins can be theoretically used to leak secrets from one of these
   origins to the next.








Krasnov & Weiss         Expires September 6, 2018              [Page 10]


Internet-Draft     Compression Dictionaries for HTTP/2        March 2018


4.1.1.1.  Mitigation

   Limiting compression contexts to be used within the confines of a
   single origin.

4.1.2.  Same-origin secret leak

   Malicious pages on the origin as well as an XSS attacker can normally
   use "fetch()" or "XMLHttpRequest()" in order to inspect in-content
   secrets.  This could be limited with CSP by only permitting the
   download of specific files, using nonces or using "connect-src
   'none'" in order to limit arbitrary scripts from downloading files
   that contain secrets.  However, using shared-dictionaries between
   secret resources and malicious ones can enable an attacker to guess
   said secrets and exfiltrate them (e.g. using other deficiencies in
   the defined CSP, if there are any).

   Furthermore, said malicious page or XSS attack can also use as a
   dictionary resources fetched from the same origin in a different
   browsing context, enabling it to also inspect resources which cannot
   be fetched at all on its base page.

4.1.2.1.  Mitigation

   There's no obvious mitigation for this kind of attack, but a few
   options are:

   o  Limiting compression contexts to be used only within a single
      navigation context can limit the opportunity for the separate
      navigation context to inspect secrets from resources it is not
      allowed to fetch.  At the same time this can be complex to
      implement, as the network layer is not aware of the navigation
      context and is supposed for example to dedupe outgoing requests
      from different compression contexts.

   o  "transferSize" padding/bucketing in such cases (e.g. pages with
      above mentioned CSP limitations) may be enough to render this
      attack not-practical.

   o  Limit dictionary sharing (or "transferSize" accuracy for resources
      that use shared dictionaries) only to non-credentialed resource
      fetches.

5.  References







Krasnov & Weiss         Expires September 6, 2018              [Page 11]


Internet-Draft     Compression Dictionaries for HTTP/2        March 2018


5.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616,
              DOI 10.17487/RFC2616, June 1999,
              <https://www.rfc-editor.org/info/rfc2616>.

   [RFC7231]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Semantics and Content", RFC 7231,
              DOI 10.17487/RFC7231, June 2014,
              <https://www.rfc-editor.org/info/rfc7231>.

   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
              DOI 10.17487/RFC7540, May 2015,
              <https://www.rfc-editor.org/info/rfc7540>.

   [RFC7541]  Peon, R. and H. Ruellan, "HPACK: Header Compression for
              HTTP/2", RFC 7541, DOI 10.17487/RFC7541, May 2015,
              <https://www.rfc-editor.org/info/rfc7541>.

5.2.  Informative References

   [BREACH]   Prado, A., Harris, N., and Y. Gluck, "BREACH: SSL, Gone in
              30 Seconds", 2013, <http://breachattack.com/>.

   [RFC1951]  Deutsch, P., "DEFLATE Compressed Data Format Specification
              version 1.3", RFC 1951, DOI 10.17487/RFC1951, May 1996,
              <https://www.rfc-editor.org/info/rfc1951>.

   [RFC7932]  Alakuijala, J. and Z. Szabadka, "Brotli Compressed Data
              Format", RFC 7932, DOI 10.17487/RFC7932, July 2016,
              <https://www.rfc-editor.org/info/rfc7932>.

Authors' Addresses

   Vlad Krasnov
   Cloudflare, Inc.

   Email: vlad@cloudflare.com





Krasnov & Weiss         Expires September 6, 2018              [Page 12]


Internet-Draft     Compression Dictionaries for HTTP/2        March 2018


   Yoav Weiss
   Akamai Technologies, Inc.

   Email: yoav@yoav.ws















































Krasnov & Weiss         Expires September 6, 2018              [Page 13]