draft-morgan-http2-header-compression-00

HTTPbis                                                        K. Morgan
Internet-Draft                                              C. Brunhuber
Intended status: Standards Track                                    IAEA
Expires: December 4, 2014                                   June 2, 2014


                    H2EZ: HTTP/2 Header Compression
                draft-morgan-http2-header-compression-00

Abstract

   This specification defines the format and compression of HTTP header
   fields in HTTP/2.  The compression is based on EZFLATE, which is a
   token-based DEFLATE algorithm for secure compression within encrypted
   communication channels.

Editorial Note (To be removed by RFC Editor)

   Discussion of this draft takes place on the HTTPBIS working group
   mailing list (ietf-http-wg@w3.org), which is archived at
   <http://lists.w3.org/Archives/Public/ietf-http-wg/>.

   Working Group information can be found at
   <http://tools.ietf.org/wg/httpbis/>;

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 4, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal



Morgan & Brunhuber      Expires December 4, 2014                [Page 1]


Internet-Draft                    H2EZ                         June 2014


   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Detailed Header Format . . . . . . . . . . . . . . . . . . . .  3
     2.1.  ASN.1 Octet String Examples  . . . . . . . . . . . . . . .  4
       2.1.1.  Example 1 (length 15)  . . . . . . . . . . . . . . . .  4
       2.1.2.  Example 2 (length 33)  . . . . . . . . . . . . . . . .  4
       2.1.3.  Example 3 (length 128) . . . . . . . . . . . . . . . .  5
       2.1.4.  Example 4 (length 258) . . . . . . . . . . . . . . . .  5
   3.  EZFLATE Tokenization Rules . . . . . . . . . . . . . . . . . .  6
     3.1.  Header Name Tokenization . . . . . . . . . . . . . . . . .  6
     3.2.  Header Value Tokenization  . . . . . . . . . . . . . . . .  6
     3.3.  Tokenization Example . . . . . . . . . . . . . . . . . . .  7
     3.4.  Sub-Tokenization . . . . . . . . . . . . . . . . . . . . .  9
       3.4.1.  Sub-Tokenization Example 1 (1292 octet token)  . . . .  9
       3.4.2.  Sub-Tokenization Example 2 (516 octet token) . . . . .  9
       3.4.3.  Sub-Tokenization Example 3 (259 octet token) . . . . . 10
   4.  Security Considerations  . . . . . . . . . . . . . . . . . . . 10
   5.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10
   6.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 10
     6.1.  Normative References . . . . . . . . . . . . . . . . . . . 10
     6.2.  Informative References . . . . . . . . . . . . . . . . . . 11




















Morgan & Brunhuber      Expires December 4, 2014                [Page 2]


Internet-Draft                    H2EZ                         June 2014


1.  Introduction

   [TODO: A significant portion of text the 'introduction' and 'security
   considerations' sections, is completely lifted from the HPACK
   Internet Draft; re-write or give credit somehow.]

   In HTTP/1.1 (see [HTTP-p1]), header fields are encoded without any
   form of compression.  As web pages have grown to include dozens to
   hundreds of requests, the redundant header fields in these requests
   now measurably increase latency and unnecessarily consume bandwidth
   (see [PERF1] and [PERF2]).

   SPDY [SPDY] initially addressed this redundancy by compressing header
   fields using the DEFLATE format [DEFLATE], which proved very
   effective at efficiently representing the redundant header fields.
   However, that approach exposed a security risk as demonstrated by the
   CRIME attack (see [CRIME]).

   A key observation of the CRIME and BREACH attacks is that they
   exploited the DEFLATE _algorithm_ as described in RFC 1951 [DEFLATE],
   not the DEFLATE _format_ (also described in RFC 1951).

   EZFLATE [EZFLATE], is a token-based DEFLATE compression algorithm for
   secure compression within encrypted communication channels.  The key
   feature of the EZFLATE algorithm (and primary difference to the
   algorithm described in RFC 1951), is that LZ77 <length, backward
   distance> tuples may only reference a duplicated token (octet string)
   occurring in a previous block, up to 32K input bytes before, if the
   current token exactly matches in length _and_ octet values.  As shown
   in Section 3.1, "Take a Bite out of CRIME", of [EZFLATE], this forces
   potential CRIME-like attackers to guess n-character secrets n
   characters at a time rather than one character at a time, making the
   search space no worse than a full brute-force attack.

   This document defines 1) the formatting of HTTP/2 header fields, and
   2) how EZFLATE is used to compress HTTP/2 header fields.  Finally,
   this document also includes implementation recommendations.

2.  Detailed Header Format

   In HTTP/1.1 [HTTP-p1] headers are defined as field-name, field-value
   pairs (see Section 3.2, "Header Fields" of [HTTP-p1]).  The field-
   name and field-value are separated from each other by a single ':'
   character followed by optional whitespace.  Each pair is separated by
   the two octet sequence CRLF (carriage return, line feed).  The end of
   the header block is marked by an empty line followed by CRLF.

   In HTTP/2, the headers are still field-name, field-value pairs except



Morgan & Brunhuber      Expires December 4, 2014                [Page 3]


Internet-Draft                    H2EZ                         June 2014


   that each are length-prefixed octet strings to simplify parsing.  In
   other words, the field-name and field-value are separately length-
   prefixed.  In this way no separator is necessary between the field-
   name and field-value and no separator is necessary between
   consecutive pairs.  The end of the header block is naturally defined
   as the end of a HTTP/2 HEADERS or PUSH_PROMISE frame with the
   END_HEADERS flag set [HTTP2].

   Each field-name and field-value are encoded as an ASN.1 octet string
   [X.208-88][X.209-88].  This simply means the octets which compose a
   field-name or field-value are length-prefixed with a variable-length
   integer which indicates the exact number of octets composing the
   original field-name or field-value.

   For ASN.1 encoded octet-strings, the length prefix is composed of at
   least a single octet, and optionally N additional octets, where N is
   exactly specified by the first octet as follows.  If the length of
   the octets to be length-prefixed is less than 0x80 (128), then only a
   single octet is required for the length prefix and so the length-
   prefix octet is encoded as exactly the length (big endian) of the
   octets to be length-prefixed.  If the length of the octets to be
   length-prefixed is greater than or equal to 0x80 (128), then the most
   significant bit of the first length-prefix octet is set (i.e. 0x80),
   and the remaining 7 bits specify the number of additional octets N
   (big endian) which will be added to encode the length, as an unsigned
   integer (big endian), of the octets to be length-prefixed .

2.1.  ASN.1 Octet String Examples

   The following are examples of computing the length prefix for
   converting various octet strings to ASN.1 octet strings.

2.1.1.  Example 1 (length 15)

   The ASCII string "accept-encoding" is to be converted to an ASN.1
   octet string.  The length of the string is 15 octets.  Since 15 is
   less than 128, the length prefix is simply a single octet with the
   value 15 (0x0f).

   Hex dump of the resulting ASN.1 octet string:

   0000000 0f61 6363 6570 742d | 656e 636f 6469 6e67 | .accept-encoding

2.1.2.  Example 2 (length 33)

   The ASCII string "gzip;q=1.0, identity;q=0.5, *;q=0" is to be
   converted to an ASN.1 octet string.  The length of the string is 33
   octets.  Since 33 is less than 128, the length prefix is simply a



Morgan & Brunhuber      Expires December 4, 2014                [Page 4]


Internet-Draft                    H2EZ                         June 2014


   single octet with the value 33 (0x21).

   Hex dump of the resulting ASN.1 octet string:

   0000000 2167 7a69 703b 713d | 312e 302c 2069 6465 | !gzip;q=1.0, ide
   0000010 6e74 6974 793b 713d | 302e 352c 202a 3b71 | ntity;q=0.5, *;q
   0000020 3d30                |                     | =0

2.1.3.  Example 3 (length 128)

   The string of octets 0x00 0x01 ... 0x7d 0x7f is to be converted to an
   ASN.1 octet string.  The number of octets is 128.  Since 128 equals
   128, the first octet of the length prefix has the most significant
   bit set to indicate the length prefix is longer than one octet.  The
   remaining seven bits encode the number of additional octets necessary
   to represent the value 128 as a big endian unsigned integer.  Since
   encoding 128 requires one octet, the seven remaining bits encode the
   value 1 (0x01).  The second octet encodes the value 128 (0x80).

   Hex dump of the resulting ASN.1 octet string:

   0000000 8180 0001 0203 0405 | 0607 0809 0a0b 0c0d | ................
   0000010 0e0f 1011 1213 1415 | 1617 1819 1a1b 1c1d | ................
   0000020 1e1f 2021 2223 2425 | 2627 2829 2a2b 2c2d | .. !"#$%&'()*+,-
   0000030 2e2f 3031 3233 3435 | 3637 3839 3a3b 3c3d | ./0123456789:;<=
   0000040 3e3f 4041 4243 4445 | 4647 4849 4a4b 4c4d | >?@ABCDEFGHIJKLM
   0000050 4e4f 5051 5253 5455 | 5657 5859 5a5b 5c5d | NOPQRSTUVWXYZ[\]
   0000060 5e5f 6061 6263 6465 | 6667 6869 6a6b 6c6d | ^_`abcdefghijklm
   0000070 6e6f 7071 7273 7475 | 7677 7879 7a7b 7c7d | nopqrstuvwxyz{|}
   0000080 7e7f                |                     | ~.

2.1.4.  Example 4 (length 258)

   The string of octets 0x00 0x01 ... 0xfe 0xff 0x00 0x01 is to be
   converted to an ASN.1 octet string.  The number of octets is 258.
   Since 258 is greater than 128, the first octet of the length prefix
   has the most significant bit set to indicate the length prefix is
   longer than one octet.  The remaining seven bits encode the number of
   additional octets necessary to represent the value 258 as a big
   endian unsigned integer.  Since encoding 258 requires two octets, the
   seven remaining bits encode the value 2 (0x02).  The second and third
   octets encode the value 258 (0x01 0x02).









Morgan & Brunhuber      Expires December 4, 2014                [Page 5]


Internet-Draft                    H2EZ                         June 2014


   Hex dump of the resulting ASN.1 octet string:

   000000 8201 0200 0102 0304 | 0506 0708 090a 0b0c | ................
   000010 0d0e 0f10 1112 1314 | 1516 1718 191a 1b1c | ................
   000020 1d1e 1f20 2122 2324 | 2526 2728 292a 2b2c | ... !"#$%&'()*+,
   000030 2d2e 2f30 3132 3334 | 3536 3738 393a 3b3c | -./0123456789:;<
   000040 3d3e 3f40 4142 4344 | 4546 4748 494a 4b4c | =>?@ABCDEFGHIJKL
   000050 4d4e 4f50 5152 5354 | 5556 5758 595a 5b5c | MNOPQRSTUVWXYZ[\
   000060 5d5e 5f60 6162 6364 | 6566 6768 696a 6b6c | ]^_`abcdefghijkl
   000070 6d6e 6f70 7172 7374 | 7576 7778 797a 7b7c | mnopqrstuvwxyz{|
   000080 7d7e 7f80 8182 8384 | 8586 8788 898a 8b8c | ................
   000090 8d8e 8f90 9192 9394 | 9596 9798 999a 9b9c | ................
   0000a0 9d9e 9fa0 a1a2 a3a4 | a5a6 a7a8 a9aa abac | ................
   0000b0 adae afb0 b1b2 b3b4 | b5b6 b7b8 b9ba bbbc | ................
   0000c0 bdbe bfc0 c1c2 c3c4 | c5c6 c7c8 c9ca cbcc | ................
   0000d0 cdce cfd0 d1d2 d3d4 | d5d6 d7d8 d9da dbdc | ................
   0000e0 ddde dfe0 e1e2 e3e4 | e5e6 e7e8 e9ea ebec | ................
   0000f0 edee eff0 f1f2 f3f4 | f5f6 f7f8 f9fa fbfc | ................
   000100 fdfe ff00 01        |                     | ................

3.  EZFLATE Tokenization Rules

   EZFLATE is itself agnostic to tokenization methods and delimiter
   sets.  Appropriate tokenization rules and delimiter sets depend on
   the type of data to be compressed.  For HTTP/2 header data, each
   field-name and field-value are tokenized separately.

3.1.  Header Name Tokenization

   Since header field names are typically static, it is not beneficial
   to tokenize field names.  Therefore field names are fed to the
   EZFLATE compressor as a single token, including the ASN.1 length
   prefix.

3.2.  Header Value Tokenization

   Field values, on the other hand, have varying values of staticity
   ranging from very static (e.g. :scheme, user-agent) to semi-static
   (e.g. accept, accept-encoding) to very dynamic (e.g. :path, date).
   Field values MAY be tokenized with following set of delimiters:

                 { ' ', '\t', ';', ',' }

                                 Figure 1

   Additionally, the set of headers { ":path", "location", "content-
   location" } MAY instead be tokenized with the following set of
   delimiters:



Morgan & Brunhuber      Expires December 4, 2014                [Page 6]


Internet-Draft                    H2EZ                         June 2014


   Special token delimiters for header values containing a path:

                 { '/', '?', '&', '=' }

                                 Figure 2

   The ASN.1 length prefix MAY be treated as a delimiter and therefore
   treated as a separate token.

   Finally, note that per Section 3.2 of [EZFLATE], a sequence of
   consecutive delimiters are to be treated as a single unit (e.g. as
   one token).  For examle, see Figure 3 and Figure 4, where the
   consecutive delimiter characters ',' and ' ', are treated as a single
   token ", ".

3.3.  Tokenization Example

   The following is an example of converting an HTTP header name-value
   pair into ASN.1 length prefixed octet strings (see Section 2 and then
   tokenizing the octet string based on the tokenization rules.

   The following header is to be encoded:

   accept-encoding: gzip;q=1.0, identity;q=0.5, *;q=0

   First the field-name and field-value are converted to ASN.1 octet
   strings by adding the appropriate length prefixes (see Section 2.1.1
   and Section 2.1.1).

   Hex dump of the header converted to ASN.1 octet strings:

   0000000 0f61 6363 6570 742d | 656e 636f 6469 6e67 | .accept-encoding
   0000010 2167 7a69 703b 713d | 312e 302c 2069 6465 | !gzip;q=1.0, ide
   0000020 6e74 6974 793b 713d | 302e 352c 202a 3b71 | ntity;q=0.5, *;q
   0000030 3d30                |                     | =0

   Next, the length-prefixed header name is taken as a single token.

   Hex dump of the header name token:

   0000000 0f61 6363 6570 742d | 656e 636f 6469 6e67 | .accept-encoding

   Finally, the length-prefixed header value is tokenized according to
   the rules in Section 3.  The following is a hex dump of each of the
   eleven tokens.






Morgan & Brunhuber      Expires December 4, 2014                [Page 7]


Internet-Draft                    H2EZ                         June 2014


   Token 1:

   0000010 2167 7a69 70        |                     | !gzip

   Token 2:

   0000010             3b      |                     |      ;

   Token 3:

   0000010                713d | 312e 30             |       q=1.0

   Token 4:

   0000010                     |        2c 20        |            ,

                                 Figure 3

   Token 5:

   0000010                     |             69 6465 |              ide
   0000020 6e74 6974 79        |                     | ntity

   Token 6:

   0000020             3b      |                     |      ;

   Token 7:

   0000020                713d | 302e 35             |       q=0.5

   Token 8:

   0000020                     |        2c 20        |            ,

                                 Figure 4

   Token 9:

   0000020                     |             2a      |              *

   Token 10:

   0000020                     |                3b   |               ;







Morgan & Brunhuber      Expires December 4, 2014                [Page 8]


Internet-Draft                    H2EZ                         June 2014


   Token 11:

   0000020                     |                  71 |                q
   0000030 3d30                |                     | =0

3.4.  Sub-Tokenization

   The DEFLATE [DEFLATE] format has a fixed maximum length of 258 octets
   for matches (see Section 3.2.5 of [DEFLATE]).  As such, tokens with
   length L greater than 258 octets SHOULD be sub-tokenized into sub-
   tokens as follows.  The number of sub-tokens N MUST be N = (L + 257)
   / 258 (integer division).  If N > 2, the first N - 2 sub-tokens MUST
   be 258 octets in length.  In all cases, the final two sub-tokens MUST
   have length greater than or equal to 129 (258 / 2).  Let R be the
   remaining length for the final two tokens.  The length of the first
   remaining token is L[N - 1] = R - R / 2 and the length of the second
   remaining token is L[N] = R / 2 (where the index is 1..N).
   Alternatively, tokens with length L greater than 258 octets may
   simply be emitted as literals.

   Consider the following examples:

3.4.1.  Sub-Tokenization Example 1 (1292 octet token)

   A token T, 1292 octets in length, is to be sub-tokenized.

   N  = (1292 + 257) / 258 = 1549 / 258 = 6
   L1 = 258
   L2 = 258
   L3 = 258
   L4 = 258
   R  = 1292 - 4 * 258 = 260
   L5 = 260 - 260 / 2  = 130
   L6 = 260 / 2        = 130

   Thus there are six sub-tokens.  The first four sub-tokens are 258
   octets in length.  The final two sub-tokens are 130 octets in length.

3.4.2.  Sub-Tokenization Example 2 (516 octet token)

   A token T, 516 octets in length, is to be sub-tokenized.

   N  = (516 + 257) / 258 = 773 / 258 = 2
   R  = 516
   L1 = R - R / 2 = 258
   L2 = R / 2     = 258

   Thus there are two sub-tokens.  The two sub-tokens are 258 octets in



Morgan & Brunhuber      Expires December 4, 2014                [Page 9]


Internet-Draft                    H2EZ                         June 2014


   length.

3.4.3.  Sub-Tokenization Example 3 (259 octet token)

   A token T, 259 octets in length, is to be sub-tokenized.

   N  = (259 + 257) / 258 = 516 / 258 = 2
   R  = 259
   L1 = R - R / 2 = 130
   L2 = R / 2     = 129

   Thus there are two sub-tokens.  The length of the first sub-token is
   130 octets.  The length of the second sub-token is 129 octets.

4.  Security Considerations

   An EZFLATE compressor can act as an oracle to an attacker probing the
   compression context.  However, the attacker can only find out if the
   guess of the entire token is correct or not.

   Still, an attacker could take advantage of this limited information
   for breaking low-entropy secrets using a brute-force attack.  A
   server usually has some protections against such brute-force attack.
   Here, the attack would target the client, where it would be harder to
   detect.  The attack would be even more dangerous if the attacker is
   able to prevent the traffic generated by its brute-force attack from
   reaching the server.

   To offer protection against such type of attacks, and endpoint MUST
   use non-compressed DEFLATE blocks (see Section 3.2.4 of [DEFLATE]) to
   prevent the compression of any header field whose value contains a
   secret which could be put at risk by a brute-force attack.

5.  Acknowledgements

   Roberto Peon and Herve Ruellan.

6.  References

6.1.  Normative References

   [EZFLATE]   Morgan, K. and C. Brunhuber, "EZFLATE: Token-based
               DEFLATE Compression", draft-morgan-ezflate (work in
               progress), June 2014.

   [X.208-88]  CCITT, "Recommendation X.208: Specification of Abstract
               Syntax Notation One (ASN.1)", January 1998.




Morgan & Brunhuber      Expires December 4, 2014               [Page 10]


Internet-Draft                    H2EZ                         June 2014


   [X.209-88]  CCITT, "Recommendation X.209: Specification of Basic
               Encoding Rules for Abstract Syntax Notation One",
               January 1998.

6.2.  Informative References

   [CRIME]     Rizzo, J. and T. Duong, "The CRIME Attack",
               September 2012, <https://docs.google.com/a/twist.com/
               presentation/d/
               11eBmGiHbYcHR9gL5nDyZChu_-lCa2GizeuOfaLU2HOU/
               edit#slide=id.g1eb6c1b5_3_6>.

   [DEFLATE]   Deutsch, P., "DEFLATE Compressed Data Format
               Specification version 1.3", RFC 1951, May 1996.

   [HTTP-p1]   Fielding, R., Ed. and J. Reschke, Ed., "Hypertext
               Transfer Protocol (HTTP/1.1): Message Syntax and
               Routing", draft-ietf-httpbis-p1-messaging-26 (work in
               progress), February 2014.

   [HTTP2]     Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
               Transfer Protocol version 2", draft-ietf-httpbis-http2-10
               (work in progress), February 2014.

   [SPDY]      Belshe, M. and R. Peon, "SPDY Protocol",
               draft-mbelshe-httpbis-spdy-00 (work in progress),
               February 2012.

Authors' Addresses

   Keith Shearl Morgan
   International Atomic Energy Agency

   EMail: k.morgan@iaea.org


   Christoph Brunhuber
   International Atomic Energy Agency

   EMail: c.brunhuber@iaea.org











Morgan & Brunhuber      Expires December 4, 2014               [Page 11]

Document	Document type	Expired Internet-Draft (individual) Expired & archived
	Select version	00
	Authors	Keith Shearl Morgan, Christoph Brunhuber Email authors
	RFC stream	(None)
	Intended RFC status	(None)
	Other formats	txt xml pdf bibtex bibxml