Network Working Group                                      J. Alakuijala
Internet-Draft                                            E. Kliuchnikov
Intended Status: Informational                                  R. Obryk
Expires: Sep 03, 2018                                        Z. Szabadka
                                                           L. Vandevenne
                                                             Google, Inc
                                                                Mar 2018


                  Shared Brotli Compressed Data Format
                draft-vandevenne-shared-brotli-format-00

Abstract

   This specification defines a data format for shared brotli
   compression, which adds support for dictionaries and extended
   features to brotli [RFC7932].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF). Note that
   other groups may also distribute working documents as Internet-
   Drafts. The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on Sep 03, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.



Vandevenne et. al.        Expires Sep 03, 2018                  [Page 1]


Internet-Draft          Shared Brotli Data Format               Mar 2018


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . 3
      1.1.  Purpose  . . . . . . . . . . . . . . . . . . . . . . . . . 3
      1.2.  Intended audience  . . . . . . . . . . . . . . . . . . . . 3
      1.3.  Scope  . . . . . . . . . . . . . . . . . . . . . . . . . . 3
      1.4.  Compliance . . . . . . . . . . . . . . . . . . . . . . . . 3
      1.5.  Definitions of terms and conventions used  . . . . . . . . 3
         1.5.1.  Packing into bytes  . . . . . . . . . . . . . . . . . 4
   2.  Shared Brotli Overview  . . . . . . . . . . . . . . . . . . . . 5
   3.  Shared Dictionaries . . . . . . . . . . . . . . . . . . . . . . 5
      3.1.  Custom Static Dictionaries . . . . . . . . . . . . . . . . 6
         3.1.1.  Transform Operations  . . . . . . . . . . . . . . . . 7
      3.2.  LZ77 Dictionaries  . . . . . . . . . . . . . . . . . . . . 8
   4.  Varint Encoding . . . . . . . . . . . . . . . . . . . . . . . . 9
   5.  Shared Dictionary Stream . . . . . . . . . . . . . . . . . . . 10
   6.  Large Window Brotli Compressed Data Stream . . . . . . . . . . 12
   7.  Patching Format Compressed Data Stream . . . . . . . . . . . . 13
   8.  Shared Brotli Compressed Data Stream . . . . . . . . . . . . . 13
   9.  Shared Brotli Framing Format Stream  . . . . . . . . . . . . . 13
      9.1.  Main Format . . . . . . . . . . . . . . . . . . . . . . . 13
      9.2.  Chunk Format  . . . . . . . . . . . . . . . . . . . . . . 14
      9.3.  Metadata Format . . . . . . . . . . . . . . . . . . . . . 16
      9.4.  Chunk Specifications  . . . . . . . . . . . . . . . . . . 16
         9.4.1.  Padding Chunk  . . . . . . . . . . . . . . . . . . . 17
         9.4.2.  Metadata Chunk . . . . . . . . . . . . . . . . . . . 17
         9.4.3.  Data Chunk . . . . . . . . . . . . . . . . . . . . . 17
         9.4.4.  First Partial Data Chunk . . . . . . . . . . . . . . 18
         9.4.5.  Middle Partial Data Chunk  . . . . . . . . . . . . . 18
         9.4.6.  Last Partial Data Chunk  . . . . . . . . . . . . . . 18
         9.4.7.  Footer Metadata Chunk  . . . . . . . . . . . . . . . 19
         9.4.8.  Global Metadata Chunk  . . . . . . . . . . . . . . . 19
         9.4.9.  Repeat Metadata Chunk  . . . . . . . . . . . . . . . 19
         9.4.10.  Central Directory Chunk . . . . . . . . . . . . . . 20
         9.4.11.  Final Footer Chunk  . . . . . . . . . . . . . . . . 21
   10.  Security Considerations . . . . . . . . . . . . . . . . . . . 22
   11.  IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22
   12.  Informative References  . . . . . . . . . . . . . . . . . . . 22
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22












Vandevenne et. al.        Expires Sep 03, 2018                  [Page 2]


Internet-Draft          Shared Brotli Data Format               Mar 2018


1. Introduction

1.1. Purpose

   This document is the authoritative specification of the shared brotli
   dictionary format. It defines the set of valid shared brotli
   dictionary streams and extensions to the brotli decoder algorithm
   ([RFC7932]) that produces the uncompressed data stream from a valid
   brotli compressed data stream and a valid dictionary stream.

1.2. Intended audience

   This specification is intended for use by software implementers to
   compress data into and/or decompress data from the shared brotli
   dictionary format.

   The text of the specification assumes a basic background in
   programming at the level of bits and other primitive data
   representations. Familiarity with the technique of LZ77 coding is
   helpful but not required.

1.3. Scope

   This specification defines a data format for shared brotli
   compression, which adds support for dictionaries and extended
   features to brotli [RFC7932].

1.4. Compliance

   Unless otherwise indicated below, a compliant decompressor must be
   able to accept and decompress any data set that conforms to all the
   specifications presented here. A compliant compressor must produce
   data sets that conform to all the specifications presented here.

1.5. Definitions of terms and conventions used

   Byte: 8 bits stored or transmitted as a unit (same as an octet).  For
   this specification, a byte is exactly 8 bits, even on machines that
   store a character on a number of bits different from eight.  See
   below for the numbering of bits within a byte.

   String: a sequence of arbitrary bytes.

   Bytes stored within a computer do not have a "bit order", since they
   are always treated as a unit. However, a byte considered as an
   integer between 0 and 255 does have a most- and least-significant
   bit, and since we write numbers with the most-significant digit on
   the left, we also write bytes with the most-significant bit on the



Vandevenne et. al.        Expires Sep 03, 2018                  [Page 3]


Internet-Draft          Shared Brotli Data Format               Mar 2018


   left. In the diagrams below, we number the bits of a byte so that bit
   0 is the least-significant bit, i.e., the bits are numbered:

      +--------+
      |76543210|
      +--------+

   Within a computer, a number may occupy multiple bytes. All multi-byte
   numbers in the format described here are unsigned and stored with the
   least-significant byte first (at the lower memory address).  For
   example, the decimal 16-bit number 520 is stored as:

      0        1
      +--------+--------+
      |00001000|00000010|
      +--------+--------+
      ^        ^
      |        |
      |        + more significant byte = 2 x 256
      + less significant byte = 8

1.5.1. Packing into bytes

   This document does not address the issue of the order in which bits
   of a byte are transmitted on a bit-sequential medium, since the final
   data format described here is byte- rather than bit-oriented.
   However, we describe the compressed block format below as a sequence
   of data elements of various bit lengths, not a sequence of bytes. We
   must therefore specify how to pack these data elements into bytes to
   form the final compressed byte sequence:

      * Data elements are packed into bytes in order of
        increasing bit number within the byte, i.e., starting
        with the least-significant bit of the byte.
      * Data elements other than prefix codes are packed
        starting with the least-significant bit of the data
        element. These are referred to here as integer values
        and are considered unsigned.
      * Prefix codes are packed starting with the most-significant
        bit of the code.

   In other words, if one were to print out the compressed data as a
   sequence of bytes, starting with the first byte at the *right* margin
   and proceeding to the *left*, with the most-significant bit of each
   byte on the left as usual, one would be able to parse the result from
   right to left, with fixed-width elements in the correct MSB-to-LSB
   order and prefix codes in bit-reversed order (i.e., with the first
   bit of the code in the relative LSB position).



Vandevenne et. al.        Expires Sep 03, 2018                  [Page 4]


Internet-Draft          Shared Brotli Data Format               Mar 2018


   As an example, consider packing the following data elements into a
   sequence of 3 bytes: 3-bit integer value 6, 4-bit integer value 2,
   prefix code 110, prefix code 10, 12-bit integer value 3628.

        byte 2   byte 1   byte 0
      +--------+--------+--------+
      |11100010|11000101|10010110|
      +--------+--------+--------+
       ^            ^ ^   ^   ^
       |            | |   |   |
       |            | |   |   +------ integer value 6
       |            | |   +---------- integer value 2
       |            | +-------------- prefix code 110
       |            +---------------- prefix code 10
       +----------------------------- integer value 3628

2. Shared Brotli Overview

   Shared brotli extends brotli [RFC7932] with support for shared
   dictionaries, larger LZ77 window and a framing format.


3. Shared Dictionaries

   A shared dictionary is a piece of data shared by a compressor and
   decompressor. The compressor can take advantage of the dictionary
   context to encode the input in a more compact manner. The compressor
   and the decompressor must use exactly the same dictionary. A shared
   dictionary is specially useful to compress short input sequences.

   A shared brotli dictionary can use two methods of sharing context:

      * An LZ77 dictionary. The encoder and decoder could refer
        to a given sequence of bytes. Multiple LZ77 dictionaries
        can be set.

      * A custom static dictionary: a word list with transforms. The
        encoder and decoder will replace the static dictionary data
        with the data in the shared dictionary. The original static
        dictionary is described in Section 8 in [RFC7932]. The original
        data from Appendix A and Appendix B of [RFC7932] will be
        replaced. In addition, it is possible to dynamically switch
        this dictionary based on the data compression context, and/or
        to include a reference to the original dictionary in the custom
        dictionary.

   If no shared dictionary is set the decoder behaves the same as in
   [RFC7932] on a brotli stream.



Vandevenne et. al.        Expires Sep 03, 2018                  [Page 5]


Internet-Draft          Shared Brotli Data Format               Mar 2018


   If a shared dictionary is set, then it can set any of: LZ77
   dictionaries, overriding static dictionary words, and/or overriding
   transforms.

3.1. Custom Static Dictionaries

   If a custom word list is set, then the following behavior of the RFC
   7932 decoder [RFC7932] is overridden:

      Instead of the Static Dictionary Data from Appendix A
      of [RFC7932], one or more word lists from the custom static
      dictionary data are used.

      Instead of NDBITS at the end of Appendix A, a custom
      SIZE_BITS_BY_LENGTH per custom word list is used.

      The copy length for a static dictionary reference must be
      between 4 and 32 and may not be a value for which
      SIZE_BITS_BY_LENGTH of this dictionary is 0.

   If a custom transforms list is set without context dependency, then
   the following behavior of the RFC 7932 decoder [RFC7932] is
   overridden:

      The "List of Word Transformations" from Appendix B is
      overridden by one or more lists of custom prefixes, suffixes and
      transform operations.

      The transform_id must be smaller than the number of transforms
      given in the custom transforms list.

   If the dictionary is context dependent, it includes a lookup table of
   64 word list and transform list combinations. When resolving a static
   dictionary word, the decoder computes the literal context id, as in
   section 7.1. of [RFC7932]. The literal context id is used as index in
   the lookup tables to select the word list and transforms to use. If
   the dictionary is not context dependent, this id is implicitely 0
   instead.

   If a distance goes beyond the dictionary for the current id and
   multiple word list / transform list combinations are defined, then a
   next dictionary is used in the following order: if not context
   dependent, the same order as defined in the shared dictionary. If
   context dependent, the index matching the current context is used
   first, the same order as defined in the shared dictionary excluding
   the current context are used next.





Vandevenne et. al.        Expires Sep 03, 2018                  [Page 6]


Internet-Draft          Shared Brotli Data Format               Mar 2018


3.1.1. Transform Operations

   A shared dictionary may include custom word transformations, to
   replace those specified in Section 8 and Appendix B of [RFC7932].  A
   transform consists of a possible prefix, a transform operation, for
   some operations a parameter, and a possible suffix. In the shared
   dictionary format, the transform operation is represented by a
   numerical ID, listed in the table below.

     ID     Operation
     --     ---------
      0     Identity
      1     OmitLast1
      2     OmitLast2
      3     OmitLast3
      4     OmitLast4
      5     OmitLast5
      6     OmitLast6
      7     OmitLast7
      8     OmitLast8
      9     OmitLast9
     10     FermentFirst
     11     FermentAll
     12     OmitFirst1
     13     OmitFirst2
     14     OmitFirst3
     15     OmitFirst4
     16     OmitFirst5
     17     OmitFirst6
     18     OmitFirst7
     19     OmitFirst8
     20     OmitFirst9
     21     ShiftFirst (by PARAMETER)
     22     ShiftAll (by PARAMETER)

   Operations 0 to 20 are specified in Section 8 in [RFC7932].
   ShiftFirst and ShiftAll transform specifically encoded SCALARs.

   A SCALAR is a 7-, 11-, 16- or 21-bit unsigned integer encoded with 1,
   2, 3 or 4 bytes respectively with following bit contents:

      7-bit SCALAR:
      +--------+
      |0sssssss|
      +--------+

      11-bit SCALAR:
      +--------+--------+



Vandevenne et. al.        Expires Sep 03, 2018                  [Page 7]


Internet-Draft          Shared Brotli Data Format               Mar 2018


      |110sssss|XXssssss|
      +--------+--------+

      16-bit SCALAR:
      +--------+--------+--------+
      |1110ssss|XXssssss|XXssssss|
      +--------+--------+--------+

      21-bit SCALAR:
      +--------+--------+--------+--------+
      |11110sss|XXssssss|XXssssss|XXssssss|
      +--------+--------+--------+--------+

   Given the input bytes matching SCALAR encoding pattern, the SCALAR
   value is obtained by concatenation of the "s" bits, with the most
   significant bits coming from the earliest byte. The "X" bits could
   have arbitrary value.

   An ADDEND is defined as the result of limited sign extension of
   16-bit unsigned PARAMETER:

      At first the PARAMETER is zero-extended to 32 bits. After this,
      if the resulting value is greater or equal than 0x8000,
      then 0xFF0000 is added.

   ShiftAll starts at the beginning of the word and repetitively applies
   the following transform until the whole word is transformed:

      If the next untransformed byte matches the first byte of the 7-,
      11-, 16- or 21-bit SCALAR pattern, then:

         If the untransformed part of the word is not long enough to
         match the whole SCALAR pattern, then the whole word is
         marked as transformed.

         Otherwise, let SHIFTED be the sum of the ADDEND and the
         encoded SCALAR. The lowest bits from SHIFTED
         are written back into the corresponding "s" bits. The "0",
         "1" and "X" bits remain unchanged. Next, 1, 2, 3 or
         4 not transformed bytes marked as transformed, according to
         the SCALAR pattern length.

      Otherwise, the next untransformed byte is marked as transformed.

   ShiftFirst applies the same transform as ShiftAll, but does not
   iterate.

3.2. LZ77 Dictionaries



Vandevenne et. al.        Expires Sep 03, 2018                  [Page 8]


Internet-Draft          Shared Brotli Data Format               Mar 2018


   If LZ77 dictionaries are set, then the decoder behaves as if the
   bytes of those dictionaries are accessible as the uncompressed bytes
   outside of the regular LZ77 window, for backwards references.

   Let dictionary_length[i] be LZ77_DICTIONARY_LENGTH of the i-th LZ77
   dictionary, where i is in range 0..(NUM_LZ77_DICTIONARIES-1) Let
   dictionary_gap be the sum of all dictionary_length[i] values.  Then
   word_id, described in Section 8 in [RFC7932], is redefined as:

   word_id = distance - (max allowed distance + 1 + dictionary_gap)

   For the case when dictionary_gap is 0, word_id matches the [RFC7932]
   definition.

   distance values of <length, distance> pairs [RFC7932] in range (max
   allowed distance + 1)..(dictionary_gap + max allowed distance) are
   interpreted as references to the contents of LZ77 dictionaries.

   Let dictionary_address be

   dictionary_gap - distance + max allowed distance

   dictionary_address for distances referencing to LZ77 dictionaries
   will be in range 0..(dictionary_gap - 1).

   Let dictionary_offset[0] = 0.  For i in range
   0..(NUM_LZ77_DICTIONARIES-1) dictionary_offset[i+1] =
   dictionary_offset[i] + dictionary_length[i] Then dictionary_address
   in range dictionary_offset[i]..(dictionary_offset[i+1]-1) will
   correspond to bytes 0..(dictionary_length[i]-1) in i-th LZ77
   dictionary.

   If distance references to d-th LZ77 dictionary, but distance - length
   + 1 references to a different dictionary, or regular LZ77 window,
   then <length, distance> [RFC7932] is considered invalid. Otherwise
   bytes, referenced by dictionary..(distance - length + 1) range, are
   copied to the uncompressed stream.

4. Varint Encoding

   A varint is encoded in base 128 in one ore more bytes as follows:

      +--------+--------+             +--------+
      |1xxxxxxx|1xxxxxxx| {0-8 times} |0xxxxxxx|
      +--------+--------+             +--------+

   where the "x" bits of the first byte are the least significant bits
   of the value and the "x" bits of the last byte are the most



Vandevenne et. al.        Expires Sep 03, 2018                  [Page 9]


Internet-Draft          Shared Brotli Data Format               Mar 2018


   significant bits of the value. The last byte must have its MSB set to
   0, all other bytes to 1 to indicate there is a next byte.

   The maximum allowed amount of bits to read is 63 bits, if the 9th
   byte is present and has its MSB set then the stream must be
   considered as invalid.

5. Shared Dictionary Stream

   The shared dictionary stream encodes a custom dictionary for brotli
   including custom words and/or custom transformations. A shared
   dictionary may appear standalone or as contents of a resource in a
   framing format container.

   A compliant shared brotli dictionary stream must have the following
   format:

      2 bytes: file signature, in hexadecimal the bytes 91, 0.

      varint: LZ77_DICTIONARY_LENGTH, number of bytes for a LZ77
              dictionary, or 0 if there is none.
              The maximum allowed value is the maximum possible sliding
              window size of brotli or of large window brotli.

      LZ77_DICTIONARY_LENGTH bytes: contents of the LZ77 dictionary.

      1 byte: NUM_CUSTOM_WORD_LISTS, may have value 0 to 64

      NUM_CUSTOM_WORD_LISTS times a word list, with the following
              format for each word list:

         28 bytes: SIZE_BITS_BY_LENGTH, array of 28 unsigned 8-bit
                   integers, indexed by word lengths 4 to 31. The value
                   represents log2(number of words of this length),
                   with the exception of 0 meaning 0 words of this
                   length. The max allowed length value is 15 bits.
                   OFFSETS_BY_LENGTH is computed from this as
                   OFFSETS_BY_LENGTH[i + 1] = OFFSETS_BY_LENGTH[i] +
                   (SIZE_BITS_BY_LENGTH[i] ? (i <<
                   SIZE_BITS_BY_LENGTH[i]) : 0)

         N bytes: words dictionary data, where N is
                  OFFSETS_BY_LENGTH[31] + (SIZE_BITS_BY_LENGTH[31] ?
                  (32 << SIZE_BITS_BY_LENGTH[31]) : 0), first all the
                  words of shortest length, then all words of the next
                  length, and so on, where for each length there are
                  either 0 or a positive power of two amount of words.




Vandevenne et. al.        Expires Sep 03, 2018                 [Page 10]


Internet-Draft          Shared Brotli Data Format               Mar 2018


      1 byte: NUM_CUSTOM_TRANSFORM_LISTS, may have value 0 to 64

      NUM_CUSTOM_TRANSFORM_LISTS times a transform list, with the
              following format for each transform list:

            2 bytes: PREFIX_SUFFIX_LENGTH, the length of prefix/suffix
                     data. Must be at least 1 because the list must
                     always end with a zero-length stringlet even
                     if empty.

            NUM_PREFIX_SUFFIX times: prefix/suffix stringlet.
               NUM_PREFIX_SUFFIX is the amount of stringlets parsed and
               must be in range 1..256.

               1 byte: STRING_LENGTH, the length of the entry contents.
                       0 for the last (terminating) entry of the
                       transform list. For other entries STRING_LENGTH
                       must be in range 1..255. The 0 entry must be
                       present and must be the last byte of the
                       PREFIX_SUFFIX_LENGTH bytes of prefix/suffix
                       data, else the stream must be rejected as
                       invalid.

               STRING_LENGTH bytes: contents of the prefix/suffix.

         1 byte: NTRANSFORMS, amount of transformation triplets.

         NTRANSFORMS times: data for each transform:

            1 byte: index of prefix in prefix/suffix data;
                    must be less than NUM_PREFIX_SUFFIX.

            1 byte: index of suffix in prefix/suffix data;
                    must be less than NUM_PREFIX_SUFFIX.

            1 byte: operation index, must be an index in the table of
                    operations listed in the chapter
                    "Transform Operations".

         If and only if at least one transform has operation index
         ShiftFirst or ShiftAll:

            NTRANSFORMS times:

               2 bytes: parameters for the transform. If the transform
                        does not have type ShiftFirst or ShiftAll, the
                        value must be 0. ShiftFirst and ShiftAll
                        interpret these bytes as an unsigned 16-bit



Vandevenne et. al.        Expires Sep 03, 2018                 [Page 11]


Internet-Draft          Shared Brotli Data Format               Mar 2018


                        integer.

      if NUM_CUSTOM_WORD_LISTS > 0 or NUM_CUSTOM_TRANSFORM_LISTS > 0
         (else implicitly NUM_DICTIONARIES is 1 and points to the
         brotli built-in and there is no context map)

         1 byte: NUM_DICTIONARIES, may have value 1 to 64. Each
                 dictionary is a combination of a word list and a
                 transform list. Each next dictionary is used when the
                 distance goes beyond the previous. If a CONTEXT_MAP is
                 enabled, then the dictionary matching the context is
                 moved to the front in the order for this context.

         NUM_DICTIONARIES times: the DICTIONARY_MAP:

            1 byte: index into a custom word list, or value
                    NUM_CUSTOM_WORD_LISTS to indicate to use the brotli
                    [RFC7932] built-in default word list

            1 byte: index into a custom transform list, or value
                    NUM_CUSTOM_TRANSFORM_LISTS to indicate to use the
                    brotli [RFC7932] built-in default transform list


         1 byte: CONTEXT_ENABLED, if 0 there is no context map, if 1 a
                 context map used to select the dictionary is encoded
                 below

         If CONTEXT_ENABLED is 1, a context map for the 64 brotli
                 [RFC7932] literals contexts:

            64 bytes: CONTEXT_MAP, index into the DICTIONARY_MAP for
                      the first dictionary to use for this context

6. Large Window Brotli Compressed Data Stream

   Large window brotli allows a sliding window beyond the 24-bit maximum
   of regular brotli [RFC7932].

   The compressed data stream is backwards compatible to brotli
   [RFC7932], and may optionally have the following differences:

      Encoding of WBITS in the stream header: the following new
         pattern of 14 bits is supported:

         8 bits: value 00010001, to indicate a large window
                 brotli stream




Vandevenne et. al.        Expires Sep 03, 2018                 [Page 12]


Internet-Draft          Shared Brotli Data Format               Mar 2018


         6 bits: WBITS, must have value in range 10 to 62

      Distance alphabet: if the stream is a large window brotli
         stream, the maximum number of extra bits is 62 and the
         theoretical maximum size of the distance alphabet is
         (16 + NDIRECT + (62 << (NPOSTFIX + 1))). However the
         alphabet is not allowed to contain a distance symbol
         able to represent a distance larger than 9223372036854775804
         when its extra bits have their maximum value.

   A decoder that does not support 64-bit integers may reject a stream
   if WBITS is higher than 30 or a distance symbol from the distance
   alphabet is able to encode a distance larger than 2147483644.

7. Patching Format Compressed Data Stream

   TBD

8. Shared Brotli Compressed Data Stream

   The format of a shared brotli compressed data stream without framing
   format is backwards compatible with brotli [RFC7932], with the
   following optional differences:

      *) LZ77 dictionaries as described above are supported

      *) Custom static dictionaries replacing or extending the static
         dictionary of brotli [RFC7932] with different words or
         transforms are supported

      *) The stream may have the format of regular brotli [RFC7932],
         or the format of large window brotli as described in section
         6, or the format of the patching stream described in
         section 7

9. Shared Brotli Framing Format Stream

   A compliant shared brotli framing format stream has the format
   described below.

9.1. Main Format

     4 bytes: file signature, in hexadecimal the bytes 91, 0a, 42, 52.
              The first byte contains the invalid WBITS combination for
              brotli [RFC7932] and large window brotli.

     1 byte: container flags, 8 bits with meanings:




Vandevenne et. al.        Expires Sep 03, 2018                 [Page 13]


Internet-Draft          Shared Brotli Data Format               Mar 2018


        bit 0 and 1: version indicator, must be 00

        bit 2: if 0, the file contains no final footer, may not contain
           any metadata chunks, may not contain a central directory,
           and may encode only a single resource (using one or more
           data chunks). If 1, the file may contain one or more
           resources, metadata, central directory, and must contain a
           final footer.

     multiple times: a chunk, each with the format specified in section
                     9.2

9.2. Chunk Format

      varint: length of this chunk excluding this varint but
              including all next header bytes and data. If the value
              is 0, then the chunk type byte is not present and the
              chunk type is assumed to be 0.

      1  byte: CHUNK_TYPE
               0: padding chunk
               1: metadata chunk
               2: data chunk
               3: first partial data chunk
               4: middle partial data chunk
               5: last partial data chunk
               6: footer metadata chunk
               7: global metadata chunk
               8: repeat metadata chunk
               9: central directory chunk
               10: final footer

      if CHUNK_TYPE is not padding chunk or final footer:

         1 byte: CODEC:

            0: uncompressed

            1: keep decoder

            2: brotli

            3: shared brotli

      if CODEC is not "uncompressed":

         varint: uncompressed size in bytes of the data contained
                 within the compressed stream



Vandevenne et. al.        Expires Sep 03, 2018                 [Page 14]


Internet-Draft          Shared Brotli Data Format               Mar 2018


      if CODEC is "shared brotli"

         1 byte: amount of dictionary references. Multiple dictionary
                 references are possible with the following
                 restrictions: there can be maximum 1 serialized
                 dictionary, maximum 1 patching file, and maximum 15
                 prefix dictionaries (a serialized dictionary may
                 already contain one of those, and a patching file
                 also takes up a prefix dictionary)

         per dictionary reference:

            1 byte: flags:

               bit 0: if false, internal dictionary reference (by
                      pointer to resource), if true, dictionary
                      reference by hash code (could be internal or
                      external)

               bit 1 and 2: dictionary type:

                  00: prefix dictionary, set in front of the sliding
                      window

                  01: serialized dictionary in the shared brotli
                      format as specified in section 5.

                  10: file to apply patching algorithm to. The
                      compressed stream then has the format specified
                      in section 7.

                  11: invalid bit combination

               bit 3-7: must be 0

            if hash-based: 4+32 bytes "hw3-" and 256-bit Highwayhash
                checksum checksum to refer to dictionary

            if internal index based: varint encoded pointer to its
               entry in this container. The entry must be a data
               chunk.

      X bytes: extra header bytes, depending on CHUNK_TYPE. If present,
               they are specified in the subsequent chapters.

         remaining bytes: the chunk contents. The uncompressed data
                          in the chunk content depends on CHUNK_TYPE
                          and is specified in the subsequent sections.



Vandevenne et. al.        Expires Sep 03, 2018                 [Page 15]


Internet-Draft          Shared Brotli Data Format               Mar 2018


                          The compressed data has following
                          format depending on CODEC:

            *) uncompressed: the raw bytes

            *) if "keep decoder", the continuation of the compressed
               stream which was interrupted at the end of the previous
               chunk. The decoder from the previous chunk must be used
               and its state it had at the end of the previous chunk
               must be kept at the start of the decoding of this chunk.

            *) brotli: the bytes are in brotli format
                       [RFC7932]

            *) shared brotli: the bytes are in the
                              shared brotli format specified in section
                              8

9.3. Metadata Format

   All the metadata chunk types use the following format for the
   uncompressed content:

      Per field:

         2 bytes: code to identify this metadata field. This must be
                  two lowercase or two uppercase alpha ascii
                  characters. If the decoder encounters a lowercase
                  field that it does not recognise for the current
                  chunk type, non-ascii characters or non-alpha
                  characters, the decoder must reject the data stream
                  as invalid. Uppercase codes may be used for custom
                  user metadata and can be ignored by a compliant
                  decoder.

         varint: length of the content of this field in bytes,
                 excluding the code bytes and this varint

         N bytes: the contents of this field

   The last field is reached when the chunk content end is reached. If
   the length of the last field does not end at the same byte as the end
   of the uncompressed content of the chunk, the decoder must reject the
   data stream as invalid.


9.4. Chunk Specifications




Vandevenne et. al.        Expires Sep 03, 2018                 [Page 16]


Internet-Draft          Shared Brotli Data Format               Mar 2018


9.4.1. Padding Chunk

   All bytes in this chunk must be zero, except for the initial varint
   that specifies the remaining chunk length.

   Since the varint itself takes up bytes as well, when the goal is to
   introduce an amount of padding bytes, the dependence of the length of
   the varint on the value it encodes must be taken into account.

   A single byte varint with value 0 is a padding chunk of length 1.
   For more padding, use higher varint values. Do not use multiple
   shorter padding chunks, since this is slower to decode.

9.4.2. Metadata Chunk

   This chunk contains metadata that applies to the resource whose
   beginning is encoded in the subsequent data chunk or first partial
   data chunk.

   The contents of this chunk follows the format described in chapter
   9.3.

   The following field types are recognised:

      id: name field. May appear 0 or 1 times. Has the following
          format:

         N bytes: name in UTF-8 encoding, length determined by the
                  field length. Treated generically but may be used as
                  filename. If used as filename, forward slashes '/'
                  should be used as directory separator, relative paths
                  should be used and filenames ending in a slash with
                  0-length content in the matching data chunk should be
                  treated as an empty directory.

      mt: modification type. May appear 0 or 1 times. Has the following
          format:

         8 bytes: microseconds since epoch, as a little endian signed
                  twos complement 64-bit integer

      custom user field: any two uppercase ASCII characters.

9.4.3. Data Chunk

   A data chunk contains the actual data of a resource.

   This chunk has the following extra header bytes:



Vandevenne et. al.        Expires Sep 03, 2018                 [Page 17]


Internet-Draft          Shared Brotli Data Format               Mar 2018


      1 byte: flags:

        bit 0: if true, indicates this is not a resource that should be
               output implicitly as part of extracting resources from
               this container. Instead, it may be referred to only
               explicitly, e.g. as a dictionary reference by hash code
               or offset. This flag should be set for data used as
               dictionary to improve compression of actual resources.

        bit 1: if true, hash code is given

        bits 2-7: must be zero

      if hash code is given:

        36 bytes: "hw3-" plus 32 bytes highwayhash-256 checkum of the
                  uncompressed data

   The uncompressed content bytes of this chunk are the actual data of
   the resource.

9.4.4. First Partial Data Chunk

   This chunk contains partial data of a resource. This is the first
   chunk in a series containing the entire data of the resource.

   The format of this chunk is the same as the format of a Data Chunk
   (chapter 9.4.3)

   The uncompressed data size and hash code given by this chunk are only
   of this part of the resource, not of the full resource.

9.4.5. Middle Partial Data Chunk

   This chunk contains partial data of a resource, and is neither the
   first nor the last part of the full resource.

   The format of this chunk is the same as the format of a First Data
   Chunk (chapter 9.4.4), except bit 0 of the flags must be 0. Only the
   first chunk specifies the "no output" flag.

9.4.6. Last Partial Data Chunk

   This chunk contains the final piece of partial data of a resource.

   The format is the same as for Partial Data Chunk, Middle (chapter
   9.4.5).




Vandevenne et. al.        Expires Sep 03, 2018                 [Page 18]


Internet-Draft          Shared Brotli Data Format               Mar 2018


   The type of this chunk indicates that there are no further chunk
   encoding this resource, so the full resource is now known.

9.4.7. Footer Metadata Chunk

   This metadata applies to the resource whose encoding ended in the
   preceding data chunk or last partial data chunk.

   The contents of this chunk follows the format described in chapter
   9.3.

   There are no lowercase field types defined for footer metadata.
   Uppercase field types can be used as custom user data.

9.4.8. Global Metadata Chunk

   This metadata applies to the whole container instead of a single
   resource.

   The contents of this chunk follows the format described in chapter
   9.3.

   There are no lowercase field types defined for footer metadata.
   Uppercase field types can be used as custom user data.

9.4.9. Repeat Metadata Chunk

   These chunks optionally repeat metadata that is interleaved between
   data chunks. To use these chunks, it is necessary to also read
   additional information, such as pointers to the original chunks, from
   the central directory.

   The contents of this chunk follows the format described in chapter
   9.3.

   This chunk has an extra header byte:

      1 byte: chunk type of repeated chunk (metadata chunk
              or footer metadata chunk)

   This set of chunks must follow the following restrictions:

      It is optional whether or not repeat metadata chunks are
         present.

      If they are present, then they must be present for all
         metadata chunks and footer metadata chunks.




Vandevenne et. al.        Expires Sep 03, 2018                 [Page 19]


Internet-Draft          Shared Brotli Data Format               Mar 2018


      There may be only 1 repeat metadata chunk per repeated metadata
         chunk.

      They must appear in the same order as the chunks appear in the
         container, which is also the same order as listed in the
         central directory.

   The fields contained in this metadata chunk must follow the following
   restrictions:

      If a field is present, it must
         exactly match the corresponding field of the copied chunk.

      It is allowed to leave out a field that is present
         in the copied chunk.

      If a field is present, then it must be present in *all* other
         repeat metadata chunks when the copied chunk contains this
         field. In other words, if you know you can get the name field
         from a repeat chunk, you know that you will be able to get all
         names of all resources from all repeat chunks.

9.4.10. Central Directory Chunk

   The central directory chunk, along with the repeat metadata chunks,
   allow to quickly find and list compressed resources in the container
   file.

   The contents of the central directory are as follows:

     varint: pointer into the file where the repeat metadata chunks are
             located, or 0 if they are not present

     per chunk listed:

        varint: pointer into the file where this chunk begins

        varint: amount of header bytes N used below

        N bytes: copy of all the header bytes of the pointed at chunk,
                 including total size, chunk type byte, X extra header
                 bytes, codec, dictionary references, uncompressed
                 size. The content is not repeated here.

   The last listed chunk is reached when the end of the contents of the
   central directory are reached. If the end does not match the last
   byte of the central directory, the decoder must reject the data
   stream as invalid.



Vandevenne et. al.        Expires Sep 03, 2018                 [Page 20]


Internet-Draft          Shared Brotli Data Format               Mar 2018


   If present, the central directory must list all chunks of type data,
   first partial data, middle partial data, last partial data, metadata
   and footer metadata.

9.4.11. Final Footer Chunk

   Chunk that closes the file, only present if in the initial container
   header flags bit 2 was set.

   This chunk has the following content, always uncompressed:

      reversed varint: size of this entire framing format file,
                       including these bytes themselves, or 0 if this
                       size is not given

      reversed varint: pointer to the start of the central directory,
                       or 0 if there is none

   A reversed varint has the same format as a varint, but has its bytes
   in reversed order and is designed to be parsed from end of file
   towards the beginning.

   9.4.12. Chunk ordering

   The chunk ordering must follow the rules described below, if the
   decoder sees otherwise, it must reject the data stream as invalid.

      Padding chunks may be inserted anywhere, even between chunks for
      which the rules below say no other chunk types may come in
      between.

      Metadata chunks must come immediately before the Data chunks of
      the resource they apply to.

      Footer metadata chunks must come immediately after the Data
      chunks of the resource they apply to.

      There may be only 0 or 1 metadata chunks per resource.

      There may be only 0 or 1 footer metadata chunks per resource.

      A resource must exist out of either 1 data chunk, or 1 first
      partial data chunk, 0 or more middle partial data
      chunks, and 1 last partial data chunk, in that order.

      Repeat metadata chunks must follow the rules of section 9.4.9.

      There may be only 0 or 1 central directory chunks.



Vandevenne et. al.        Expires Sep 03, 2018                 [Page 21]


Internet-Draft          Shared Brotli Data Format               Mar 2018


      If bit 2 of the container flags is set, there may be only a
      single resource, no metadata chunks of any type, no central
      directory, and no final footer.

      If bit 2 of the container flags is not set, there must be exactly
      1 final footer chunk and it must be the last chunk in the file.

10. Security Considerations

   The security considerations for brotli RFC [7932] apply to shared
   brotli as well.

   In addition, the same considerations apply to the decoding of new
   file format streams for shared brotli, including shared dictionaries,
   the framing format and the shared brotli format.

   When using compression with shared dictionaries, one must be aware
   that attacks changing the dictionary may change the decompressed
   content.

   When using compression with shared dictionaries over an encrypted
   channel, one must be aware that dictionary contents interact with
   compressed data contents through for example the compressed size of
   the contents.


11. IANA Considerations

   The "HTTP Content Coding Registry" has been updated with the
   registration below:

      +-------+-------------------------------------+------------+
      | Name  | Description                         | Reference  |
      +-------+-------------------------------------+------------+
      | sbr   | Shared Brotli Compressed Data Format| RFCXXXX    |
      +-------+-------------------------------------+------------+

12. Informative References

   [RFC7932]  Alakuijala, J., Szabadka, Z., "Brotli Compressed Data
              Format", RFC 7932, Google, Inc., July 2016.
              http://www.ietf.org/rfc/rfc7932.txt

Authors' Addresses

   Jyrki Alakuijala
   Google, Inc.




Vandevenne et. al.        Expires Sep 03, 2018                 [Page 22]


Internet-Draft          Shared Brotli Data Format               Mar 2018


   Email: jyrki@google.com

   Evgenii Kliuchnikov
   Google, Inc.

   Email: eustas@google.com

   Robert Obryk
   Google, Inc.

   Email: robryk@google.com

   Zoltan Szabadka
   Google, Inc.

   Email: szabadka@google.com

   Lode Vandevenne
   Google, Inc.

   Email: lode@google.com






























Vandevenne et. al.        Expires Sep 03, 2018                 [Page 23]