Last Call Review of draft-kucherawy-rfc8478bis-03
review-kucherawy-rfc8478bis-03-secdir-lc-migault-2020-01-16-00

Request Review of draft-kucherawy-rfc8478bis
Requested rev. no specific revision (document currently at 03)
Type Last Call Review
Team Security Area Directorate (secdir)
Deadline 2020-01-17
Requested 2019-12-20
Authors Yann Collet, Murray Kucherawy
Draft last updated 2020-01-16
Completed reviews Secdir Last Call review of -03 by Daniel Migault
Opsdir Last Call review of -03 by Carlos Pignataro
Genart Last Call review of -03 by Joel Halpern
Assignment Reviewer Daniel Migault
State Completed
Review review-kucherawy-rfc8478bis-03-secdir-lc-migault-2020-01-16
Posted at https://mailarchive.ietf.org/arch/msg/secdir/fo7Sq9XoIqJ2zbLVOqE2eyGhZOc
Reviewed rev. 03
Review result Has Nits
Review completed: 2020-01-16

Review
review-kucherawy-rfc8478bis-03-secdir-lc-migault-2020-01-16

Hi, 

I have reviewed this document as part of the security directorate's 
ongoing effort to review all IETF documents being processed by the 
IESG.  These comments were written primarily for the benefit of the 
security area directors.  Document editors and WG chairs should treat 
these comments just like any other last call comments.

The summary of the review is: Has Nits

Please find my comments below
Yours ,
Daniel

[...]

3.1.1.  Zstandard Frames

   The structure of a single Zstandard frame is as follows:

     +--------------------+------------+
     |    Magic_Number    | 4 bytes    |
     +--------------------+------------+
     |    Frame_Header    | 2-14 bytes |
     +--------------------+------------+
     |     Data_Block     | n bytes    |
     +--------------------+------------+
     | [More Data_Blocks] |            |
     +--------------------+------------+
     | [Content_Checksum] | 0-4 bytes  |
     +--------------------+------------+

   Magic_Number:  4 bytes, little-endian format.  Value: 0xFD2FB528.

   Frame_Header:  2 to 14 bytes, detailed in Section 3.1.1.1.

   Data_Block:  Detailed in Section 3.1.1.2.  This is where data
      appears.


<mglt>Given that Content Check sum is mentioned as optional, I believe that its
size can be indicated to 4 bytes. One additional nit, the Frame Header does not
take any value between 2 and 14 but actually a subset of such values. I am not
sure these values should be listed instead. </mglt>

[...]

3.1.1.1.1.1.  Frame_Content_Size_Flag

   This is a 2-bit flag (equivalent to Frame_Header_Descriptor right-
   shifted 6 bits) specifying whether Frame_Content_Size (the
   decompressed data size) is provided within the header.  Flag_Value
   provides FCS_Field_Size, which is the number of bytes used by
   Frame_Content_Size according to the following table:

     +----------------+--------+---+---+---+
     | Flag_Value     |   0    | 1 | 2 | 3 |
     +----------------+--------+---+---+---+
     | FCS_Field_Size | 0 or 1 | 2 | 4 | 8 |
     +----------------+--------+---+---+---+

   When Flag_Value is 0, FCS_Field_Size depends on Single_Segment_Flag:
   If Single_Segment_Flag is set, FCS_Field_Size is 1.  Otherwise,
   FCS_Field_Size is 0; Frame_Content_Size is not provided.

<mglt>Purely editorial, but it seems to me clearer not having to designations
for the same entity, so maybe replacing Flag by Frame_Content_Size_Flag might
be easier to read. In addition, Flag_Value looks like another variable and
might represented as Frame_Content_Size_Flag value instead. Similarly, it might
be clearer replacing FCS_Field_Size by Frame_Content_Size field's length.
</mglt>
  

3.1.1.1.1.2.  Single_Segment_Flag

   If this flag is set, data must be regenerated within a single
   continuous memory segment.

   In this case, Window_Descriptor byte is skipped, but
   Frame_Content_Size is necessarily present.  As a consequence, the
   decoder must allocate a memory segment of size equal or larger than
   Frame_Content_Size.

<mglt>I understand this information as being interpreted by the decompressor to
proceed to the decompression. More specifically, the decompressed files
indicates that such amount of memory will be needed to proceed to the
decompression. I assume that at least the decompressed frame is expected to fit
into value indicated by the Frame_Content_Size. I believe it might be useful
to clearly indicate that mechanism needs to be provided to check while
processing the file, the process does not go beyond the allocated memory, and
if so an error needs to be returned and the decompression aborted. The main
difference with the considerations mentioned below is that the check is not
provided at run time. This is somewhat mentioned in the security consideration,
but it might be beneficial to have this at specific places in the
document.</mglt>   

[...]

3.1.1.1.1.3.  Unused Bit

   A decoder compliant with this specification version shall not
   interpret this bit.  It might be used in a future version, to signal
   a property that is not mandatory to properly decode the frame.  An
   encoder compliant with this specification must set this bit to zero.

<mglt>Probably normative language may be needed here (MUST/MUST NOT). SHOULD be
zero for the decompressor and MUST be set to zero by the compressor. </mglt>

3.1.1.1.1.4.  Reserved Bit

   This bit is reserved for some future feature.  Its value must be
   zero.  A decoder compliant with this specification version must
   ensure it is not set.  This bit may be used in a future revision, to
   signal a feature that must be interpreted to decode the frame
   correctly.

<mglt>Normative language may be appropriated as well.</mglt>

[...]

3.1.1.1.1.6.  Dictionary_ID_Flag

   This is a 2-bit flag (= Frame_Header_Descriptor & 0x3) indicating
   whether a dictionary ID is provided within the header.  It also
   specifies the size of this field as DID_Field_Size:

     +----------------+---+---+---+---+
     | Flag_Value     | 0 | 1 | 2 | 3 |
     +----------------+---+---+---+---+
     | DID_Field_Size | 0 | 1 | 2 | 4 |
     +----------------+---+---+---+---+

<mglt>same comments regarding _Value and FIeld_Size></mglt>

3.1.1.1.2.  Window Descriptor

   This provides guarantees about the minimum memory buffer required to
   decompress a frame.  This information is important for decoders to
   allocate enough memory.

<mglt>Same comment regarding the ability to set boundaries to the memory
allocating. I have the impression that memory allocation can serve two
purposes: storing the Block as well as some context associated to the
processing. I am wondering if that is correct or not. I suspect that if there
is a relation with the Frame Size, it might be restricted to the decompressed
data. </mglt>

   The Window_Descriptor byte is optional.  When Single_Segment_Flag is
   set, Window_Descriptor is not present.  In this case, Window_Size is



Collet & Kucherawy        Expires June 22, 2020                 [Page 9]

Internet-Draft              application/zstd               December 2019


   Frame_Content_Size, which can be any value from 0 to 2^64-1 bytes (16
   ExaBytes).

<mglt>I have the impression that some checks on Windows Size with the Frame
SIze can be performed. If that is correct, I believe these should be
mentioned. It seems that in any cases Window_Size <= Frame_Size. </mglt>

 
     +------------+----------+----------+
     | Bit Number |   7-3    |   2-0    |
     +------------+----------+----------+
     | Field Name | Exponent | Mantissa |
     +------------+----------+----------+

   The minimum memory buffer size is called Window_Size.  It is
   described by the following formulae:

     windowLog = 10 + Exponent;
     windowBase = 1 << windowLog;
     windowAdd = (windowBase / 8) * Mantissa;
     Window_Size = windowBase + windowAdd;

   The minimum Window_Size is 1 KB.  The maximum Window_Size is (1<<41)
   + 7*(1<<38) bytes, which is 3.75 TB.

   In general, larger Window_Size values tend to improve the compression
   ratio, but at the cost of increased memory usage.

   To properly decode compressed data, a decoder will need to allocate a
   buffer of at least Window_Size bytes.

   In order to protect decoders from unreasonable memory requirements, a
   decoder is allowed to reject a compressed frame that requests a
   memory size beyond decoder's authorized range.

   For improved interoperability, it's recommended for decoders to
   support values of Window_Size up to 8 MB and for encoders not to
   generate frames requiring a Window_Size larger than 8 MB.  It's
   merely a recommendation though, and decoders are free to support
   larger or lower limits, depending on local limitations.

<mglt>I think the recommendation is to not use Window_Size larger than 8MB,
which will leave minimal implementation to have a fixed Windows Size of 8MB.
This also means that Windows_Size mechanism is not supported. I would be a bit
reluctant to discourage implementing this mechanism as those implementation may
stay forever, and make any use of larger Windows_Size than 8MB. I am wondering
if the limitation should not result from deployed hardware. In which case we
could write this recommendation as to limit the Windows Size to 8MB.</mglt>

3.1.1.1.3.  Dictionary_ID

   This is a variable size field, which contains the ID of the
   dictionary required to properly decode the frame.  This field is
   optional.  When it's not present, it's up to the decoder to know
   which dictionary to use.

<mglt>I understand the document does not define Dictionary_ID. It might be
worth clarifying this here. </mglt>

   Dictionary_ID field size is provided by DID_Field_Size.
   DID_Field_Size is directly derived from the value of
   Dictionary_ID_Flag.  One byte can represent an ID 0-255; 2 bytes can
   represent an ID 0-65535; 4 bytes can represent an ID 0-4294967295.
   Format is little-endian.



Collet & Kucherawy        Expires June 22, 2020                [Page 10]

Internet-Draft              application/zstd               December 2019


   It is permitted to represent a small ID (for example, 13) with a
   large 4-byte dictionary ID, even if it is less efficient.

   Within private environments, any dictionary ID can be used.  However,
   for frames and dictionaries distributed in public space,
   Dictionary_ID must be attributed carefully.  

<mglt>I think that "carefully" needs to be specified a bit more. Though I
suppose this is defined in the IANA section. I also suppose that low and high
ranges are allocated to specific categories. If these ranges are introduced, it
might worth to specify the purpose of each range. The ranges (high, low) do not
map the possibles sizes (unspecified, 0-255, 256-2**16-1, 2**16-2**32-1). Maybe
ths could be clarified as well. </mglt>

   The following ranges are
   reserved for use only with dictionaries that have been registered
   with IANA (see Section 7.4):
   low range:  <= 32767
   high range:  >= (1 << 31)

   Any other value for Dictionary_ID can be used by private arrangement
   between participants.

   Any payload presented for decompression that references an
   unregistered reserved dictionary ID results in an error.

3.1.1.1.4.  Frame Content Size

   This is the original (uncompressed) size.  This information is
   optional.  Frame_Content_Size uses a variable number of bytes,
   provided by FCS_Field_Size.  FCS_Field_Size is provided by the value
   of Frame_Content_Size_Flag.  FCS_Field_Size can be equal to 0 (not
   present), 1, 2, 4, or 8 bytes.

     +----------------+--------------+
     | FCS Field Size | Range        |
     +----------------+--------------+
     |        0       | unknown      |
     +----------------+--------------+
     |        1       | 0 - 255      |
     +----------------+--------------+
     |        2       | 256 - 65791  |
     +----------------+--------------+
     |        4       | 0 - 2^32 - 1 |
     +----------------+--------------+
     |        8       | 0 - 2^64 - 1 |
     +----------------+--------------+

   Frame_Content_Size format is little-endian.  When FCS_Field_Size is
   1, 4, or 8 bytes, the value is read directly.  When FCS_Field_Size is
   2, the offset of 256 is added.  It's allowed to represent a small
   size (for example 18) using any compatible variant.

<mglt>Probably some checks needs to be enforced and decompressor behavior needs
to be specified. </mglt>

[...]


3.1.1.2.3.  Block_Size

   The upper 21 bits of Block_Header represent the Block_Size.

   When Block_Type is Compressed_Block or Raw_Block, Block_Size is the
   size of Block_Content (hence excluding Block_Header).

   When Block_Type is RLE_Block, since Block_Content's size is always 1,
   Block_Size represents the number of times this byte must be repeated.

   Block_Size is limited by Block_Maximum_Size (see below).

<mglt>I suspect that Block_Size may be verified against the Frame Content Size
?</mglt>

3.1.1.2.4.  Block_Content and Block_Maximum_Size

   The size of Block_Content is limited by Block_Maximum_Size, which is
   the smallest of:

   o  Window_Size

   o  128 KB

<mglt>Given that it was recommended to support windows up to 8MB, I am
wondering why blocks cannot be larger. This is probably ignorance and I apology
for the question. </mglt>  

   Block_Maximum_Size is constant for a given frame.  This maximum is
   applicable to both the decompressed size and the compressed size of
   any block in the frame.

   The reasoning for this limit is that a decoder can read this
   information at the beginning of a frame and use it to allocate
   buffers.  The guarantees on the size of blocks ensure that the
   buffers will be large enough for any following block of the valid
   frame.

3.1.1.3.  Compressed Blocks

   To decompress a compressed block, the compressed size must be
   provided from the Block_Size field within Block_Header.



Collet & Kucherawy        Expires June 22, 2020                [Page 13]

Internet-Draft              application/zstd               December 2019


   A compressed block consists of two sections: a Literals Section
   (Section 3.1.1.3.1) and a Sequences_Section (Section 3.1.1.3.2).  The
   results of the two sections are then combined to produce the
   decompressed data in Sequence Execution (Section 3.1.1.4).

   To decode a compressed block, the following elements are necessary:

   o  Previous decoded data, up to a distance of Window_Size, or the
      beginning of the Frame, whichever is smaller.  Single_Segment_Flag
      will be set in the latter case.

   o  List of "recent offsets" from the previous Compressed_Block.

<mglt>This is a general comment when execution is based on computed length,
size, or involve offsets. I also susect that reading backward might result in a
few additional risks. It would be good to specify the boundaries associated to
these values to avoid buffer overflow. If such check ar not possible, it would
be good also to raise the concern that the implementation needs to carefully
handle that operation. Now that have been through the Security Consideration
Section, I can see this has been done for one variable, but I am not sure
others variable defined in the doc are not subject to the same type of
attacks.</mglt>

   o  The previous Huffman tree, required by Treeless_Literals_Block
      type.

   o  Previous Finite State Entropy (FSE) decoding tables, required by
      Repeat_Mode, for each symbol type (literals lengths, match
      lengths, offsets).

   Note that decoding tables are not always from the previous
   Compressed_Block:

   o  Every decoding table can come from a dictionary.

   o  The Huffman tree comes from the previous
      Compressed_Literals_Block.

<mglt>My knowledge in compression are too weak to comment, but my understanding
is that decompression relies on decoding tables. Which means that they can
influence the processing either via the dictionary, a specifically crafted
compressed block,.... I am wondering how these tables can influence the
execution:
* does different tables end in different usage of resources? I think no.  

If the table ends in different outputs, this might be used to compress Content1
and decompress it in Content 2. ? If that is possible, I believe the security
consideration should recommend that Content signature MUST always be provided
on the decompressed content, and not the compressed content.  

The same considerations are of course valid for anything that influence the
compression/decompression.  </mglt>


[...]

3.1.1.3.1.6.  Jump_Table

   The Jump_Table is only present when there are 4 Huffman-coded
   streams.

   (Reminder: Huffman-compressed data consists of either 1 or 4 Huffman-
   coded streams.)

   If only 1 stream is present, it is a single bitstream occupying the
   entire remaining portion of the literals block, encoded as described
   within Section 4.2.2.

   If there are 4 streams, Literals_Section_Header only provides enough
   information to know the decompressed and compressed sizes of all 4
   streams combined.  The decompressed size of each stream is equal to
   (Regenerated_Size+3)/4, except for the last stream, which may be up
   to 3 bytes smaller, to reach a total decompressed size as specified
   in Regenerated_Size.

   The compressed size of each stream is provided explicitly in the
   Jump_Table.  The Jump_Table is 6 bytes long and consists of three
   2-byte little-endian fields, describing the compressed sizes of the
   first 3 streams.  Stream4_Size is computed from Total_Streams_Size
   minus sizes of other streams.

     Stream4_Size = Total_Streams_Size - 6
                    - Stream1_Size - Stream2_Size
                    - Stream3_Size

   Note that if Stream1_Size + Stream2_Size + Stream3_Size exceeds
   Total_Streams_Size, the data are considered corrupted.

<mglt>I would maybe expect that implementation MUST check this.</mglt>

   Each of these 4 bitstreams is then decoded independently as a
   Huffman-Coded stream, as described in Section 4.2.2.

[...]

4.1.1.  FSE Table Description

   To decode FSE streams, it is necessary to construct the decoding
   table.  The Zstandard format encodes FSE table descriptions as
   described here.

   An FSE distribution table describes the probabilities of all symbols
   from 0 to the last present one (included) on a normalized scale of
   (1 << Accuracy_Log).  Note that there must be two or more symbols
   with non-zero probability.


   A bitstream is read forward, in little-endian fashion.  It is not
   necessary to know its exact size, since the size will be discovered
   and reported by the decoding process.  The bitstream starts by
   reporting on which scale it operates.  If low4bits designates the
   lowest 4 bits of the first byte, then Accuracy_Log = low4bits + 5.

<mglt>This looks like processing until the end to determine the size. If that
is correct, it seems dangerous unless some measurements are take to prevent
resource exhaustion. </mglt>

[...]

4.2.  Huffman Coding

   Zstandard Huffman-coded streams are read backwards, similar to the
   FSE bitstreams.  Therefore, to find the start of the bitstream, it is
   necessary to know the offset of the last byte of the Huffman-coded
   stream.

   After writing the last bit containing information, the compressor
   writes a single 1 bit and then fills the byte with 0-7 0 bits of
   padding.  The last byte of the compressed bitstream cannot be 0 for
   that reason.

<mglt> fills the byte with 0-7 0 bits of... I have hard time to parse the
sentence. I understand it as fills the byte by setting bits 0-7 to 0.</mglt>

[...]

7.  IANA Considerations

<mglt>Given that the documents defines Dictionnaries, I am wondering if a
registry shoudl not be created. </mglt>
[...]

7.4.  Dictionaries

   Work in progress includes development of dictionaries that will
   optimize compression and decompression of particular types of data.
   Specification of such dictionaries for public use will necessitate
   registration of a code point from the reserved range described in
   Section 3.1.1.1.3 and its association with a specific dictionary.

   However, there are at present no such dictionaries published for
   public use, so this document makes no immediate request of IANA to
   create such a registry.

<mglt>Dictionnaries will probably be defined later through a standard process,
I believe that is worth being mentioned. In addition, we may also insist on
specifying the impact on resource as well as the resulting output need to be
closely looked at. The problem being essentially if decompres(f, dict1) and
decompress(f, dict2) provides different outputs and this property can be used
by an attacker to transform the uncompressed file. </mglt> 

8.  Security Considerations

   Any data compression method involves the reduction of redundancy in
   the data.  Zstandard is no exception, and the usual precautions
   apply.

   One should never compress a message whose content must remain secret
   with a message generated by a third party.  Such a compression can be
   used to guess the content of the secret message through analysis of
   entropy reduction.  This was demonstrated in the Compression Ratio
   Info-leak Made Easy (CRIME) attack [CRIME], for example.

<mglt>When uncompressed data needs to be protected, it might be worth
clarifying the differences associated to protecting (authenticating) the
compressed data as opposed to the uncompressed data. I suspect if might be
recommended to authenticate uncompressed data. </mglt>

   A decoder has to demonstrate capabilities to detect and prevent any
   kind of data tampering in the compressed frame from triggering system
   faults, such as reading or writing beyond allowed memory ranges.
   This can be guaranteed by either the implementation language or
   careful bound checkings.  Of particular note is the encoding of
   Number_of_Sequences values that cause the decoder to read into the
   block header (and beyond), as well as the indication of a
   Frame_Content_Size that is smaller than the actual decompressed data,
   in an attempt to trigger a buffer overflow.  It is highly recommended
   to fuzz-test (i.e., provide invalid, unexpected, or random input and
   verify safe operation of) decoder implementations to test and harden



Collet & Kucherawy        Expires June 22, 2020                [Page 42]

Internet-Draft              application/zstd               December 2019


   their capability to detect bad frames and deal with them without any
   adverse system side effect.

   An attacker may provide correctly formed compressed frames with
   unreasonable memory requirements.  A decoder must always control
   memory requirements and enforce some (system-specific) limits in
   order to protect memory usage from such scenarios.

   Compression can be optimized by training a dictionary on a variety of
   related content payloads.  This dictionary must then be available at
   the decoder for decompression of the payload to be possible.  While
   this document does not specify how to acquire a dictionary for a
   given compressed payload, it is worth noting that third-party
   dictionaries may interact unexpectedly with a decoder, leading to
   possible memory or other resource exhaustion attacks.  We expect such
   topics to be discussed in further detail in the Security
   Considerations section of a forthcoming RFC for dictionary
   acquisition and transmission, but highlight this issue now out of an
   abundance of caution.

   As discussed in Section 3.1.2, it is possible to store arbitrary user
   metadata in skippable frames.  While such frames are ignored during
   decompression of the data, they can be used as a watermark to track
   the path of the compressed payload.