Zstandard Compression and the application/zstd Media Type
draft-kucherawy-dispatch-zstd-03

Document history

Date Rev. By Action
2018-10-02
03 (System) RFC Editor state changed to AUTH48-DONE from AUTH48
2018-09-19
03 (System) RFC Editor state changed to AUTH48 from RFC-EDITOR
2018-09-07
03 (System) RFC Editor state changed to RFC-EDITOR from EDIT
2018-07-17
03 (System) IANA Action state changed to RFC-Ed-Ack from Waiting on RFC Editor
2018-07-17
03 (System) IANA Action state changed to Waiting on RFC Editor from In Progress
2018-07-17
03 (System) IANA Action state changed to In Progress from Waiting on Authors
2018-07-16
03 (System) RFC Editor state changed to EDIT
2018-07-16
03 (System) IESG state changed to RFC Ed Queue from Approved-announcement sent
2018-07-16
03 (System) Announcement was received by RFC Editor
2018-07-16
03 (System) IANA Action state changed to Waiting on Authors from In Progress
2018-07-16
03 (System) IANA Action state changed to In Progress
2018-07-16
03 Cindy Morgan IESG state changed to Approved-announcement sent from Approved-announcement to be sent
2018-07-16
03 Cindy Morgan IESG has approved the document
2018-07-16
03 Cindy Morgan Closed "Approve" ballot
2018-07-16
03 Cindy Morgan Ballot approval text was generated
2018-07-16
03 Alexey Melnikov IESG state changed to Approved-announcement to be sent from IESG Evaluation::AD Followup
2018-07-15
03 Murray Kucherawy New version available: draft-kucherawy-dispatch-zstd-03.txt
2018-07-15
03 (System) New version approved
2018-07-15
03 (System) Request for posting confirmation emailed to previous authors: Yann Collet, Murray Kucherawy
2018-07-15
03 Murray Kucherawy Uploaded new revision
2018-07-13
02 Alexey Melnikov [Ballot comment]
Checking a few minor points with authors before approval.
2018-07-13
02 Alexey Melnikov Ballot comment text updated for Alexey Melnikov
2018-07-12
02 Benjamin Kaduk
[Ballot comment]
Thanks for addressing my DISCUSS (and pointing out that my other point was not grounded in fact)!
I think I had intended to switch to Yes, but since I was so slow to respond to the author comments,
I have forgotten enough about the document that I will just go with No Objection instead, to avoid
any further delay.

Original comments preserved below:

Some high-level comments:

After reading Section 2.4.2.1 (and subsections), I'm not sure I
fully understand the procedures for the Huffman coding.  Granted,
in order to obtain the efficient compression ratios the procedure is
necessarily complicated, so arguably there is some onus on me as the
reader to work harder to understand.  Having said that, I think the
mapping from weights to prefixes in the prefix code makes sense to
me, but I'm not sure I understand how the weights are assigned to
their respective symbols/literals (i.e., what the prefix decodes
to).  My current reading of the text is that weights are given for
literals 0 through (last literal - 1), and in particular if I want
to do raw (non-FSE) weights, I can only do 128 literals like this.
So, are the symbols/literals just these values from 0 to 127?  That
seems unlikely to be correct, given that we want to compress data
containing bytes with the high bit set, but I'm unsure where I'm
going astray.

A link to some dedicated/official test vectors would be useful, if
the test vectors themselves are seen as being too bulky to include
in the document.


The text about third-party dictionaries in Section 4 makes me wonder
if it should explicitly be stated that "dictionary input to
decompression should be treated as being similarly untrusted as
input compressed frames, and the appropriate bounds checking
performed on data accesses as needed".

In Appendix B, I don't see a reference to match up to a chapter
"from normalized distribution to decoding tables".


Some more minor section-by-section comments follow.


Section 2.1.1

Should the frame-structure diagram list "0 or 4 bytes" for the
content checksum?

  Content_Checksum:  An optional 32-bit checksum, only present if the
      Content_Checksum_flag is set.  The content checksum is the result
      of the xxh64() hash function [XXHASH] digesting the origina

"original"

Section 2.1.1.1.1.2

  In this case, Window_Descriptor byte is skipped, but

"the Window_Descriptor byte"

What's the difference between "Unused" and "Reserved", viz.
2.1.1.1.1.3 and 2.1.1.1.1.4?

Section 2.1.1.1.1.6.

  This is a two-bit flag (= FHD & 3)

two-bit flag (whose value is obtained by masking the frame header
descriptor with 0x3)

Section 2.1.1.1.2.

  Provides guarantees on minimum memory buffer required to decompress a
  frame. [...]

"the minimum memory buffer"


Section 2.1.1.1.3.

  Field size depends on Dictionary_ID_flag. [...]

"The field size depends on the"

How are dictionary (ID)s allocated and standardized?

Section 2.1.1.2.2.

  RLE_Block:  This is a single byte, repeated Block_Size times.
      Block_Content consists of a single byte.  On the decompression
      side, this byte must be repeated Block_Size times.

I think we need normative language on "the decoder MUST verify
[...]".
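
As a minimal sketch of what a checked RLE_Block expansion could look
like: the decoder confirms that Block_Size fits in the remaining output
space before repeating the byte. The function name, the capacity
parameter, and the error convention are hypothetical and are not taken
from the draft.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Expand an RLE_Block: repeat `value` block_size times into dst.
 * Returns 0 on success, -1 if the output buffer is too small.
 * (Hypothetical helper; names and error convention are illustrative.) */
int expand_rle_block(uint8_t value, size_t block_size,
                     uint8_t *dst, size_t dst_capacity)
{
    /* Verify the bound before writing anything. */
    if (block_size > dst_capacity) {
        return -1;
    }
    memset(dst, value, block_size);
    return 0;
}
```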

Section 2.1.1.3

This "all previously decoded data" is only within a frame, right?
It might be worth reiterating that here.

Section 2.1.1.3.1.1.

For Size_Format for Raw_Literals_Block and RLE_Literals_Block, I
think we need to explicitly say "when parsing the flags bit-by-bit,
if the low-order bit of the Size_Format field is zero, the field is
only one bit, and processing proceeds to the next bitfield.
Regenerated_Size uses [...]"

Section 2.1.1.3.2.1.

Please expand FSE on first use.

Section 2.4.1.1

  Finally, the decoder can tell how many bytes were used in this
  process, and how many symbols are present.  The bitstream consumes a
  round number of bytes.  Any remaining bit within the last byte is
  simply unused.

Usually we say "set to zero on encode, ignore on decode" to make
things deterministic.

Section 2.4.2

  When decompressing, the last byte containing the padding is the first
  byte to read.  The decompressor needs to skip 0-7 initial 0-bits and
  the first 1-bit lt occurs.  Afterwards, the useful part of the
  bitstream begins.

I guess "lt" is supposed to be "that"?
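
One way to read the corrected sentence, as a sketch: the padding is
taken to be the high-order zero bits of the last byte, followed by a
single 1-bit marker. That bit ordering, the function name, and the
treatment of an all-zero last byte as invalid are assumptions for
illustration.

```c
#include <stdint.h>

/* Count the bits to skip at the start of a backward-read bitstream:
 * 0-7 padding zero bits plus the 1-bit marker that follows them.
 * Returns 1..8, or 0 if the last byte holds no marker bit at all.
 * (Hypothetical helper; assumes padding occupies the high-order bits.) */
unsigned padding_bits_to_skip(uint8_t last_byte)
{
    unsigned skipped = 1; /* the marker bit itself */

    if (last_byte == 0) {
        return 0; /* no marker present: treated here as malformed */
    }
    while ((last_byte & 0x80) == 0) {
        last_byte = (uint8_t)(last_byte << 1);
        skipped++;
    }
    return skipped;
}
```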

Section 2.4.2.1

  The tree depth is 4, since its smallest element uses 4 bits.

It's not entirely clear to me what "smallest" means here --
presumably "lowest in the tree", though that is rather tautological
in this context.  "Smallest weight" would make sense, except there
is not a weight column in the table.

Section 2.4.2.1.3

    if Number_of_Bits != 0
        Number_of_Bits = Max_Number_of_Bits + 1 - Weight

Should the first Number_of_Bits be Weight?
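
For illustration, a sketch of the rule with the guard read as a test on
Weight, as the question suggests. The function name is hypothetical, and
Weight is assumed not to exceed Max_Number_of_Bits.

```c
/* Convert a Huffman Weight into a code length in bits.
 * A Weight of 0 means the symbol is unused and gets no code.
 * (Hypothetical helper; assumes weight <= max_number_of_bits.) */
unsigned number_of_bits_for_weight(unsigned weight,
                                   unsigned max_number_of_bits)
{
    return weight != 0 ? max_number_of_bits + 1 - weight : 0;
}
```

With Max_Number_of_Bits = 4, weights 4, 3, 2, 1 map to code lengths of
1, 2, 3, 4 bits, and a weight of 0 marks an unused symbol.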
2018-07-12
02 Benjamin Kaduk [Ballot Position Update] Position for Benjamin Kaduk has been changed to No Objection from Discuss
2018-07-11
02 Adam Roach [Ballot comment]
Thanks for addressing my discuss and comments.
2018-07-11
02 Adam Roach [Ballot Position Update] Position for Adam Roach has been changed to No Objection from Discuss
2018-07-11
02 (System) Sub state has been changed to AD Followup from Revised ID Needed
2018-07-11
02 (System) IANA Review state changed to Version Changed - Review Needed from IANA OK - Actions Needed
2018-07-11
02 Cindy Morgan New version available: draft-kucherawy-dispatch-zstd-02.txt
2018-07-11
02 (System) Secretariat manually posting. Approvals already received
2018-07-11
02 Cindy Morgan Uploaded new revision
2018-05-24
01 Cindy Morgan IESG state changed to IESG Evaluation::Revised I-D Needed from IESG Evaluation
2018-05-24
01 Ignas Bagdonas [Ballot comment]
No objection as in "have read the document but it is not in the domain of expertise that I can authoritatively comment on".
2018-05-24
01 Ignas Bagdonas [Ballot Position Update] New position, No Objection, has been recorded for Ignas Bagdonas
2018-05-24
01 Benjamin Kaduk
[Ballot discuss]
I support Adam's DISCUSS.

Additionally, I think that there are significant privacy
considerations associated with the Skippable Frames described in
Section 2.3, that should be documented before this document
advances.  Specifically, this provides an easy way for a party (not
even necessarily the encoder, since these frames can be inserted
independently from the actual compression scheme) to insert (e.g.)
tracking data into a compressed stream and have it ignored by
standard decoders.  There are myriad possibilities for how this
could be used, such as for watermarking files with information about
how they were downloaded/generated/etc., which could be used for
tracking leaks from confidential materials or illegal distribution
of copyrighted content; there is potential for personally
identifying information to be included; the list goes on.  I can see
that there can also be useful ways to use these frames to introduce
additional metadata about the compressed content, but fear that we
may want to give guidance for these frames to be
stripped/forbidden/etc. absent additional context to indicate that
the information in the skippable frame is non-malicious.

A more minor note, but still IMO blocking -- in Section 2.1.1.1.2:

    windowLog = 10 + Exponent;
    windowBase = 1 << windowLog;
    windowAdd = (windowBase / 8) * Mantissa;
    Window_Size = windowBase + windowAdd;

I don't think this formula is correct -- windowAdd in this formula
is not modified by windowLog at all, which does not match up with
the stated maximum bound in the body text.
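
To check the formula against the bounds, here is a small sketch that
evaluates it for a single descriptor byte. It assumes the Exponent is
the upper five bits and the Mantissa the lower three bits of
Window_Descriptor, which is not shown in the excerpt above; the function
name is hypothetical.

```c
#include <stdint.h>

/* Evaluate the quoted Window_Size formula for one Window_Descriptor byte.
 * Assumption: Exponent = upper 5 bits, Mantissa = lower 3 bits. */
uint64_t window_size_from_descriptor(uint8_t window_descriptor)
{
    uint64_t exponent = (uint64_t)(window_descriptor >> 3);  /* 0..31 */
    uint64_t mantissa = (uint64_t)(window_descriptor & 0x7); /* 0..7  */

    uint64_t window_log  = 10 + exponent;
    uint64_t window_base = (uint64_t)1 << window_log;
    /* windowAdd is derived from windowBase, so it scales with windowLog. */
    uint64_t window_add  = (window_base / 8) * mantissa;

    return window_base + window_add;
}
```

Under those assumptions windowAdd does grow with windowLog, since it is
derived from windowBase: the smallest result is 1 KB (descriptor 0x00)
and the largest is (1 << 41) + 7 * (1 << 38) (descriptor 0xFF).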
2018-05-24
01 Benjamin Kaduk
[Ballot comment]
Some high-level comments:

After reading Section 2.4.2.1 (and subsections), I'm not sure I
fully understand the procedures for the Huffman coding.  Granted,
in order to obtain the efficient compression ratios the procedure is
necessarily complicated, so arguably there is some onus on me as the
reader to work harder to understand.  Having said that, I think the
mapping from weights to prefixes in the prefix code makes sense to
me, but I'm not sure I understand how the weights are assigned to
their respective symbols/literals (i.e., what the prefix decodes
to).  My current reading of the text is that weights are given for
literals 0 through (last literal - 1), and in particular if I want
to do raw (non-FSE) weights, I can only do 128 literals like this.
So, are the symbols/literals just these values from 0 to 127?  That
seems unlikely to be correct, given that we want to compress data
containing bytes with the high bit set, but I'm unsure where I'm
going astray.

A link to some dedicated/official test vectors would be useful, if
the test vectors themselves are seen as being too bulky to include
in the document.


The text about third-party dictionaries in Section 4 makes me wonder
if it should explicitly be stated that "dictionary input to
decompression should be treated as being similarly untrusted as
input compressed frames, and the appropriate bounds checking
performed on data accesses as needed".

In Appendix B, I don't see a reference to match up to a chapter
"from normalized distribution to decoding tables".


Some more minor section-by-section comments follow.


Section 2.1.1

Should the frame-structure diagram list "0 or 4 bytes" for the
content checksum?

  Content_Checksum:  An optional 32-bit checksum, only present if the
      Content_Checksum_flag is set.  The content checksum is the result
      of the xxh64() hash function [XXHASH] digesting the origina

"original"

Section 2.1.1.1.1.2

  In this case, Window_Descriptor byte is skipped, but

"the Window_Descriptor byte"

What's the difference between "Unused" and "Reserved", viz.
2.1.1.1.1.3 and 2.1.1.1.1.4?

Section 2.1.1.1.1.6.

  This is a two-bit flag (= FHD & 3)

two-bit flag (whose value is obtained by masking the frame header
descriptor with 0x3)

Section 2.1.1.1.2.

  Provides guarantees on minimum memory buffer required to decompress a
  frame. [...]

"the minimum memory buffer"


Section 2.1.1.1.3.

  Field size depends on Dictionary_ID_flag. [...]

"The field size depends on the"

How are dictionary (ID)s allocated and standardized?

Section 2.1.1.2.2.

  RLE_Block:  This is a single byte, repeated Block_Size times.
      Block_Content consists of a single byte.  On the decompression
      side, this byte must be repeated Block_Size times.

I think we need normative language on "the decoder MUST verify
[...]".

Section 2.1.1.3

This "all previously decoded data" is only within a frame, right?
It might be worth reiterating that here.

Section 2.1.1.3.1.1.

For Size_Format for Raw_Literals_Block and RLE_Literals_Block, I
think we need to explicitly say "when parsing the flags bit-by-bit,
if the low-order bit of the Size_Format field is zero, the field is
only one bit, and processing proceeds to the next bitfield.
Regenerated_Size uses [...]"

Section 2.1.1.3.2.1.

Please expand FSE on first use.

Section 2.4.1.1

  Finally, the decoder can tell how many bytes were used in this
  process, and how many symbols are present.  The bitstream consumes a
  round number of bytes.  Any remaining bit within the last byte is
  simply unused.

Usually we say "set to zero on encode, ignore on decode" to make
things deterministic.

Section 2.4.2

  When decompressing, the last byte containing the padding is the first
  byte to read.  The decompressor needs to skip 0-7 initial 0-bits and
  the first 1-bit lt occurs.  Afterwards, the useful part of the
  bitstream begins.

I guess "lt" is supposed to be "that"?

Section 2.4.2.1

  The tree depth is 4, since its smallest element uses 4 bits.

It's not entirely clear to me what "smallest" means here --
presumably "lowest in the tree", though that is rather tautological
in this context.  "Smallest weight" would make sense, except there
is not a weight column in the table.

Section 2.4.2.1.3

    if Number_of_Bits != 0
        Number_of_Bits = Max_Number_of_Bits + 1 - Weight

Should the first Number_of_Bits be Weight?
2018-05-24
01 Benjamin Kaduk [Ballot Position Update] New position, Discuss, has been recorded for Benjamin Kaduk
2018-05-24
01 Alvaro Retana
[Ballot comment]
I share Adam's concern and support his DISCUSS.

FWIW, I would also be happy with a clarification along the lines of what Adam suggested.
2018-05-24
01 Alvaro Retana Ballot comment text updated for Alvaro Retana
2018-05-24
01 Martin Vigoureux [Ballot Position Update] New position, No Objection, has been recorded for Martin Vigoureux
2018-05-23
01 Terry Manderson [Ballot Position Update] New position, No Objection, has been recorded for Terry Manderson
2018-05-23
01 Ben Campbell [Ballot Position Update] Position for Ben Campbell has been changed to No Objection from No Record
2018-05-23
01 Ben Campbell
[Ballot comment]
I agree with Adam's discuss points, although it was not clear to me if public dictionaries are expected.

Others have already pointed out the document says "standards track". Otherwise I have a few mostly editorial comments:

§2.1 - s/indepedently/independently
§2.1.1, last paragraph -  s/origina/original
§2.1.1.1.1.3  "The value of this bit should be set to zero"- I know this isn't using 2119 language--but is the plain-English meaning of "should" what you had in mind? Also, why is this stated differently from §2.1.1.1.1.4? (Also also, what's the IETF record for section nesting depth?)
§2.1.1.2.2, reserved -  The text says "this value cannot be used with the current specification". What if you get a block that used it?
§2.1.1.2.3 : "always strictly less than" - Is this really true? There's no possible input (e.g. already compressed text) where the decompressed and compressed sizes are equal?

§4:
- "Usual precautions"- does that refer to the subsequent paragraphs, or something else?
- Any chance of a citation for "fuzz-test"?
- Last paragraph, last sentence: Forward looking predictions in RFCs have a habit of becoming dated, one way or another.

§6.2 - Are none of these references properly normative?
2018-05-23
01 Ben Campbell Ballot comment text updated for Ben Campbell
2018-05-23
01 Alvaro Retana
[Ballot comment]
I share Adam's concern and support his DISCUSS.

FWIW, I would also be happy with a clarification along the lines of what Adam suggested.
2018-05-23
01 Alvaro Retana [Ballot Position Update] New position, No Objection, has been recorded for Alvaro Retana
2018-05-23
01 Alissa Cooper [Ballot Position Update] New position, No Objection, has been recorded for Alissa Cooper
2018-05-22
01 Suresh Krishnan [Ballot Position Update] New position, No Objection, has been recorded for Suresh Krishnan
2018-05-22
01 Adam Roach
[Ballot discuss]
Thanks for taking the time to document this format for public consumption. I
have a handful of blocking concerns (although I'm open to listening to reasons
that I might be wrong on this front), and a number of additional comments.

---------------------------------------------------------------------------

I have a lot of heartburn around the publication of an informational
document of a protocol called "Zstandard." I know the protocol has been in
development for a while, and has non-trivial deployment, so I understand that
there would be reluctance to change its name at this point.

If we leave the name as-is, I do not think that the normal informational
boilerplate is sufficient. I would like to see additional text that explicitly
addresses the situation, along the lines of:

[Abstract]

  Zstandard, or "zstd" (pronounced "zee standard"), is a data
  compression mechanism.  This document describes the mechanism, and
  registers a media type to be used when transporting zstd-compressed content
  via Multipurpose Internet Mail Extensions (MIME).  Despite the use of the
  word "standard" as part of its name, readers are advised that this document
  is not an Internet Standards Track specification, and is being published
  for informational purposes only.

[Introduction]

  Zstandard, or "zstd" (pronounced "zee standard") is a data compression
  mechanism, akin to gzip [RFC1952]. Despite the use of the word "standard"
  as part of its name, readers are advised that this document is not an
  Internet Standards Track specification, and is being published for
  informational purposes only.

---------------------------------------------------------------------------

§2.2.1:

>  For the first block, the starting offset history is populated with
>  the following values : 1, 4 and 8 (in order).

I fear this is ambiguously specified. I can interpret this as either temporal
order:

Repeated_Offset1 = 8
Repeated_Offset2 = 4
Repeated_Offset3 = 1

Or as sequential order:

Repeated_Offset1 = 1
Repeated_Offset2 = 4
Repeated_Offset3 = 8

Please clarify, as this confusion can lead to incompatible implementations.
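
A sketch of the sequential reading, which is only one of the two
interpretations above; the temporal reading would reverse the
initializers. Type and function names are hypothetical.

```c
#include <stdint.h>

/* Offset history carried from one sequence to the next. */
struct repeat_offsets {
    uint32_t offset1; /* Repeated_Offset1 */
    uint32_t offset2; /* Repeated_Offset2 */
    uint32_t offset3; /* Repeated_Offset3 */
};

/* Starting history for the first block, assuming the sequential
 * reading: Repeated_Offset1 = 1, Repeated_Offset2 = 4,
 * Repeated_Offset3 = 8. */
struct repeat_offsets initial_repeat_offsets(void)
{
    struct repeat_offsets history = { 1, 4, 8 };
    return history;
}
```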

---------------------------------------------------------------------------

The dictionary scheme in here seems problematic, in that the intention is
clearly to have public, well-known dictionaries; and the dictionaries are
intended to have globally-unique identifiers for that purpose. 31 bits isn't
enough space to achieve uniqueness through randomness. While there are other
approaches that involve things like dictionary IDs that are hashes of their
contents (see, e.g., SigComp), I suspect the notion of expanding the size of
this field isn't very appealing.

If you keep the format the same (4 bytes), I don't see how the dictionary part
of this scheme can be interoperable without a registry of some kind. Even if the
intention is to publish further documents on the topic of dictionaries, I
believe publication of this document needs to wait on establishment of such a
registry. I have no opinion about whether this is resolved by creating the
registry in this document, or in holding its publication until the document that
does create such a registry is published.
2018-05-22
01 Adam Roach
[Ballot comment]
General:

The document uses the phrase "natural order" in several places without defining
it. I can make a guess about what is intended, but I'm not completely confident.
Adding a definition for this term would be very helpful.

---------------------------------------------------------------------------

§2.1.1:

>    of the xxh64() hash function [XXHASH] digesting the origina

Typo: "original"

---------------------------------------------------------------------------

§2.1.1.1.1.1.

>  This is a two-bit flag (equivalent to Frame_Header_Descriptor left-
>  shifted six bits)

Shouldn't this say "right-shifted"?
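
For reference, a sketch of extracting the descriptor fields with right
shifts and masks. The two-bit Frame_Content_Size_flag in the top bits
follows the quoted text; the positions of the remaining fields are my
assumption based on the order of the subsections, and the struct and
function names are hypothetical.

```c
#include <stdint.h>

/* Fields of the one-byte Frame_Header_Descriptor (FHD). */
struct fhd_fields {
    unsigned frame_content_size_flag; /* bits 7-6 */
    unsigned single_segment_flag;     /* bit 5    */
    unsigned unused_bit;              /* bit 4    */
    unsigned reserved_bit;            /* bit 3    */
    unsigned content_checksum_flag;   /* bit 2    */
    unsigned dictionary_id_flag;      /* bits 1-0 */
};

/* Split the descriptor byte; bit positions other than the top two
 * bits are assumed, not quoted from the draft. */
struct fhd_fields parse_frame_header_descriptor(uint8_t fhd)
{
    struct fhd_fields f;
    f.frame_content_size_flag = fhd >> 6;       /* right shift by six */
    f.single_segment_flag     = (fhd >> 5) & 1;
    f.unused_bit              = (fhd >> 4) & 1;
    f.reserved_bit            = (fhd >> 3) & 1;
    f.content_checksum_flag   = (fhd >> 2) & 1;
    f.dictionary_id_flag      = fhd & 3;        /* FHD & 3 */
    return f;
}
```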

---------------------------------------------------------------------------

§2.1.1.3:

>  To decode a compressed block, the following elements are necessary:
>
>  o  Previous decoded data, up to a distance of Window_Size, or all
>    previously decoded data when Single_Segment_flag is set.

To be clear, this is "up to a distance of Window_Size or to the beginning of
the Frame, whichever is smaller," right? I believe the intention is that you
can't use data from the previous frame to encode this one, and the text should
probably take care to avoid any implication to the contrary.

---------------------------------------------------------------------------

§2.1.1.3.1:

>  Literals can be stored uncompressed or compressed using Huffman
>  prefix codes.  When compressed, an optional tree description can be
>  present, followed by one or four streams.

A brief description right here of the concept of a "stream" would be quite
helpful in understanding the following several sections.

---------------------------------------------------------------------------

§2.1.1.3.1.1:

>  Value ?0:  Size_Format uses one bit.  Regenerated_Size uses five bits

Please either define the meaning of "?" here, or explicitly call out "Values 00
and 10:"

---------------------------------------------------------------------------

§2.1.1.3.2.1:

>  o  if (byte0 < 255): Number_of_Sequences = ((byte0-128) << 8) +
>    byte1.  Uses 2 bytes.

Please change to "if (127 < byte0 < 255):"
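
A sketch of the whole decision chain may make the intended ranges
clearer. Only the two-byte case is quoted above; the one-byte and
three-byte cases are assumptions based on my reading of the format, and
the function name and return convention are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode Number_of_Sequences from the start of the Sequences_Section.
 * Returns the number of header bytes consumed (1..3), or 0 if the
 * input is too short. Branches are ordered so each range is explicit;
 * note that a chained test like 127 < byte0 < 255 would not work in C. */
size_t read_number_of_sequences(const uint8_t *src, size_t len,
                                uint32_t *number_of_sequences)
{
    uint8_t byte0;

    if (len < 1) {
        return 0;
    }
    byte0 = src[0];

    if (byte0 < 128) {
        /* Assumed one-byte case; byte0 == 0 means no sequences. */
        *number_of_sequences = byte0;
        return 1;
    }
    if (byte0 < 255) {
        /* Quoted case: 127 < byte0 < 255, uses 2 bytes. */
        if (len < 2) {
            return 0;
        }
        *number_of_sequences = ((uint32_t)(byte0 - 128) << 8) + src[1];
        return 2;
    }
    /* Assumed three-byte case: byte0 == 255. */
    if (len < 3) {
        return 0;
    }
    *number_of_sequences = (uint32_t)src[1] + ((uint32_t)src[2] << 8) + 0x7F00;
    return 3;
}
```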


---------------------------------------------------------------------------

§2.1.1.3.2.1:

>  Predefined_Mode:  A predefined FSE distribution table is used,
>    defined in Section 2.1.1.3.2.2.  No distribution table will be
>    present.

I see that "FSE" is expanded in section 2.4.1. Please expand it here, or
provide a reference to 2.4.1 here.

---------------------------------------------------------------------------

§2.1.1.3.2.1:

The table of compression modes lists the modes in the order "Predefined, RLE,
FSE_Compressed, and Repeat," while the description of each mode reverses the
final two. Consider changing these to be in the same order.


---------------------------------------------------------------------------

§2.1.1.3.2.1:

The description makes it clear that Repeat_Mode is valid following RLE_Mode.
It's unclear whether it's allowed after Predefined_Mode (which would be
well-defined, but somewhat silly to code -- then again, the format seems rather
permissive, so I would *guess* it's allowed). For avoidance of doubt, please
either explicitly allow or explicitly forbid this.


---------------------------------------------------------------------------

§2.1.1.3.2.1:

>  The description of the codes for how
>  to determine these values was presented earlier.

Perhaps a reference to where this was done is in order?

---------------------------------------------------------------------------

§2.3:

While it doesn't impact the compression, this scheme seems pretty iffy in terms
of utility and future-proofness. I would expect to see some kind of minimal
tagging system indicating what *kind* of metadata the frame contains, even if no
such kinds are defined by this document (e.g., something simple like "The first
byte of User-Data indicates the type of metadata contained by this frame", and
then set up an empty IANA table for registering such bytes...)

---------------------------------------------------------------------------

§2.4.1.1:

>  A bitstream is read forward, in little-endian fashion.  It is not
>  necessary to know its exact size, since the size will be discovered
>  and reported by the decoding process.  The bitstream starts by
>  reporting on which scale it operates.  Note that Accuracy_Log =
>  low4bits + 5.

I can't find where "low4bits" is defined. Is this meaning to say that the least
significant 4 bits of the initial (that is, highest-in-memory) byte are used to
encode the Accuracy_Log, with an offset of 5?
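
If the answer is the first byte read, the computation would look like
the sketch below. Taking the low four bits of the first byte is an
assumption (it is exactly what the question asks the draft to confirm),
and the function name is hypothetical.

```c
#include <stdint.h>

/* Accuracy_Log = low4bits + 5, with low4bits assumed to be the four
 * least significant bits of the first byte of the FSE table description. */
unsigned fse_accuracy_log(uint8_t first_byte)
{
    unsigned low4bits = first_byte & 0x0F;
    return low4bits + 5;
}
```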


---------------------------------------------------------------------------

§2.4.1.1:

>  Value decoded:  Small values use one less bit.

Nit: "...one fewer bit..."

---------------------------------------------------------------------------


§2.4.1.1:

>  All remaining symbols are sorted in their natural order.  Starting
>  from symbol 0 and table position 0, each symbol gets attributed as
>  many cells as its probability.  Cell allocation is non-linear linear;

If the use of the phrase "non-linear linear" is not an editorial error, please
provide a definition of what is meant by this phrase.

---------------------------------------------------------------------------

§2.4.2.1.1.:

>    The full representation occupies (Number_of_Symbols+1)/2 bytes,
>    meaning it uses a last full byte even if Number_of_Symbols is odd.

Based on the phrase after the comma, I think you mean
ceiling((Number_of_Symbols+1)/2). The formula you have implies the opposite.
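
A note on the arithmetic, as a sketch: if the division is read as
truncating integer division, as in C, the quoted formula already rounds
up to a whole byte. This assumes two 4-bit weights are packed per byte,
which is not stated in the quoted text; the function name is
hypothetical.

```c
#include <stddef.h>

/* Bytes occupied by directly encoded weights, assuming two 4-bit
 * weights per byte. Integer division truncates, so adding 1 first
 * rounds an odd symbol count up to the next whole byte. */
size_t direct_weights_size(size_t number_of_symbols)
{
    return (number_of_symbols + 1) / 2;
}
```

Under that reading, 3 symbols take 2 bytes and 4 symbols also take 2
bytes; applying a ceiling to the real-valued (Number_of_Symbols+1)/2
would instead give 3 bytes for 4 symbols.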

---------------------------------------------------------------------------

§2.4.2.1.2:

>  The number of symbols to decode is determined by tracking the
>  bitStream overflow condition: If updating state after decoding a
>  symbol would require more bits than remain in the stream, it is
>  assumed that extra bits are zero.  Then, the symbols for each of the
>  I final states are decoded and the process is complete.

I presume the "I" on the beginning of this final line is a typo?

---------------------------------------------------------------------------

§2.4.2.1.3:

>  Symbols are sorted by Weight.  Within same Weight, symbols keep
>  natural order.  Symbols with a Weight of zero are removed.  Then,
>  starting from lowest weight, prefix codes are distributed in order.

In what order?

---------------------------------------------------------------------------

§3.1:

>  Published specification:  [ZSTD]

Given that the type being registered is neither vendor tree nor personal tree,
I'm pretty sure this needs to be an RFC (or submitted by a recognized SDO).
Luckily, we're standing in an RFC-to-be right now, so I think you can fix this
by simply pointing to [RFCXXXX].

>  For further information:  See [ZSTD]

I think this should be [RFCXXXX] as well.

---------------------------------------------------------------------------

§4:

>  A decoder has to demonstrate capabilities to detect and prevent any
>  kind of data tampering in the compressed frame from triggering system
>  faults, such as reading or writing beyond allowed memory ranges.
>  This can be guaranteed either by the implementation language, or by
>  careful bound checkings.  It is highly recommended to fuzz-test
>  decoder implementations to test and harden their capability to detect
>  bad frames and deal with them without any system side-effect.

I think it makes sense to specifically call out encoding of
Number_of_Sequences values that cause the decoder to read into the block
header (and beyond), as well as the indication of a Frame Content Size that is
smaller than the actual uncompressed data, in an attempt to trigger buffer
overflow.

---------------------------------------------------------------------------

§6.2:

>  [XXHASH]      "XXHASH Algorithm", 2017, .

This needs to be normative: one cannot implement the full range of features of
the zstd format without understanding it.
2018-05-22
01 Adam Roach [Ballot Position Update] New position, Discuss, has been recorded for Adam Roach
2018-05-21
01 Deborah Brungard [Ballot comment]
Confused also on the track. Datatracker says intended status is Informational, but the document says Standards Track.
2018-05-21
01 Deborah Brungard [Ballot Position Update] New position, No Objection, has been recorded for Deborah Brungard
2018-05-21
01 Alexey Melnikov
1. Summary

  Alexey Melnikov is the responsible Area Director.

  Zstandard, or "zstd" (pronounced "zee standard"), is a data
  compression mechanism.  This document describes the mechanism, and
  registers a media type to be used when transporting zstd-compressed content
  via Multipurpose Internet Mail Extensions (MIME), as well as
  a new HTTP Content Coding.

2. Review and Consensus

  This is not a product of an IETF WG.

  There are multiple implementations of "zstd" compression algorithm, see
  See also Section 5 of the draft.

3. Intellectual Property

  Editors confirmed that they have no IPR to disclose.

4. Other Points

  The document incorrectly states that it is Standards Track; however, it was IETF Last Called as Informational.
  The document is Informational, so there are no DownRefs.

  IANA Considerations are clear.
2018-05-18
01 Spencer Dawkins
[Ballot comment]
I'll let you folks chat with Mirja about the larger topic of requirements language, but I note that this formulation

2.1.1.1.1.3.  Unused Bit

  The value of this bit should be set to zero.  A decoder compliant
  with this specification version shall not interpret it.  It might be
  used in a future version, to signal a property which is not mandatory
  to properly decode the frame.

really doesn't protect that bit for future use. I don't care if it's "MUST be set to zero by encoders and ignored by decoders" or "is always set to zero in this version of the algorithm", but a weaker constraint doesn't prevent implementers from squatting on this bit now.

You know, something like the next subsection:

2.1.1.1.1.4.  Reserved Bit

  This bit is reserved for some future feature.  Its value must be
  zero.  A decoder compliant with this specification version must
  ensure it is not set.  This bit may be used in a future revision, to
  signal a feature that must be interpreted to decode the frame
  correctly.

This text

  For improved interoperability, decoders are recommended to be
  compatible with Window_Size >= 8 MB, and encoders are recommended to
  not request more than 8 MB.  It's merely a recommendation though, and
  decoders are free to support larger or lower limits, depending on
  local limitations.

is pretty clear about the motivation for limiting the Window_Size to 8 MB, and why an implementation might want to use a smaller Window_Size, but is there anything you could say about why an implementation might want to use a larger Window_Size value?

In this text,

2.4.  Entropy Encoding

  Two types of entropy encoding are used by the Zstandard format: FSE,
  and Huffman coding.

could you give any guidance about why you might choose to use one format over another?

Is the meaning of "under control of a third party" well understood?

  One should never compress together a message whose content must
  remain secret with a message under control of a third party.

I might be able to guess at a precise definition, but I'd be guessing.

I'm wondering if you really want to remove all of

5.  Implementation Status

  [RFC EDITOR: Please remove this section prior to publication.]

  Source code for a C language implementation of a "Zstandard"
  compliant library is available at [ZSTD-GITHUB].  This implementation
  is production ready, implementing the full range of the
  specification.  It is tested against security hazards, and widely
  deployed within Facebook infrastructure.

given that this text

2.5.  Dictionary Format

(snip)

  However, dictionaries created by "zstd --train" in the reference
  implementation follow a specific format, described here.

refers to what I'm assuming is the same reference implementation (but I can't be sure, because there's no reference pointer in the Section 2.5 usage).

(I was on the IESG and balloted Yes for https://datatracker.ietf.org/doc/rfc7942/, so I understand that this says you delete Implementation Sections before publishing as an RFC, but I don't think pointers to a reference implementation fall into the same category as the typical "so far, X, Y, and Z have implemented this protocol" Implementation Sections that are instantly outdated. A pointer to a reference implementation sounds more useful for future readers. But, at a minimum, adding a reference pointer to the Section 2.5 occurrence would be useful, since that's the first time a reference implementation is mentioned)
2018-05-18
01 Spencer Dawkins [Ballot Position Update] New position, No Objection, has been recorded for Spencer Dawkins
2018-05-18
01 Mirja Kühlewind
[Ballot comment]
Unfortunately there is no shepherd write-up, therefore I have a couple of questions/comments:

1) What's the reason that this document is submitted as "Standards Track"? Is there a working group that is planning to use this mechanism? Or why does it need to be in the IETF/have IETF consensus?

2) I think this document would benefit from the use of normative language.

3) Why is there a normative reference to a website? I would think the document should be and is describing the compression mechanism comprehensively without the need to have a look at a website that might not even provide a stable reference.

Sorry one more comment I forgot earlier:
4) Should the User_Data Frame be further discussed in the security section as it can carry arbitrary information which can be security-relevant or privacy sensitive...?
2018-05-18
01 Mirja Kühlewind Ballot comment text updated for Mirja Kühlewind
2018-05-18
01 Mirja Kühlewind
[Ballot comment]
Unfortunately there is no shepherd write-up, therefore I have a couple of questions/comments:

1) What's the reason that this document is submitted as "Standards Track"? Is there a working group that is planning to use this mechanism? Or why does it need to be in the IETF/have IETF consensus?

2) I think this document would benefit from the use of normative language.

3) Why is there a normative reference to a website? I would think the document should be and is describing the compression mechanism comprehensively without the need to have a look at a website that might not even provide a stable reference.
2018-05-18
01 Mirja Kühlewind [Ballot Position Update] New position, No Objection, has been recorded for Mirja Kühlewind
2018-05-18
01 Alexey Melnikov IESG state changed to IESG Evaluation from Waiting for Writeup
2018-05-18
01 Alexey Melnikov Ballot has been issued
2018-05-18
01 Alexey Melnikov [Ballot Position Update] New position, Yes, has been recorded for Alexey Melnikov
2018-05-18
01 Alexey Melnikov Created "Approve" ballot
2018-05-18
01 Alexey Melnikov Ballot writeup was changed
2018-04-26
01 Susan Hares Request for Last Call review by OPSDIR Completed: Has Issues. Reviewer: Susan Hares. Sent review to list.
2018-04-23
01 Alexey Melnikov Placed on agenda for telechat - 2018-05-24
2018-04-23
01 (System) IESG state changed to Waiting for Writeup from In Last Call
2018-04-20
01 (System) IANA Review state changed to IANA OK - Actions Needed from IANA - Review Needed
2018-04-20
01 Sabrina Tanamal
(Via drafts-lastcall@iana.org): IESG/Authors/WG Chairs:

The IANA Services Operator has completed its review of draft-kucherawy-dispatch-zstd-01. If any part of this review is inaccurate, please let us know.

The IANA Services Operator understands that, upon approval of this document, there are two actions which we must complete.

First, in the application registry on the Media Types registry page located at:

https://www.iana.org/assignments/media-types/

a single, new media type will be registered as follows:

Name: zstd
Template: [ TBD-at-Registration ]
Reference: [ RFC-to-be ]

Second, in the HTTP Content Coding Registry on the Hypertext Transfer Protocol (HTTP) Parameters registry page located at:

https://www.iana.org/assignments/http-parameters/

a single, new registration will be added as follows:

Name: zstd
Description: A stream of bytes compressed using the Zstandard protocol
Reference: [ RFC-to-be ]

The IANA Services Operator understands that these are the only actions required to be completed upon approval of this document.

Note:  The actions requested in this document will not be completed until the document has been approved for publication as an RFC. This message is meant only to confirm the list of actions that will be performed.


Thank you,

Sabrina Tanamal
Senior IANA Services Specialist
2018-04-19
01 Tero Kivinen Request for Last Call review by SECDIR Completed: Ready. Reviewer: Scott Kelly.
2018-04-19
01 Vijay Gurbani Request for Last Call review by GENART Completed: Ready with Nits. Reviewer: Vijay Gurbani. Sent review to list.
2018-04-05
01 Jean Mahoney Request for Last Call review by GENART is assigned to Vijay Gurbani
2018-04-04
01 Gunter Van de Velde Request for Last Call review by OPSDIR is assigned to Susan Hares
2018-03-29
01 Tero Kivinen Request for Last Call review by SECDIR is assigned to Scott Kelly
2018-03-26
01 Amy Vezza IANA Review state changed to IANA - Review Needed
2018-03-26
01 Amy Vezza
The following Last Call announcement was sent out (ends 2018-04-23):

From: The IESG
To: IETF-Announce
CC: draft-kucherawy-dispatch-zstd@ietf.org, alexey.melnikov@isode.com
Reply-To: ietf@ietf.org
Sender:
Subject: Last Call:  (Zstandard Compression and The application/zstd Media Type) to Informational RFC


The IESG has received a request from an individual submitter to consider the
following document: - 'Zstandard Compression and The application/zstd Media
Type'
  as Informational RFC

The IESG plans to make a decision in the next few weeks, and solicits final
comments on this action. Please send substantive comments to the
ietf@ietf.org mailing lists by 2018-04-23. Exceptionally, comments may be
sent to iesg@ietf.org instead. In either case, please retain the beginning of
the Subject line to allow automated sorting.

Abstract


  Zstandard, or "zstd" (pronounced "zee standard"), is a data
  compression mechanism.  This document describes the mechanism, and
  registers a media type to be used when transporting zstd-compressed
  content via Multipurpose Internet Mail Extensions (MIME).




The file can be obtained via
https://datatracker.ietf.org/doc/draft-kucherawy-dispatch-zstd/

IESG discussion can be tracked via
https://datatracker.ietf.org/doc/draft-kucherawy-dispatch-zstd/ballot/


No IPR declarations have been submitted directly on this I-D.




2018-03-26
01 Amy Vezza IESG state changed to In Last Call from Last Call Requested
2018-03-26
01 Amy Vezza Last call announcement was changed
2018-03-23
01 Alexey Melnikov Last call was requested
2018-03-23
01 Alexey Melnikov Last call announcement was generated
2018-03-23
01 Alexey Melnikov Ballot approval text was generated
2018-03-23
01 Alexey Melnikov Ballot writeup was generated
2018-03-23
01 Alexey Melnikov IESG state changed to Last Call Requested from AD Evaluation
2018-03-23
01 Alexey Melnikov Changed consensus to Yes from Unknown
2018-03-23
01 Alexey Melnikov IESG state changed to AD Evaluation from Publication Requested
2018-03-18
01 Alexey Melnikov IETF WG state changed to Submitted to IESG for Publication
2018-03-18
01 Alexey Melnikov IESG state changed to Publication Requested from AD is watching
2017-11-12
01 Murray Kucherawy New version available: draft-kucherawy-dispatch-zstd-01.txt
2017-11-12
01 (System) New version approved
2017-11-12
01 (System) Request for posting confirmation emailed to previous authors: Yann Collet, Murray Kucherawy
2017-11-12
01 Murray Kucherawy Uploaded new revision
2017-11-12
00 Alexey Melnikov Assigned to Applications and Real-Time Area
2017-11-12
00 Alexey Melnikov Responsible AD changed to Alexey Melnikov
2017-11-12
00 Alexey Melnikov Intended Status changed to Informational
2017-11-12
00 Alexey Melnikov IESG process started in state AD is watching
2017-11-12
00 Alexey Melnikov Stream changed to IETF from None
2017-11-12
00 Murray Kucherawy Added to session: IETF-100: dispatch  Mon-0930
2017-09-25
00 Murray Kucherawy New version available: draft-kucherawy-dispatch-zstd-00.txt
2017-09-25
00 (System) New version approved
2017-09-25
00 Murray Kucherawy Request for posting confirmation emailed to submitter and authors: Yann Collet, Murray Kucherawy, "Murray S. Kucherawy"
2017-09-25
00 Murray Kucherawy Uploaded new revision