Zstandard Compression and the application/zstd Media Type
draft-kucherawy-dispatch-zstd-03
Document history
Date | Rev. | By | Action |
---|---|---|---|
2018-10-02
|
03 | (System) | RFC Editor state changed to AUTH48-DONE from AUTH48 |
2018-09-19
|
03 | (System) | RFC Editor state changed to AUTH48 from RFC-EDITOR |
2018-09-07
|
03 | (System) | RFC Editor state changed to RFC-EDITOR from EDIT |
2018-07-17
|
03 | (System) | IANA Action state changed to RFC-Ed-Ack from Waiting on RFC Editor |
2018-07-17
|
03 | (System) | IANA Action state changed to Waiting on RFC Editor from In Progress |
2018-07-17
|
03 | (System) | IANA Action state changed to In Progress from Waiting on Authors |
2018-07-16
|
03 | (System) | RFC Editor state changed to EDIT |
2018-07-16
|
03 | (System) | IESG state changed to RFC Ed Queue from Approved-announcement sent |
2018-07-16
|
03 | (System) | Announcement was received by RFC Editor |
2018-07-16
|
03 | (System) | IANA Action state changed to Waiting on Authors from In Progress |
2018-07-16
|
03 | (System) | IANA Action state changed to In Progress |
2018-07-16
|
03 | Cindy Morgan | IESG state changed to Approved-announcement sent from Approved-announcement to be sent |
2018-07-16
|
03 | Cindy Morgan | IESG has approved the document |
2018-07-16
|
03 | Cindy Morgan | Closed "Approve" ballot |
2018-07-16
|
03 | Cindy Morgan | Ballot approval text was generated |
2018-07-16
|
03 | Alexey Melnikov | IESG state changed to Approved-announcement to be sent from IESG Evaluation::AD Followup |
2018-07-15
|
03 | Murray Kucherawy | New version available: draft-kucherawy-dispatch-zstd-03.txt |
2018-07-15
|
03 | (System) | New version approved |
2018-07-15
|
03 | (System) | Request for posting confirmation emailed to previous authors: Yann Collet, Murray Kucherawy |
2018-07-15
|
03 | Murray Kucherawy | Uploaded new revision |
2018-07-13
|
02 | Alexey Melnikov | [Ballot comment] Checking a few minor points with authors before approval. |
2018-07-13
|
02 | Alexey Melnikov | Ballot comment text updated for Alexey Melnikov |
2018-07-12
|
02 | Benjamin Kaduk | [Ballot comment] Thanks for addressing my DISCUSS (and pointing out that my other point was not grounded in fact)! I think I had intended to switch to Yes, but since I was so slow to respond to the author comments, I have forgotten enough about the document that I will just go with No Objection instead, to avoid any further delay. Original comments preserved below: Some high-level comments: After reading Section 2.4.2.1 (and subsections), I'm not sure I fully understand the procedures for the Huffman coding. Granted, in order to obtain the efficient compression ratios the procedure is necessarily complicated, so arguably there is some onus on me as the reader to work harder to understand. Having said that, I think the mapping from weights to prefixes in the prefix code makes sense to me, but I'm not sure I understand how the weights are assigned to their respective symbols/literals (i.e., what the prefix decodes to). My current reading of the text is that weights are given for literals 0 through (last literal - 1), and in particular if I want to do raw (non-FSE) weights, I can only do 128 literals like this. So, are the symbols/literals just these values from 0 to 127? That seems unlikely to be correct, given that we want to compress data containing bytes with the high bit set, but I'm unsure where I'm going astray. A link to some dedicated/official test vectors would be useful, if test vectors themselves are seen as being too bulky to include in the document. The text about third-party dictionaries in Section 4 makes me wonder if it should explicitly be stated that "dictionary input to decompression should be treated as being similarly untrusted as input compressed frames, and the appropriate bounds checking performed on data accesses as needed". 
In Appendix B, I don't see a reference to match up to a chapter "from normalized distribution to decoding tables". Some more minor section-by-section comments follow. Section 2.1.1 Should the frame-structure diagram list "0 or 4 bytes" for the content checksum? Content_Checksum: An optional 32-bit checksum, only present if the Content_Checksum_flag is set. The content checksum is the result of the xxh64() hash function [XXHASH] digesting the origina "original" Section 2.1.1.1.1.2 In this case, Window_Descriptor byte is skipped, but "the Window_Descriptor byte" What's the difference between "Unused" and "Reserved", viz. 2.1.1.1.1.3 and 2.1.1.1.1.4? Section 2.1.1.1.1.6. This is a two-bit flag (= FHD & 3) two-bit flag (whose value is obtained by masking the frame header descriptor with 0x3) Section 2.1.1.1.2. Provides guarantees on minimum memory buffer required to decompress a frame. [...] "the minimum memory buffer" Section 2.1.1.1.3. Field size depends on Dictionary_ID_flag. [...] "The field size depends on the" How are dictionary (ID)s allocated and standardized? Section 2.1.1.2.2. RLE_Block: This is a single byte, repeated Block_Size times. Block_Content consists of a single byte. On the decompression side, this byte must be repeated Block_Size times. I think we need normative language on "the decoder MUST verify [...]". Section 2.1.1.3 This "all previously decoded data" is only within a frame, right? It might be worth reiterating that here. Section 2.1.1.3.1.1. For Size_Format for Raw_Literals_Block and RLE_Literals_Block, I think we need to explicitly say "when parsing the flags bit-by-bit, if the low-order bit of the Size_Format field is zero, the field is only one bit, and processing proceeds to the next bitfield. Regenerated_Size uses [...]" Section 2.1.1.3.2.1. Please expand FSE on first use. Section 2.4.1.1 Finally, the decoder can tell how many bytes were used in this process, and how many symbols are present. 
The bitstream consumes a round number of bytes. Any remaining bit within the last byte is simply unused. Usually we say "set to zero on encode, ignore on decode" to make things deterministic. Section 2.4.2 When decompressing, the last byte containing the padding is the first byte to read. The decompressor needs to skip 0-7 initial 0-bits and the first 1-bit lt occurs. Afterwards, the useful part of the bitstream begins. I guess "lt" is supposed to be "that"? Section 2.4.2.1 The tree depth is 4, since its smallest element uses 4 bits. It's not entirely clear to me what "smallest" means here -- presumably "lowest in the tree", though that is rather tautological in this context. "Smallest weight" would make sense, except there is not a weight column in the table. Section 2.4.2.1.3 if Number_of_Bits != 0 Number_of_Bits = Max_Number_of_Bits + 1 - Weight Should the first Number_of_Bits be Weight? |
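As an illustration of the Section 2.4.2 padding rule quoted in the comment above (skip 0-7 initial 0-bits and the first 1-bit of the last byte), the following Python sketch counts how many bits a decoder would discard before the useful part of the bitstream begins. The function name `padding_bits` is hypothetical, and rejecting a zero last byte is an assumption of this sketch, not something stated on this page.

```python
def padding_bits(last_byte: int) -> int:
    """Return how many high-order bits of the last byte are padding.

    The padding is 0-7 zero bits followed by a single 1-bit marker,
    read from the most significant bit downward.
    """
    # Assumption: a last byte of zero carries no marker bit and is invalid.
    if not 0 < last_byte <= 0xFF:
        raise ValueError("last byte must be 1..255 and contain the marker bit")
    leading_zeros = 8 - last_byte.bit_length()  # the 0-7 initial 0-bits
    return leading_zeros + 1                    # plus the first 1-bit


# A last byte of 0b00010110 has three leading zeros and one marker bit,
# leaving four useful bits (0110) in that byte.
useful_bits_in_last_byte = 8 - padding_bits(0b00010110)
```

Under these assumptions a last byte of 0x80 carries seven useful bits and a last byte of 0x01 carries none.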
2018-07-12
|
02 | Benjamin Kaduk | [Ballot Position Update] Position for Benjamin Kaduk has been changed to No Objection from Discuss |
2018-07-11
|
02 | Adam Roach | [Ballot comment] Thanks for addressing my discuss and comments. |
2018-07-11
|
02 | Adam Roach | [Ballot Position Update] Position for Adam Roach has been changed to No Objection from Discuss |
2018-07-11
|
02 | (System) | Sub state has been changed to AD Followup from Revised ID Needed |
2018-07-11
|
02 | (System) | IANA Review state changed to Version Changed - Review Needed from IANA OK - Actions Needed |
2018-07-11
|
02 | Cindy Morgan | New version available: draft-kucherawy-dispatch-zstd-02.txt |
2018-07-11
|
02 | (System) | Secretariat manually posting. Approvals already received |
2018-07-11
|
02 | Cindy Morgan | Uploaded new revision |
2018-05-24
|
01 | Cindy Morgan | IESG state changed to IESG Evaluation::Revised I-D Needed from IESG Evaluation |
2018-05-24
|
01 | Ignas Bagdonas | [Ballot comment] No objection as in "have read the document but it is not in the domain of expertise that I can authoritatively comment on". |
2018-05-24
|
01 | Ignas Bagdonas | [Ballot Position Update] New position, No Objection, has been recorded for Ignas Bagdonas |
2018-05-24
|
01 | Benjamin Kaduk | [Ballot discuss] I support Adam's DISCUSS. Additionally, I think that there are significant privacy considerations associated with the Skippable Frames described in Section 2.3, that should be documented before this document advances. Specifically, this provides an easy way for a party (not even necessarily the encoder, since these frames can be inserted independently from the actual compression scheme) to insert (e.g.) tracking data into a compressed stream and have it ignored by standard decoders. There are myriad possibilities for how this could be used, such as for watermarking files with information about how they were downloaded/generated/etc., which could be used for tracking leaks from confidential materials or illegal distribution of copyrighted content; there is potential for personally identifying information to be included; the list goes on. I can see that there can also be useful ways to use these frames to introduce additional metadata about the compressed content, but fear that we may want to give guidance for these frames to be stripped/forbidden/etc. absent additional context to indicate that the information in the skippable frame is non-malicious. A more minor note, but still IMO blocking -- in Section 2.1.1.1.2: windowLog = 10 + Exponent; windowBase = 1 << windowLog; windowAdd = (windowBase / 8) * Mantissa; Window_Size = windowBase + windowAdd; I don't think this formula is correct -- windowAdd in this formula is not modified by windowLog at all, which does not match up with the stated maximum bound in the body text. |
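To make the Window_Size formula quoted in the comment above easier to check by hand, here is a minimal Python sketch that evaluates it for a one-byte Window_Descriptor. It assumes the layout used by the published format (Exponent in the upper five bits, Mantissa in the lower three), which this page does not itself state; the function name `window_size` is hypothetical.

```python
def window_size(window_descriptor: int) -> int:
    """Evaluate the quoted Window_Size formula for one descriptor byte."""
    if not 0 <= window_descriptor <= 0xFF:
        raise ValueError("Window_Descriptor is a single byte")
    # Assumed bit layout: Exponent = bits 7-3, Mantissa = bits 2-0.
    exponent = window_descriptor >> 3
    mantissa = window_descriptor & 0x07
    window_log = 10 + exponent
    window_base = 1 << window_log
    # windowAdd is a multiple of windowBase / 8, so it grows with windowLog.
    window_add = (window_base // 8) * mantissa
    return window_base + window_add


smallest = window_size(0x00)  # Exponent 0, Mantissa 0
largest = window_size(0xFF)   # Exponent 31, Mantissa 7
```

Under these assumptions the smallest value is 1 KB and the largest is (1 << 41) + 7 * (1 << 38) bytes; because windowAdd is derived from windowBase, it does scale with windowLog.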
2018-05-24
|
01 | Benjamin Kaduk | [Ballot comment] Some high-level comments: After reading Section 2.4.2.1 (and subsections), I'm not sure I fully understand the procedures for the Huffman coding. Granted, in order to obtain the efficient compression ratios the procedure is necessarily complicated, so arguably there is some onus on me as the reader to work harder to understand. Having said that, I think the mapping from weights to prefixes in the prefix code makes sense to me, but I'm not sure I understand how the weights are assigned to their respective symbols/literals (i.e., what the prefix decodes to). My current reading of the text is that weights are given for literals 0 through (last literal - 1), and in particular if I want to do raw (non-FSE) weights, I can only do 128 literals like this. So, are the symbols/literals just these values from 0 to 127? That seems unlikely to be correct, given that we want to compress data containing bytes with the high bit set, but I'm unsure where I'm going astray. A link to some dedicated/official test vectors would be useful, if test vectors themselves are seen as being too bulky to include in the document. The text about third-party dictionaries in Section 4 makes me wonder if it should explicitly be stated that "dictionary input to decompression should be treated as being similarly untrusted as input compressed frames, and the appropriate bounds checking performed on data accesses as needed". In Appendix B, I don't see a reference to match up to a chapter "from normalized distribution to decoding tables". Some more minor section-by-section comments follow. Section 2.1.1 Should the frame-structure diagram list "0 or 4 bytes" for the content checksum? 
Content_Checksum: An optional 32-bit checksum, only present if the Content_Checksum_flag is set. The content checksum is the result of the xxh64() hash function [XXHASH] digesting the origina "original" Section 2.1.1.1.1.2 In this case, Window_Descriptor byte is skipped, but "the Window_Descriptor byte" What's the difference between "Unused" and "Reserved", viz. 2.1.1.1.1.3 and 2.1.1.1.1.4? Section 2.1.1.1.1.6. This is a two-bit flag (= FHD & 3) two-bit flag (whose value is obtained by masking the frame header descriptor with 0x3) Section 2.1.1.1.2. Provides guarantees on minimum memory buffer required to decompress a frame. [...] "the minimum memory buffer" Section 2.1.1.1.3. Field size depends on Dictionary_ID_flag. [...] "The field size depends on the" How are dictionary (ID)s allocated and standardized? Section 2.1.1.2.2. RLE_Block: This is a single byte, repeated Block_Size times. Block_Content consists of a single byte. On the decompression side, this byte must be repeated Block_Size times. I think we need normative language on "the decoder MUST verify [...]". Section 2.1.1.3 This "all previously decoded data" is only within a frame, right? It might be worth reiterating that here. Section 2.1.1.3.1.1. For Size_Format for Raw_Literals_Block and RLE_Literals_Block, I think we need to explicitly say "when parsing the flags bit-by-bit, if the low-order bit of the Size_Format field is zero, the field is only one bit, and processing proceeds to the next bitfield. Regenerated_Size uses [...]" Section 2.1.1.3.2.1. Please expand FSE on first use. Section 2.4.1.1 Finally, the decoder can tell how many bytes were used in this process, and how many symbols are present. The bitstream consumes a round number of bytes. Any remaining bit within the last byte is simply unused. Usually we say "set to zero on encode, ignore on decode" to make things deterministic. Section 2.4.2 When decompressing, the last byte containing the padding is the first byte to read. 
The decompressor needs to skip 0-7 initial 0-bits and the first 1-bit lt occurs. Afterwards, the useful part of the bitstream begins. I guess "lt" is supposed to be "that"? Section 2.4.2.1 The tree depth is 4, since its smallest element uses 4 bits. It's not entirely clear to me what "smallest" means here -- presumably "lowest in the tree", though that is rather tautological in this context. "Smallest weight" would make sense, except there is not a weight column in the table. Section 2.4.2.1.3 if Number_of_Bits != 0 Number_of_Bits = Max_Number_of_Bits + 1 - Weight Should the first Number_of_Bits be Weight? |
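The closing question in the comment above concerns the Section 2.4.2.1.3 rule that turns a Huffman Weight into a code length. A minimal Python sketch of the reading the reviewer proposes (test Weight, not Number_of_Bits, against zero) follows; the function name `number_of_bits` is hypothetical and the sketch is illustrative only.

```python
def number_of_bits(weight: int, max_number_of_bits: int) -> int:
    """Map a Huffman Weight to the length of its prefix code.

    A Weight of zero means the symbol is absent and gets no code.
    """
    if weight < 0 or weight > max_number_of_bits:
        raise ValueError("Weight must be within 0..Max_Number_of_Bits")
    if weight == 0:
        return 0
    return max_number_of_bits + 1 - weight


# With a tree depth of 4, Weights 4, 3, 2, 1 map to 1, 2, 3 and 4 bits.
lengths = [number_of_bits(w, 4) for w in (4, 3, 2, 1, 0)]
```

Larger Weights therefore yield shorter codes, which matches the quoted remark that the tree depth equals the number of bits of the lowest-Weight element.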
2018-05-24
|
01 | Benjamin Kaduk | [Ballot Position Update] New position, Discuss, has been recorded for Benjamin Kaduk |
2018-05-24
|
01 | Alvaro Retana | [Ballot comment] I share Adam's concern and support his DISCUSS. FWIW, I would also be happy with a clarification along the lines of what Adam suggested. |
2018-05-24
|
01 | Alvaro Retana | Ballot comment text updated for Alvaro Retana |
2018-05-24
|
01 | Martin Vigoureux | [Ballot Position Update] New position, No Objection, has been recorded for Martin Vigoureux |
2018-05-23
|
01 | Terry Manderson | [Ballot Position Update] New position, No Objection, has been recorded for Terry Manderson |
2018-05-23
|
01 | Ben Campbell | [Ballot Position Update] Position for Ben Campbell has been changed to No Objection from No Record |
2018-05-23
|
01 | Ben Campbell | [Ballot comment] I agree with Adam's discuss points, although it was not clear to me if public dictionaries are expected. Others have already pointed out the document says "standards track". Otherwise I have a few mostly editorial comments: §2.1 - s/indepedently/independently §2.1.1, last paragraph - s/origina/original §2.1.1.1.1.3 "The value of this bit should be set to zero"- I know this isn't using 2119 language--but is the plain-English meaning of "should" what you had in mind? Also, why is this stated differently from §2.1.1.1.1.4? (Also also, what's the IETF record for section nesting depth?) §2.1.1.2.2, reserved - The text says "this value cannot be used with the current specification". What if you get a block that used it? §2.1.1.2.3 : "always strictly less than" - Is this really true? There's no possible input (e.g. already compressed text) where the decompressed and compressed sizes are equal? §4: - "Usual precautions"- does that refer to the subsequent paragraphs, or something else? - Any chance of a citation for "fuzz-test"? - Last paragraph, last sentence: Forward looking predictions in RFCs have a habit of becoming dated, one way or another. §6.2 - Are none of these references properly normative? |
2018-05-23
|
01 | Ben Campbell | Ballot comment text updated for Ben Campbell |
2018-05-23
|
01 | Alvaro Retana | [Ballot comment] I share Adam's concern and support his DISCUSS. FWIW, I would also be happy with a clarification along the lines of what Adam suggested. |
2018-05-23
|
01 | Alvaro Retana | [Ballot Position Update] New position, No Objection, has been recorded for Alvaro Retana |
2018-05-23
|
01 | Alissa Cooper | [Ballot Position Update] New position, No Objection, has been recorded for Alissa Cooper |
2018-05-22
|
01 | Suresh Krishnan | [Ballot Position Update] New position, No Objection, has been recorded for Suresh Krishnan |
2018-05-22
|
01 | Adam Roach | [Ballot discuss] Thanks for taking the time to document this format for public consumption. I have a handful of blocking concerns (although I'm open to listening to reasons that I might be wrong on this front), and a number of additional comments. --------------------------------------------------------------------------- I have a lot of heartburn around the publication of an informational document of a protocol called "Zstandard." I know the protocol has been in development for a while, and has non-trivial deployment, so I understand that there would be reluctance to change its name at this point. If we leave the name as-is, I do not think that the normal informational boilerplate is sufficient. I would like to see additional text that explicitly addresses the situation, along the lines of: [Abstract] Zstandard, or "zstd" (pronounced "zee standard"), is a data compression mechanism. This document describes the mechanism, and registers a media type to be used when transporting zstd-compressed content via Multipurpose Internet Mail Extensions (MIME). Despite the use of the word "standard" as part of its name, readers are advised that this document is not an Internet Standards Track specification, and is being published for informational purposes only. [Introduction] Zstandard, or "zstd" (pronounced "zee standard") is a data compression mechanism, akin to gzip [RFC1952]. Despite the use of the word "standard" as part of its name, readers are advised that this document is not an Internet Standards Track specification, and is being published for informational purposes only. --------------------------------------------------------------------------- §2.2.1: > For the first block, the starting offset history is populated with > the following values : 1, 4 and 8 (in order). 
I fear this is ambiguously specified. I can interpret this as either temporal order: Repeated_Offset1 = 8 Repeated_Offset2 = 4 Repeated_Offset3 = 1 Or as sequential order: Repeated_Offset1 = 1 Repeated_Offset2 = 4 Repeated_Offset3 = 8 Please clarify, as this confusion can lead to incompatible implementations. --------------------------------------------------------------------------- The dictionary scheme in here seems problematic, in that the intention is clearly to have public, well-known dictionaries; and the dictionaries are intended to have globally-unique identifiers for that purpose. 31 bits isn't enough space to achieve uniqueness through randomness. While there are other approaches that involve things like dictionary IDs that are hashes of their contents (see, e.g., SigComp), I suspect the notion of expanding the size of this field isn't very appealing. If you keep the format the same (4 bytes), I don't see how the dictionary part of this scheme can be interoperable without a registry of some kind. Even if the intention is to publish further documents on the topic of dictionaries, I believe publication of this document needs to wait on establishment of such a registry. I have no opinion about whether this is resolved by creating the registry in this document, or in holding its publication until the document that does create such a registry is published. |
2018-05-22
|
01 | Adam Roach | [Ballot comment] General: The document uses the phrase "natural order" in several places without defining it. I can make a guess about what is intended, but I'm not completely confident. Adding a definition for this term would be very helpful. --------------------------------------------------------------------------- §2.1.1: > of the xxh64() hash function [XXHASH] digesting the origina Typo: "original" --------------------------------------------------------------------------- §2.1.1.1.1.1. > This is a two-bit flag (equivalent to Frame_Header_Descriptor left- > shifted six bits) Shouldn't this say "right-shifted"? --------------------------------------------------------------------------- §2.1.1.3: > To decode a compressed block, the following elements are necessary: > > o Previous decoded data, up to a distance of Window_Size, or all > previously decoded data when Single_Segment_flag is set. To be clear, this is "up to a distance of Window_Size or to the beginning of the Frame, whichever is smaller," right? I believe the intention is that you can't use data from the previous frame to encode this one, and the text should probably take care to avoid any implication to the contrary. --------------------------------------------------------------------------- §2.1.1.3.1: > Literals can be stored uncompressed or compressed using Huffman > prefix codes. When compressed, an optional tree description can be > present, followed by one or four streams. A brief description right here of the concept of a "stream" would be quite helpful in understanding the following several sections. --------------------------------------------------------------------------- §2.1.1.3.1.1: > Value ?0: Size_Format uses one bit. Regenerated_Size uses five bits Please either define the meaning of "?" 
here, or explicitly call out "Values 00 and 10:" --------------------------------------------------------------------------- §2.1.1.3.2.1: > o if (byte0 < 255): Number_of_Sequences = ((byte0-128) << 8) + > byte1. Uses 2 bytes. Please change to "if (127 < byte0 < 255):" --------------------------------------------------------------------------- §2.1.1.3.2.1: > Predefined_Mode: A predefined FSE distribution table is used, > defined in Section 2.1.1.3.2.2. No distribution table will be > present. I see that "FSE" is expanded in section 2.4.1. Please expand it here, or provide a reference to 2.4.1 here. --------------------------------------------------------------------------- §2.1.1.3.2.1: The table of compression modes lists the modes in the order "Predefined, RLE, FSE_Compressed, and "Repeat," while the description of each mode reverses the final two. Consider changing these to be in the same order. --------------------------------------------------------------------------- §2.1.1.3.2.1: The description makes it clear that Repeat_Mode is valid following RLE_Mode. It's unclear whether it's allowed after Predefined_Mode (which would be well-defined, but somewhat silly to code -- then again, the format seems rather permissive, so I would *guess* it's allowed). For avoidance of doubt, please either explicitly allow or explicitly forbid this. --------------------------------------------------------------------------- §2.1.1.3.2.1: > The description of the codes for how > to determine these values was presented earlier. Perhaps a reference to where this was done is in order? --------------------------------------------------------------------------- §2.3: While it doesn't impact the compression, this scheme seems pretty iffy in terms of utility and future-proofness. 
I would expect to see some kind of minimal tagging system indicating what *kind* of metadata the frame contains, even if no such kinds are defined by this document (e.g., something simple like "The first byte of User-Data indicates the type of metadata contained by this frame", and then set up an empty IANA table for registering such bytes...) --------------------------------------------------------------------------- §2.4.1.1: > A bitstream is read forward, in little-endian fashion. It is not > necessary to know its exact size, since the size will be discovered > and reported by the decoding process. The bitstream starts by > reporting on which scale it operates. Note that Accuracy_Log = > low4bits + 5. I can't find where "low4bits" is defined. Is this meaning to say that the least significant 4 bits of the initial (that is, highest-in-memory) byte are used to encode the Accuracy_Log, with an offset of 5? --------------------------------------------------------------------------- §2.4.1.1: > Value decoded: Small values use one less bit. Nit: "...one fewer bit..." --------------------------------------------------------------------------- §2.4.1.1: > All remaining symbols are sorted in their natural order. Starting > from symbol 0 and table position 0, each symbol gets attributed as > many cells as its probability. Cell allocation is non-linear linear; If the use of the phrase "non-linear linear" is not an editorial error, please provide a definition of what is meant by this phrase. --------------------------------------------------------------------------- §2.4.2.1.1.: > The full representation occupies (Number_of_Symbols+1)/2 bytes, > meaning it uses a last full byte even if Number_of_Symbols is odd. Based on the phrase after the comma, I think you mean ceiling((Number_of_Symbols+1)/2). The formula you have implies the opposite. 
--------------------------------------------------------------------------- §2.4.2.1.2: > The number of symbols to decode is determined by tracking the > bitStream overflow condition: If updating state after decoding a > symbol would require more bits than remain in the stream, it is > assumed that extra bits are zero. Then, the symbols for each of the > I final states are decoded and the process is complete. I presume the "I" on the beginning of this final line is a typo? --------------------------------------------------------------------------- §2.4.2.1.3: > Symbols are sorted by Weight. Within same Weight, symbols keep > natural order. Symbols with a Weight of zero are removed. Then, > starting from lowest weight, prefix codes are distributed in order. In what order? --------------------------------------------------------------------------- §3.1: > Published specification: [ZSTD] Given that the type being registered is neither vendor tree nor personal tree, I'm pretty sure this needs to be an RFC (or submitted by a recognized SDO). Luckily, we're standing in an RFC-to-be right now, so I think you can fix this by simply pointing to [RFCXXXX]. > For further information: See [ZSTD] I think this should be [RFCXXXX] as well. --------------------------------------------------------------------------- §4: > A decoder has to demonstrate capabilities to detect and prevent any > kind of data tampering in the compressed frame from triggering system > faults, such as reading or writing beyond allowed memory ranges. > This can be guaranteed either by the implementation language, or by > careful bound checkings. It is highly recommended to fuzz-test > decoder implementations to test and harden their capability to detect > bad frames and deal with them without any system side-effect. 
I think it makes sense to specifically call out encoding of Number_of_Sequences values that cause the decoder to read into the block header (and beyond), as well as the indication of a Frame Content Size that is smaller than the actual uncompressed data, in an attempt to trigger buffer overflow. --------------------------------------------------------------------------- §6.2: > [XXHASH] "XXHASH Algorithm", 2017, . This needs to be normative: one cannot implement the full range of features of the zstd format without understanding it. |
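The §2.1.1.3.2.1 comment above asks for the two-byte case of Number_of_Sequences to be written as "127 < byte0 < 255". The following Python sketch shows that bound in context. Only the two-byte formula is quoted on this page; the one-byte and three-byte branches follow the published Zstandard format description and should be read as assumptions here, and the function name `decode_number_of_sequences` is hypothetical.

```python
from typing import Tuple


def decode_number_of_sequences(data: bytes) -> Tuple[int, int]:
    """Return (Number_of_Sequences, bytes consumed) from a section header."""
    if not data:
        raise ValueError("sequences section header is empty")
    byte0 = data[0]
    if byte0 < 128:
        # Assumed from the published format: the count is byte0 itself.
        return byte0, 1
    if byte0 < 255:
        # Quoted case, with the explicit bound 127 < byte0 < 255.
        if len(data) < 2:
            raise ValueError("truncated header")
        return ((byte0 - 128) << 8) + data[1], 2
    # Assumed from the published format: byte0 == 255 uses three bytes.
    if len(data) < 3:
        raise ValueError("truncated header")
    return data[1] + (data[2] << 8) + 0x7F00, 3


count, used = decode_number_of_sequences(bytes([0x81, 0x02]))
```

The length checks reflect the §4 concern in the same comment: a decoder should refuse a header that would make it read past the available input.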
2018-05-22
|
01 | Adam Roach | [Ballot Position Update] New position, Discuss, has been recorded for Adam Roach |
2018-05-21
|
01 | Deborah Brungard | [Ballot comment] Confused also on the track. Datatracker says intended status is Informational, but the document says Standards Track. |
2018-05-21
|
01 | Deborah Brungard | [Ballot Position Update] New position, No Objection, has been recorded for Deborah Brungard |
2018-05-21
|
01 | Alexey Melnikov | 1. Summary Alexey Melnikov is the responsible Area Director. Zstandard, or "zstd" (pronounced "zee standard"), is a data compression mechanism. This document describes the mechanism, and registers a media type to be used when transporting zstd-compressed content via Multipurpose Internet Mail Extensions (MIME), as well as a new HTTP Content Coding. 2. Review and Consensus This is not a product of an IETF WG. There are multiple implementations of the "zstd" compression algorithm; see also Section 5 of the draft. 3. Intellectual Property Editors confirmed that they have no IPR to disclose. 4. Other Points The document incorrectly states that it is Standards Track; however, it was IETF Last Called as Informational. The document is Informational, so there are no DownRefs. IANA Considerations are clear. |
2018-05-18
|
01 | Spencer Dawkins | [Ballot comment] I'll let you folks chat with Mirja about the larger topic of requirements language, but I note that this formulation 2.1.1.1.1.3. Unused Bit The value of this bit should be set to zero. A decoder compliant with this specification version shall not interpret it. It might be used in a future version, to signal a property which is not mandatory to properly decode the frame. really doesn't protect that bit for future use. I don't care if it's "MUST be set to zero by encoders and ignored by decoders" or "is always set to zero in this version of the algorithm", but a weaker constraint doesn't prevent implementers from squatting on this bit now. You know, something like the next subsection: 2.1.1.1.1.4. Reserved Bit This bit is reserved for some future feature. Its value must be zero. A decoder compliant with this specification version must ensure it is not set. This bit may be used in a future revision, to signal a feature that must be interpreted to decode the frame correctly. This text For improved interoperability, decoders are recommended to be compatible with Window_Size >= 8 MB, and encoders are recommended to not request more than 8 MB. It's merely a recommendation though, and decoders are free to support larger or lower limits, depending on local limitations. is pretty clear about the motivation for limiting the Window_Size to 8 MB, and why an implementation might want to use a smaller Window_Size, but is there anything you could say about why an implementation might want to use a larger Window_Size value? In this text, 2.4. Entropy Encoding Two types of entropy encoding are used by the Zstandard format: FSE, and Huffman coding. could you give any guidance about why you might choose to use one format over another? 
Is the meaning of "under control of a third party" well understood? One should never compress together a message whose content must remain secret with a message under control of a third party. I might be able to guess at a precise definition, but I'd be guessing. I'm wondering if you really want to remove all of 5. Implementation Status [RFC EDITOR: Please remove this section prior to publication.] Source code for a C language implementation of a "Zstandard" compliant library is available at [ZSTD-GITHUB]. This implementation is production ready, implementing the full range of the specification. It is tested against security hazards, and widely deployed within Facebook infrastructure. given that this text 2.5. Dictionary Format (snip) However, dictionaries created by "zstd --train" in the reference implementation follow a specific format, described here. refers to what I'm assuming is the same reference implementation (but I can't be sure, because there's no reference pointer in the Section 2.5 usage). (I was on the IESG and balloted Yes for https://datatracker.ietf.org/doc/rfc7942/, so I understand that this says you delete Implementation Sections before publishing as an RFC, but I don't think pointers to a reference implementation fall into the same category as the typical "so far, X, Y, and Z have implemented this protocol" Implementation Sections that are instantly outdated. A pointer to a reference implementation sounds more useful for future readers. But, at a minimum, adding a reference pointer to the Section 2.5 occurrence would be useful, since that's the first time a reference implementation is mentioned) |
2018-05-18
|
01 | Spencer Dawkins | [Ballot Position Update] New position, No Objection, has been recorded for Spencer Dawkins |
2018-05-18
|
01 | Mirja Kühlewind | [Ballot comment] Unfortunately there is no shepherd write-up, therefore I have a couple of questions/comments: 1) What's the reason that this document is submitted as "Standards Track"? Is there a working group that is planning to use this mechanism? Or why does it need to be in the IETF/have IETF consensus? 2) I think this document would benefit from the use of normative language. 3) Why is there a normative reference to a website? I would think the document should be and is describing the compression mechanism comprehensively without the need to have a look at a website that might not even provide a stable reference. Sorry, one more comment I forgot earlier: 4) Should the User_Data Frame be further discussed in the security section as it can carry arbitrary information which can be security-relevant or privacy sensitive...? |
2018-05-18
|
01 | Mirja Kühlewind | Ballot comment text updated for Mirja Kühlewind |
2018-05-18
|
01 | Mirja Kühlewind | [Ballot comment] Unfortunately there is no shepherd write-up, therefore I have a couple of questions/comments: 1) What's the reason that this document is submitted as "Standards Track"? Is there a working group that is planning to use this mechanism? Or why does it need to be in the IETF/have IETF consensus? 2) I think this document would benefit from the use of normative language. 3) Why is there a normative reference to a website? I would think the document should be and is describing the compression mechanism comprehensively without the need to have a look at a website that might not even provide a stable reference. |
2018-05-18
|
01 | Mirja Kühlewind | [Ballot Position Update] New position, No Objection, has been recorded for Mirja Kühlewind |
2018-05-18
|
01 | Alexey Melnikov | IESG state changed to IESG Evaluation from Waiting for Writeup |
2018-05-18
|
01 | Alexey Melnikov | Ballot has been issued |
2018-05-18
|
01 | Alexey Melnikov | [Ballot Position Update] New position, Yes, has been recorded for Alexey Melnikov |
2018-05-18
|
01 | Alexey Melnikov | Created "Approve" ballot |
2018-05-18
|
01 | Alexey Melnikov | Ballot writeup was changed |
2018-04-26
|
01 | Susan Hares | Request for Last Call review by OPSDIR Completed: Has Issues. Reviewer: Susan Hares. Sent review to list. |
2018-04-23
|
01 | Alexey Melnikov | Placed on agenda for telechat - 2018-05-24 |
2018-04-23
|
01 | (System) | IESG state changed to Waiting for Writeup from In Last Call |
2018-04-20
|
01 | (System) | IANA Review state changed to IANA OK - Actions Needed from IANA - Review Needed |
2018-04-20
|
01 | Sabrina Tanamal | (Via drafts-lastcall@iana.org): IESG/Authors/WG Chairs: The IANA Services Operator has completed its review of draft-kucherawy-dispatch-zstd-01. If any part of this review is inaccurate, please let us know. The IANA Services Operator understands that, upon approval of this document, there are two actions which we must complete. First, in the application registry on the Media Types registry page located at: https://www.iana.org/assignments/media-types/ a single, new media type will be registered as follows: Name: zstd Template: [ TBD-at-Registration ] Reference: [ RFC-to-be ] Second, in the HTTP Content Coding Registry on the Hypertext Transfer Protocol (HTTP) Parameters registry page located at: https://www.iana.org/assignments/http-parameters/ a single, new registration will be added as follows: Name: zstd Description: A stream of bytes compressed using the Zstandard protocol Reference: [ RFC-to-be ] The IANA Services Operator understands that these are the only actions required to be completed upon approval of this document. Note: The actions requested in this document will not be completed until the document has been approved for publication as an RFC. This message is meant only to confirm the list of actions that will be performed. Thank you, Sabrina Tanamal Senior IANA Services Specialist |
2018-04-19
|
01 | Tero Kivinen | Request for Last Call review by SECDIR Completed: Ready. Reviewer: Scott Kelly. |
2018-04-19
|
01 | Vijay Gurbani | Request for Last Call review by GENART Completed: Ready with Nits. Reviewer: Vijay Gurbani. Sent review to list. |
2018-04-05
|
01 | Jean Mahoney | Request for Last Call review by GENART is assigned to Vijay Gurbani |
2018-04-05
|
01 | Jean Mahoney | Request for Last Call review by GENART is assigned to Vijay Gurbani |
2018-04-04
|
01 | Gunter Van de Velde | Request for Last Call review by OPSDIR is assigned to Susan Hares |
2018-04-04
|
01 | Gunter Van de Velde | Request for Last Call review by OPSDIR is assigned to Susan Hares |
2018-03-29
|
01 | Tero Kivinen | Request for Last Call review by SECDIR is assigned to Scott Kelly |
2018-03-29
|
01 | Tero Kivinen | Request for Last Call review by SECDIR is assigned to Scott Kelly |
2018-03-26
|
01 | Amy Vezza | IANA Review state changed to IANA - Review Needed |
2018-03-26
|
01 | Amy Vezza | The following Last Call announcement was sent out (ends 2018-04-23): From: The IESG To: IETF-Announce CC: draft-kucherawy-dispatch-zstd@ietf.org, alexey.melnikov@isode.com Reply-To: ietf@ietf.org Sender: Subject: Last Call: (Zstandard Compression and The application/zstd Media Type) to Informational RFC The IESG has received a request from an individual submitter to consider the following document: - 'Zstandard Compression and The application/zstd Media Type' as Informational RFC The IESG plans to make a decision in the next few weeks, and solicits final comments on this action. Please send substantive comments to the ietf@ietf.org mailing lists by 2018-04-23. Exceptionally, comments may be sent to iesg@ietf.org instead. In either case, please retain the beginning of the Subject line to allow automated sorting. Abstract Zstandard, or "zstd" (pronounced "zee standard"), is a data compression mechanism. This document describes the mechanism, and registers a media type to be used when transporting zstd-compressed via Multipurpose Internet Mail Extensions (MIME). The file can be obtained via https://datatracker.ietf.org/doc/draft-kucherawy-dispatch-zstd/ IESG discussion can be tracked via https://datatracker.ietf.org/doc/draft-kucherawy-dispatch-zstd/ballot/ No IPR declarations have been submitted directly on this I-D. |
2018-03-26
|
01 | Amy Vezza | IESG state changed to In Last Call from Last Call Requested |
2018-03-26
|
01 | Amy Vezza | Last call announcement was changed |
2018-03-23
|
01 | Alexey Melnikov | Last call was requested |
2018-03-23
|
01 | Alexey Melnikov | Last call announcement was generated |
2018-03-23
|
01 | Alexey Melnikov | Ballot approval text was generated |
2018-03-23
|
01 | Alexey Melnikov | Ballot writeup was generated |
2018-03-23
|
01 | Alexey Melnikov | IESG state changed to Last Call Requested from AD Evaluation |
2018-03-23
|
01 | Alexey Melnikov | Changed consensus to Yes from Unknown |
2018-03-23
|
01 | Alexey Melnikov | IESG state changed to AD Evaluation from Publication Requested |
2018-03-18
|
01 | Alexey Melnikov | IETF WG state changed to Submitted to IESG for Publication |
2018-03-18
|
01 | Alexey Melnikov | IESG state changed to Publication Requested from AD is watching |
2017-11-12
|
01 | Murray Kucherawy | New version available: draft-kucherawy-dispatch-zstd-01.txt |
2017-11-12
|
01 | (System) | New version approved |
2017-11-12
|
01 | (System) | Request for posting confirmation emailed to previous authors: Yann Collet , Murray Kucherawy |
2017-11-12
|
01 | Murray Kucherawy | Uploaded new revision |
2017-11-12
|
00 | Alexey Melnikov | Assigned to Applications and Real-Time Area |
2017-11-12
|
00 | Alexey Melnikov | Responsible AD changed to Alexey Melnikov |
2017-11-12
|
00 | Alexey Melnikov | Intended Status changed to Informational |
2017-11-12
|
00 | Alexey Melnikov | IESG process started in state AD is watching |
2017-11-12
|
00 | Alexey Melnikov | Stream changed to IETF from None |
2017-11-12
|
00 | Murray Kucherawy | Added to session: IETF-100: dispatch Mon-0930 |
2017-09-25
|
00 | Murray Kucherawy | New version available: draft-kucherawy-dispatch-zstd-00.txt |
2017-09-25
|
00 | (System) | New version approved |
2017-09-25
|
00 | Murray Kucherawy | Request for posting confirmation emailed to submitter and authors: Yann Collet , Murray Kucherawy , "Murray S. Kucherawy" |
2017-09-25
|
00 | Murray Kucherawy | Uploaded new revision |