FFV1 Video Coding Format Versions 0, 1, and 3
draft-ietf-cellar-ffv1-20
Yes
Murray Kucherawy
No Objection
Warren Kumari
(Deborah Brungard)
Abstain
Note: This ballot was opened for revision 17 and is now closed.
Murray Kucherawy
Yes
Erik Kline
No Objection
Comment
(2020-09-21 for -17)
Sent
[[ comments ]] [ section 3.3 ] * Suggest either defining colorspace_type value and coder_type values before using them, or provide a reference in this section (i.e. just include a reference to sections 4.2.3 and 4.2.5). [[ nits ]] [ section 3.2 ] * In the description of Figure 3: "relatively positions" -> "relatively positioned", I think. [ section 3.3 ] * "signed 16-bit signed integer": choose any one instance of 'signed', I'd say [ section 3.7.2 ] * "and then if used transparency" -> "and then, if used, transparency"
Roman Danyliw
(was Discuss, No Objection)
No Objection
Comment
(2020-12-01 for -19)
Sent for earlier
(Sorry for any confusion with the updated ballot. I didn't cut-and-paste all of the text. COMMENT is the same. DISCUSS is added.) Thanks for responding to the SECDIR feedback (and thank you to Liang Xia for providing the review). I support Barry Lieba’s Discuss position. Thanks for addressing my DISCUSS. On the security considerations: ** Section 6. Per the reference to [RFC4732], which selection is relevant here? Is it Section 2.1.1? ** Section 6. The assertions about the security properties of [REFIMPL] don’t make sense to me in this document. While it is extremely helpful that there is a high-quality reference implementation, it’s relevance to this spec isn’t clear. This code isn’t normative. Recommend removal all text after the paragraph “None of the content carried in FFV1 is intended to be executable”.
Warren Kumari
No Objection
Éric Vyncke
No Objection
Comment
(2020-10-05 for -17)
Sent
Thank you for the work put into this document. I found it fascinating while it describes the codec algorithms. Please find below a couple of non-blocking COMMENT points. I also support Alissa's point about using figure numbers/legends. I hope that this helps to improve the document, Regards, -éric == COMMENTS == -- Abstract -- May I suggest to expand/explain FFV1 on first use ? Still wondering what FFV1 means BTW... Also, the lossless encoding is probably the key property of this codec and should probably be mentioned in the abstract. -- Section 2 -- In an informational document, the use of BCP 14 is not required. Suggest to remove this part. -- Section 7 -- Should it be included in section 8 rather than referenced to?
Alissa Cooper Former IESG member
No Objection
No Objection
(2020-09-23 for -17)
Sent
Glad to see this work finishing up. I think it would be better if each figure were explained in the text (e.g., "..., as shown in Figure 9"), or if the figure numbers were dropped.
Alvaro Retana Former IESG member
No Objection
No Objection
(2020-10-05 for -17)
Sent
The datatracker should indicate that this draft replaces draft-niedermayer-cellar-ffv1. The indication appears on the tools page, but not the datatracker.
Barry Leiba Former IESG member
(was Discuss)
No Objection
No Objection
(2021-02-23)
Sent
Thanks very much for having addressed all my comments, and special thanks for the work in clearing up the meanings of things in Section 3.8.1.1. I think I do understand it now, and I think the work you've done in making that happen has made this a better document.
Deborah Brungard Former IESG member
No Objection
No Objection
(for -17)
Not sent
Martin Duke Former IESG member
No Objection
No Objection
(2020-10-02 for -17)
Not sent
I found this quite difficult to follow. An introductory section that explains the overall framework and how the various components fit in it would be very helpful.
Robert Wilton Former IESG member
No Objection
No Objection
(2020-10-05 for -17)
Sent
Hi, Thank you for taking the time to document FFV1 version 0, 1 and 3. I support Barry's discuss in that I found this document hard to read and interpret. I think that I would struggle to implement a FFV1 encoder/decoded from scratch based on this document. However, this is a long way outside my area of expertise and there is perhaps a corpus of basic video codec knowledge that is assumed in this specification. Is the intention of this document that it gets obsoleted when FFV1 version 4 is documented? I haven't reviewed the entirety of this document, but I do have some comments of particular areas of the document that I found hard to follow that additional text or explanation may be helpful. Overall, having some more introduction text explaining the overall structure of the encoding , i.e. how the different parts fit together would likely help readability. 3.1. Border Figure 2: A depiction of FFV1's assumed border for a set example Samples. I wasn't sure whether an extra row at the bottom of this table would have been helpful, but perhaps it is not required because it is not referenced. 3.2. Samples The labels for these relative "Samples" are made of the first letters of the words Top, Left and Right. Don't feel obliged to change this, but I wonder whether keep lowercase for all of the relative positions might have been clearer. E.g., perhaps using "tt" instead of "T" and "ll" instead of "L". 3.3. Median Predictor Exception for the median predictor ... Possibly putting the exception text into a 3.3.1 sub-section would aid readability. 3.4. Quantization Table Sets It wasn't clear to me what a "Quantized Sample Differences" is. 3.7.2. RGB Cb = b - g Cr = r - g Y = g + (Cb + Cr) >> 2 g = Y - (Cb + Cr) >> 2 r = Cr + g b = Cb + g Perhaps split into two sets of 3 equations to define the relationship in either direction. Exception for the JPEG2000-RCT conversion ... Again, putting this into a sub-section (3.7.2.1) might aid readability, i.e. the split between what is desired vs what is being described due to bugs in real implementations. 3.8. Coding of the Sample Difference coder_input = [(sample_difference + 2 ^ (bits - 1)) & (2 ^ bits - 1)] - 2 ^ (bits - 1) It wasn't clear to me what [] brackets meant here. 3.8.1.1. Range Binary Values I found this hard to follow, as in I couldn't figure out what it means. 3.8.1.4. State Transition Table It wasn't really clear to me what these were used for. 3.8.2.1. Signed Golomb Rice Codes Unclear what is meant by "ESC case" Regards, Rob
Benjamin Kaduk Former IESG member
Abstain
Abstain
(2020-10-07 for -18)
Sent
I support Roman's Discuss. I am balloting Abstain in that I am not confident that this document has received sufficient review, but have not identified specific flaws or areas of uncertainty that sould justify blocking publication, and accordingly do not wish to block its publication. In particular, I am not confident that I myself understand how all the pieces fit together -- even though some of my comments below might indicate internal inconsistencies or other issues that would justify a Discuss position, I am not fully confident in that, nor am I confident that I could judge that they have been adequately addressed by any proposed changes to the document's text. (This is in large part related to Barry's Discuss.) That said, my review comments on the document appear below. It might make sense to have a bit of discussion on what types of things would (or would not) change in a new micro version, vs new "macro" version, and what could be said to be "invariants" of the format overall. Should we give more treatment (or even an IANA registry) for how extensibility will be achieved for (e.g.) new "coder_type" and "colorspace_type" values? Section 2.2.9.4 "get_bits( i )" is the action to read the next "i" bits in the bitstream, from most significant bit to least significant bit, and to return the corresponding value. [...] [I guess this "value" is always in the form of an unsigned integer.] Section 3.3 It is expected (to be confirmed) to remove this exception for the median predictor in the next version of the FFV1 bitstream. When can/will the confirmation process happen? When version 4 is finalized? Section 3.6 The discussion seems to imply that the chroma planes are not used in FFV4 (and so the check for "version <= 3" in the logic here is just for future compatibility); is that correct? Section 3.7 color space. In YCbCr for each "Plane", each "Line" is coded from top to bottom and for each "Line", each "Sample" is coded from left to right. In JPEG2000-RCT for each "Line" from top to bottom, each Is there a reason that we use "JPEG2000-RCT" in the prose here, but "RGB" for the section 3.7.2 heading? Section 3.8.1.1 The defined terms do not cover all terms used (so I conclude that there are unnamed intermediate values) and the formulae themself do not do a great job of conveying where the overall inputs/outputs are, thus the flow of the encoding logic. Captions on the figures might help, which could include indications of which intermediate values from from formula to formula. It also seems like some of the intermediate values could be usefully named, e.g., R_(i) seems to serve the role of the remaining range at a given step. Section 3.8.1.2 Does "non binary" mean "non-integer", i.e., floating point? We don't seem to use the "non binary" term elsewhere in the document. I'm also not sure I'm reading the formula properly -- state is a uint8_t but the indexing into it seems to make more sense to me if treated as bit indexing, not byte indexing. (E.g., 'a' is only doubled when iterating through 22..31, indicating bitwise representation.) int e = 0; while (get_rac(c, state + 1 + min(e, 9)) { //1..10 e++; } It looks like this enters an infinite loop if get_rac(c, state + 10) returns true; is there something to enforce that this does not happen? "get_rac" returns a boolean, computed from the bytestream as described in Section 3.8.1.1. I suggest saying a little bit more about how this works, e.g., that the boolean is a single-bit integer value, and how this functional notation maps to the interfaces of the expressions in Section 3.8.1.1. Section 3.8.2.1 What is 'bits' (the argument to the last get_bits() call) in Figure 20? (Is it the same "bits_per_raw_sample" from §3.8? If so, that description should be qualified to apply more broadly than just "the equation below".) Section 3.8.2.2.1 What are 'x' and 'w' (and how are they set)? What are the semantics of "run_mode" values 1 vs 2? Section 3.8.2.3 I guess it's implicit that "output_number" is of the appropriate (width) type for the desired output bit width. Section 3.8.2.4 I guess "i += i" vs "i *= 2" is just a matter of style. Where does the 'bits' input to sign_extend() come from? (Is it the same "bits_per_raw_sample" from §3.8? If so, that description should be qualified to apply more broadly than just "the equation below".) Section 3.8.2.4.1 I suggest rewording or using a section reference for "normal difference coding" as the phrase "difference coding" does not suffice (via text search) to find the referenced procedure, since this is the only instance of the phrase in the document. Section 4.1 I'm not sure how to parse "number of equal entries -1". In combination with the example I can guess what is meant, but I still can't parse the text. The "Table" listed in the example does not seem to conform to "second half [...] is identical to the first with flipped sign". Also, the QuantizationTable() pseudocode suggests that the Table should have 256 entries, but this example only has 16. (That may be fine for example purposes, but the difference should be called out.) Section 4.2 C uses 0-indexed arrays, but the pseudocode seems to use 1-indexed arrays (e.g., state_transition_delta[]). Perhaps this is noteworthy. Section 4.5 If 'reserved' is only present when version <=1, it seems unlikely that a revision of this specification would assign semantics to it. Section 4.6.x nit: "if not present" could be taken to suggest that this field is optional at field-level granularity (vs. at header-/version-level granularity); "if header not present" might be slightly more clear. Section 6 If we are attempting to differentiate between "format" and "codec", are the uses of "codec" in this section all correct? It may (or may not) be worth calling out the importance of internal consistency within an implementation as to whether the Parameters() appear in the ConfigurationRecord() or in each Frame(). In all of the conditions above, the decoder and encoder was run inside the [VALGRIND] memory debugger as well as clangs address sanitizer [Address-Sanitizer], which track reads and writes to nit: "clang's" with apostrophe. Section 10 I don't see a particular reason that RFC 6716 needs to be a normative reference. (I also agree with Roman that there doesn't seem to be need to reference both C90 and C18.) Appendix A, B It is surprising to see BCP 14 keywords in these nominally informative sections.