RTP Payload Format for VP9 Video
Note: This ballot was opened for revision 13 and is now closed.
Murray Kucherawy Yes
Comment (2021-06-03 for -13)
There is a normative reference to a non-SDO document that was not specifically identified during the IETF Last Call. BCP 97 doesn't give specific guidance about handling a document that was not produced by any SDO. The IESG discussed this and chose to approve it without a second Last Call, since in this case the reference appears to be in good shape and stable. The IESG will also take up an action item to amend BCP 97 to include guidance for this pattern to ease handling of future cases.
Alvaro Retana No Objection
Benjamin Kaduk (was Discuss) No Objection
Comment (2021-06-03 for -14)
Thanks for addressing my Discuss point, and comments!

I spotted a few maybe nits in the -14 while reviewing the diff.

Section 3

   A Picture Group is a recurring pattern of spatial and temporal dependencies which In this mode, each packet has an index to refer to

Looks like an editing remnant up to "In this mode".

Section 4.2

   I: Picture ID (PID) present. When set to one, the OPTIONAL PID MUST be present after the mandatory first octet and specified as below. Otherwise, PID MUST NOT be present. If the V bit was set in the stream's most recent start of a keyframe (i.e. the SS field was present, and non-flexible scalability mode is in use), then this bit MUST be set on every packet.

I may (still?) be confused, but I think the "non-flexible scalability mode is in use" belongs outside the parentheses, as it's an additional separate precondition from "V bit is set" for "this bit MUST be set". That is, IIUC, non-flexible scalability mode requires SS and requires PIDs in every packet, but it's permitted to send SS in flexible scalability mode, and in the latter case it's permitted (but unexpected?) to omit PIDs from (some) packets.

Section 4.2.1

   In a scalable stream sent with a fixed pattern, the SS data SHOULD be included in the first packet of every key frame. This is a packet with P bit equal to zero, SID or Lis not the bit equal to zero, and B bit equal to 1. [...]

I think there's some editing churn here as well, at least a space in "L is" but possibly more qualifiers about SID being zero vs L being the bit equal to zero.
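The Section 4.2 reading above (two separate preconditions for "the I bit MUST be set") can be written out as a small predicate. This is only a sketch of that interpretation, not text from the draft; the function and parameter names are hypothetical, and the rule itself is the reviewer's "IIUC" reading.

```python
def pid_required(v_bit_at_last_keyframe_start: bool, flexible_mode: bool) -> bool:
    """Sketch of the reviewer's reading: the PID is mandatory on every packet
    only when BOTH conditions hold (assumption, not normative text):
      1. the V bit was set (SS present) at the most recent keyframe start, and
      2. non-flexible scalability mode is in use.
    """
    return v_bit_at_last_keyframe_start and not flexible_mode


def packet_ok(i_bit: bool, v_bit_at_last_keyframe_start: bool, flexible_mode: bool) -> bool:
    """A packet violates the rule only if the PID is required but the I bit is zero."""
    if pid_required(v_bit_at_last_keyframe_start, flexible_mode):
        return i_bit
    return True
```

Under this reading, a flexible-mode stream that carries SS may still omit the PID on some packets, which is the case the parenthetical in the quoted text seems to fold together with the non-flexible one.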
Erik Kline No Objection
Francesca Palombini No Objection
Comment (2021-06-01 for -13)
Thank you for the work on this document. I have some non-blocking questions and comments, which I hope will help improve the document.

Francesca

1. -----

   Timestamp: The RTP timestamp indicates the time when the input frame was sampled, at a clock rate of 90 kHz. If the input picture is encoded with multiple layer frames, all of the frames of the picture MUST have the same timestamp.

FP: I think it would be useful to add a reference to RFC 3550, regarding "RTP timestamp". Also, I find it curious that RFC 3550 is not mentioned up to the end of section 4.1 (I would think a reference to it would be present in the introduction).

2. -----

   Otherwise, PID MUST NOT be present. If the SS field was present in the stream's most recent start of a keyframe (i.e., non-flexible scalability mode is in use), then the PID MUST also be present in every packet.

FP: Is there any reason why this is not formulated in terms of the V bit being set? (I believe the rest of the text is consistently talking about bits being set.)

3. -----

   described by "Reference indices" below. This MUST only be set to 1 if the I bit is also set to one; if the I bit is set to zero, then this MUST also be set to zero and ignored by receivers. The

FP: Why is it that it MUST only be set to 1 if I is also set to 1? I was looking for the motivation, but could not find it. Some more text would have been helpful to me.

4. -----

   Z: Not a reference frame for upper spatial layers. If set to 1, indicates that frames with higher spatial layers SID+1 of the current and following pictures do not depend on the current

FP: I am not sure if the text is meant to say higher spatial layers than SID+1 (inclusive?)

5. -----

   The field MUST be present if the I bit is equal to one. If set, the PID field MUST contain 15 bits; otherwise, it MUST contain 7

FP: "If set" - I understand from the context that this should be "If M is set" (as written now, it could be interpreted as "if the PID field is set", which does not make sense, but better be clear).

6. -----

   or 15-bit index. The PID SHOULD start on a random number, and MUST wrap after reaching the maximum ID (0x7f or 0x7fff depending on the index size chosen). The receiver MUST NOT assume that the

FP: So is the intention that the PID is increased by one for each picture? Does the order matter? The way the text is written, "reaching the maximum ID" would suggest so, but I could not find any text about that; if I have missed it, please let me know.

7. -----

   SID-1 frame of the same picture, otherwise MUST set to zero.

FP: s/MUST set/MUST be set

8. -----

   depends on. TL0PICIDX MUST be incremented when TID is equal to 0. The index SHOULD start on a random number, and MUST restart

FP: Does it matter by how much? If so, it should be stated.

9. -----

   temporal layer ID (TID), switch up point (U), and the R reference indices (P_DIFFs) are specified.

FP: I couldn't find the R bit defined anywhere. I assume its meaning is "if set, P_DIFF is present" but this should be clearly stated in the text.

10. -----

FP: Please expand MCU, LRR on first use.

11. -----

   Section 7. IANA

FP: I checked the mail archive for the subtype registration and could not find it. I leave it to Murray to let me know if we are more lenient about subtype requests, but I would have appreciated the registration being posted to the media-types mailing list.
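Points 5 and 6 above concern how the PID width is selected and how the index wraps. A minimal sketch of one reading follows. It assumes, beyond what is quoted here, that the I bit is the most significant bit of the first descriptor octet and that the M bit is the most significant bit of the octet that follows; the increment-by-one step is exactly the behaviour point 6 asks the authors to confirm, so it is an assumption too. All names are hypothetical.

```python
from typing import Optional, Tuple

I_BIT = 0x80  # assumed position of the I bit in the first descriptor octet
M_BIT = 0x80  # assumed position of the M bit in the octet that follows


def parse_pid(descriptor: bytes) -> Tuple[Optional[int], int]:
    """Return (pid, pid_bits); pid is None and pid_bits is 0 when the I bit is zero."""
    if not descriptor:
        raise ValueError("empty payload descriptor")
    if not descriptor[0] & I_BIT:
        return None, 0
    if len(descriptor) < 2:
        raise ValueError("I bit set but PID octet missing")
    if descriptor[1] & M_BIT:
        # M set: 15-bit PID spread over two octets
        if len(descriptor) < 3:
            raise ValueError("M bit set but second PID octet missing")
        return ((descriptor[1] & 0x7F) << 8) | descriptor[2], 15
    # M clear: 7-bit PID in a single octet
    return descriptor[1] & 0x7F, 7


def next_pid(pid: int, pid_bits: int) -> int:
    """Advance by one (an assumption) and wrap past 0x7f or 0x7fff."""
    max_id = (1 << pid_bits) - 1
    return (pid + 1) & max_id
```

With these assumptions, `next_pid(0x7F, 7)` and `next_pid(0x7FFF, 15)` both wrap to 0, matching the two maximum IDs named in the quoted text.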
Lars Eggert (was Discuss) No Objection
Comment (2021-05-27 for -13)
It's very unfortunate that VP9 isn't published as an RFC as VP8 was - I'm somewhat concerned about the stability of this reference, esp. given its importance. Then again, it's apparently been accessible since 2016.

-------------------------------------------------------------------------------
All comments below are about very minor potential issues that you may choose to address in some way - or ignore - as you see fit. Some were flagged by automated tools (via https://github.com/larseggert/ietf-reviewtool), so there will likely be some false positives. There is no need to let me know what you did with these suggestions.

Section 3, paragraph 14, nit:
- in video coding, i.e. to mean an independently-decoadable run of
-                                                    -
+ in video coding, i.e. to mean an independently-decodable run of

Section 5.3, paragraph 5, nit:
- ingnored on reception. See Section 4.2 for details on the TID and
-  -
+ ignored on reception. See Section 4.2 for details on the TID and

Section 6.1.2, paragraph 5, nit:
- its declared receiver capabilties.
+ its declared receiver capabilities.
+                              +

Section 3, paragraph 12, nit:
> document, is not the same thing as a the term "Group of Pictures" as it is t
>                                    ^^^^^
Maybe you need to remove one determiner so that only "a" or "the" is left.

Section 4.2, paragraph 6, nit:
> s present for the layer indices. Otherwise if the F bit is set to 0 (indicat
>                                  ^^^^^^^^^
Did you forget a comma after a conjunctive/linking adverb?

Section 4.2, paragraph 15, nit:
> d in this specification is different than a VP9 Superframe. All frames of the
>                                      ^^^^
Did you mean 'different "from"? 'Different than' is often considered colloquial style.

Section 4.5.1, paragraph 3, nit:
> ble frames are being referenced. Therefore it's recommended for both the fle
>                                  ^^^^^^^^^
Did you forget a comma after a conjunctive/linking adverb?

Section 6.1.1, paragraph 5, nit:
> TP in general. This responsibility lays on anyone using RTP in an application
>                                    ^^^^^^^
Did you mean "lies on"?

These URLs in the document can probably be converted to HTTPS:
 * http://www.iana.org/assignments/rtp-parameters
Martin Duke No Objection
Martin Vigoureux No Objection
Robert Wilton No Objection
Roman Danyliw No Objection
Comment (2021-06-09 for -15)
Thank you to Rifaat Shekh-Yusef for the SECDIR review. Thank you for addressing my COMMENTs.
Warren Kumari No Objection
Comment (2021-06-02 for -13)
Thank you for this document. I'll happily note that there is lots in it that I don't understand (and needs lots of background knowledge), but it seems fine from an Ops side.
Zaheduzzaman Sarker (was Discuss) No Objection
Comment (2021-06-08 for -15)
Thanks for addressing my discuss.
Éric Vyncke No Objection
Comment (2021-05-30 for -13)
Thank you for the work put into this document.

I have only one minor comment about section 4.1: please expand "VP9 pyld hdr" somewhere in the text.

-éric