RTP Payload Format for VP9 Video

Note: This ballot was opened for revision 13 and is now closed.

Murray Kucherawy Yes

Comment (2021-06-03 for -13)
There is a normative reference to a non-SDO document that was not specifically identified during the IETF Last Call.  BCP 97 doesn't give specific guidance about handling a document that was not produced by any SDO.

The IESG discussed this and chose to approve it without a second Last Call, since in this case the reference appears to be in good shape and stable.  The IESG will also take up an action item to amend BCP 97 to include guidance for this pattern to ease handling of future cases.

Alvaro Retana No Objection

Benjamin Kaduk (was Discuss) No Objection

Comment (2021-06-03 for -14)
Thanks for addressing my Discuss point, and comments!
I spotted a few possible nits in the -14 while reviewing the diff.

Section 3

   A Picture Group is a recurring pattern of spatial and temporal
   dependencies which In this mode, each packet has an index to refer to

Looks like an editing remnant up to "In this mode".

Section 4.2

   I:  Picture ID (PID) present.  When set to one, the OPTIONAL PID MUST
      be present after the mandatory first octet and specified as below.
      Otherwise, PID MUST NOT be present.  If the V bit was set in the
      stream's most recent start of a keyframe (i.e. the SS field was
      present, and non-flexible scalability mode is in use), then this
      bit MUST be set on every packet.

I may (still?) be confused, but I think the "non-flexible scalability mode is
in use" belongs outside the parentheses, as it's an additional separate
precondition from "V bit is set" for "this bit MUST be set".  That is, IIUC,
non-flexible scalability mode requires SS and requires PIDs in every packet,
but it's permitted to send SS in flexible scalability mode, and in the latter
case it's permitted (but unexpected?) to omit PIDs from (some) packets.

Section 4.2.1

   In a scalable stream sent with a fixed pattern, the SS data SHOULD be
   included in the first packet of every key frame.  This is a packet
   with P bit equal to zero, SID or Lis not the bit equal to zero, and B
   bit equal to 1.  [...]

I think there's some editing churn here as well, at least a space in "L is"
but possibly more qualifiers about SID being zero vs L being the bit equal to

Erik Kline No Objection

Francesca Palombini No Objection

Comment (2021-06-01 for -13)
Thank you for the work on this document.

I have some non-blocking questions and comments, which I hope will help improve the document.


1. -----

   Timestamp:  The RTP timestamp indicates the time when the input frame
      was sampled, at a clock rate of 90 kHz.  If the input picture is
      encoded with multiple layer frames, all of the frames of the
      picture MUST have the same timestamp.

FP: I think it would be useful to add a reference to RFC 3550 for "RTP timestamp". Also, I find it curious that RFC 3550 is not mentioned anywhere up to the end of Section 4.1 (I would expect a reference to it in the introduction).
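
To make the quoted Timestamp rule concrete, here is a minimal sketch in Python. It assumes the 32-bit RTP timestamp field and arbitrary initial offset from RFC 3550. The function names and the `initial_offset` parameter are hypothetical and do not come from the draft.

```python
RTP_CLOCK_HZ = 90_000      # clock rate stated in the quoted Timestamp text
RTP_TS_MODULUS = 1 << 32   # assumption: 32-bit RTP timestamp field (RFC 3550)


def rtp_timestamp(sample_time_s: float, initial_offset: int = 0) -> int:
    """Map a sampling instant in seconds to a 90 kHz RTP timestamp."""
    ticks = round(sample_time_s * RTP_CLOCK_HZ)
    return (initial_offset + ticks) % RTP_TS_MODULUS


def stamp_layer_frames(sample_time_s, layer_frames, initial_offset=0):
    """Give every layer frame of one picture the same timestamp."""
    ts = rtp_timestamp(sample_time_s, initial_offset)
    return [(ts, frame) for frame in layer_frames]
```

The second helper reflects the MUST in the quoted text: the timestamp is computed once per picture and reused for each of its layer frames.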

2. -----

      Otherwise, PID MUST NOT be present.  If the SS field was present
      in the stream's most recent start of a keyframe (i.e., non-
      flexible scalability mode is in use), then the PID MUST also be
      present in every packet.

FP: Is there any reason why this is not formulated in terms of V bit being set? (I believe the rest of the text is consistently talking about bit being set)

3. -----

      described by "Reference indices" below.  This MUST only be set to
      1 if the I bit is also set to one; if the I bit is set to zero,
      then this MUST also be set to zero and ignored by receivers.  The

FP: Why is it that this MUST only be set to 1 if I is also set to 1? I was looking for the motivation, but could not find it. Some more text would have been helpful to me.

4. -----

   Z:  Not a reference frame for upper spatial layers.  If set to 1,
      indicates that frames with higher spatial layers SID+1 of the
      current and following pictures do not depend on the current

FP: I am not sure if the text is meant to say higher spatial layers than SID+1 (inclusive?)

5. -----

     The field MUST be present if the I bit is equal to one.  If set,
      the PID field MUST contain 15 bits; otherwise, it MUST contain 7

FP: "If set" - I understand from the context that this should be "If M is set" (as written, it could be read as "if the PID field is set", which does not make sense, but it is better to be clear)
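
The reading "If M is set" can be sketched in Python as follows. The bit positions are assumptions not stated in the quoted text: I is taken to be the most significant bit of the first payload descriptor octet, and M the most significant bit of the octet that follows it. The function name is hypothetical.

```python
from typing import Optional, Tuple

I_BIT = 0x80  # assumption: I is the MSB of the mandatory first octet
M_BIT = 0x80  # assumption: M is the MSB of the first PID octet


def parse_picture_id(payload: bytes) -> Tuple[Optional[int], int]:
    """Return (pid, offset), where offset indexes the next unread octet."""
    if not payload:
        raise ValueError("empty payload")
    if not payload[0] & I_BIT:
        return None, 1                    # I = 0: PID MUST NOT be present
    if len(payload) < 2:
        raise ValueError("I bit set but PID octet missing")
    first = payload[1]
    if not first & M_BIT:
        return first & 0x7F, 2            # M = 0: 7-bit PID
    if len(payload) < 3:
        raise ValueError("M bit set but extended PID octet missing")
    return ((first & 0x7F) << 8) | payload[2], 3   # M = 1: 15-bit PID
```

In this sketch the width of the PID depends only on M, which is the interpretation the comment asks the text to spell out.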

6. -----

      or 15-bit index.  The PID SHOULD start on a random number, and
      MUST wrap after reaching the maximum ID (0x7f or 0x7fff depending
      on the index size chosen).  The receiver MUST NOT assume that the

FP: So is the intention that the PID is increased by one for each picture? Does the order matter? The way the text is written "reaching the maximum ID" would suggest so, but I could not find any text about that, if I have missed it please let me know.
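
For illustration only, the wrap behaviour under one possible answer to this question can be sketched in Python. The sketch assumes the PID increases by exactly one per picture and wraps to zero, which is the very point the question asks the authors to confirm. The function name is hypothetical.

```python
def next_picture_id(pid: int, extended: bool) -> int:
    """Advance a PID by one, wrapping after the maximum ID.

    Assumption (not confirmed by the quoted text): the PID increases
    by one per picture and wraps to zero.
    """
    max_id = 0x7FFF if extended else 0x7F   # 15-bit or 7-bit index
    if not 0 <= pid <= max_id:
        raise ValueError("PID out of range for the chosen index size")
    return 0 if pid == max_id else pid + 1
```

If the increment were not fixed at one, "reaching the maximum ID" would need a more careful definition, which is why stating the step size in the text would help.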

7. -----

       SID-1 frame of the same picture, otherwise MUST set to zero.

FP: s/MUST set/MUST be set

8. -----

         depends on.  TL0PICIDX MUST be incremented when TID is equal to
         0.  The index SHOULD start on a random number, and MUST restart

FP: Does it matter by how much? If so, it should be stated.

9. -----

      temporal layer ID (TID), switch up point (U), and the R reference
      indices (P_DIFFs) are specified.

FP: I couldn't find the R bit defined anywhere. I assume its meaning is "if set, P_DIFF is present" but this should be clearly stated in the text.

10. -----

FP: Please expand MCU and LRR on first use.

11. -----

Section 7. IANA

FP: I checked the mail archive for the subtype registration and could not find it. I leave it to Murray to let me know if we are more lenient about subtype requests, but I would have appreciated the registration being posted to the media-types mailing list.

Lars Eggert (was Discuss) No Objection

Comment (2021-05-27 for -13)
It's very unfortunate that VP9 isn't published as an RFC as VP8 was -
I'm somewhat concerned about the stability of this reference, esp. given its
importance. Then again, it's apparently been accessible since 2016.

All comments below are about very minor potential issues that you may choose to
address in some way - or ignore - as you see fit. Some were flagged by
automated tools (via https://github.com/larseggert/ietf-reviewtool), so there
will likely be some false positives. There is no need to let me know what you
did with these suggestions.

Section 3, paragraph 14, nit:
-    in video coding, i.e. to mean an independently-decoadable run of
-                                                       -
+    in video coding, i.e. to mean an independently-decodable run of

Section 5.3, paragraph 5, nit:
-    ingnored on reception.  See Section 4.2 for details on the TID and
-     -
+    ignored on reception.  See Section 4.2 for details on the TID and

Section 6.1.2, paragraph 5, nit:
-       its declared receiver capabilties.
+       its declared receiver capabilities.
+                                    +

Section 3, paragraph 12, nit:
>  document, is not the same thing as a the term "Group of Pictures" as it is t
>                                     ^^^^^
Maybe you need to remove one determiner so that only "a" or "the" is left.

Section 4.2, paragraph 6, nit:
> s present for the layer indices. Otherwise if the F bit is set to 0 (indicat
>                                  ^^^^^^^^^
Did you forget a comma after a conjunctive/linking adverb?

Section 4.2, paragraph 15, nit:
> d in this specification is different than a VP9 Superframe. All frames of the
>                                      ^^^^
Did you mean "different from"? "Different than" is often considered colloquial

Section 4.5.1, paragraph 3, nit:
> ble frames are being referenced. Therefore it's recommended for both the fle
>                                  ^^^^^^^^^
Did you forget a comma after a conjunctive/linking adverb?

Section 6.1.1, paragraph 5, nit:
> TP in general. This responsibility lays on anyone using RTP in an application
>                                    ^^^^^^^
Did you mean "lies on"?

These URLs in the document can probably be converted to HTTPS:
 * http://www.iana.org/assignments/rtp-parameters

Martin Duke No Objection

Martin Vigoureux No Objection

Robert Wilton No Objection

Roman Danyliw No Objection

Comment (2021-06-09 for -15)
Thank you to Rifaat Shekh-Yusef for the SECDIR review.

Thank you for addressing my COMMENTs.

Warren Kumari No Objection

Comment (2021-06-02 for -13)
Thank you for this document. I'll happily note that there is lots in it that I don't understand (and that it needs lots of background knowledge), but it seems fine from an Ops side.

Zaheduzzaman Sarker (was Discuss) No Objection

Comment (2021-06-08 for -15)
Thanks for addressing my discuss.

Éric Vyncke No Objection

Comment (2021-05-30 for -13)
Thank you for the work put into this document. 

I have only one minor comment about section 4.1: please expand "VP9 pyld hdr" somewhere in the text.