Skip to main content

Telechat Review of draft-ietf-payload-vp8-16

Request Review of draft-ietf-payload-vp8
Requested revision No specific revision (document currently at 17)
Type Telechat Review
Team General Area Review Team (Gen-ART) (genart)
Deadline 2013-07-16
Requested 2013-07-18
Authors Patrik Westin , Henrik Lundin , Michael Glover , Justin Uberti , Frank Galligan
I-D last updated 2013-09-11
Completed reviews Genart Last Call review of -08 by Elwyn B. Davies (diff)
Genart Telechat review of -09 by Elwyn B. Davies (diff)
Genart Telechat review of -16 by Elwyn B. Davies (diff)
Genart Last Call review of -17 by Elwyn B. Davies
Secdir Last Call review of -08 by Brian Weis (diff)
Assignment Reviewer Elwyn B. Davies
State Completed
Request Telechat review on draft-ietf-payload-vp8 by General Area Review Team (Gen-ART) Assigned
Reviewed revision 16 (document currently at 17)
Result Ready w/nits
Completed 2013-09-11
I am the assigned Gen-ART reviewer for this draft. For background on
Gen-ART, please see the FAQ at


Please resolve these comments along with any other Last Call comments
you may receive.

Document: draft-ietf-payload-vp8-16.txt
Reviewer: Elwyn Davies
Review Date: 2015/07/10
IETF LC End Date: 2015/07/13
IESG Telechat date: (if known) -

Summary: Gosh, it has been a long time since the last review! Almost 

ready.  Still needs various minor nits fixed and some clarifications.

Major issues:

Minor issues:
s4.2: Various of the frame indices either SHOULD or, in the case of
KEYIDX, MAY start at random values.  The rationale(s) for these
requirements are not explained - nor is the reason why it it is only a
MAY for KEYIDX whereas it is SHOULD for PictureID and what might happen
were the MAY/SHOULD to be ignored.

Is the rationale something that should be mentioned in the security

s10.1:  The downref to RFC 6386 identified by idnits was duly noted in 

the last call

announcement.  I don't have a problem with the downref.

Nits/editorial comments:

General: Consistent usage of "prediction and mode" vs "prediction/mode" 

would be desirable.

 s1, para 2: Needs to introduce the 'macroblock' jargon defined in RFC 

6386. Suggest:

 VP8 is based on decomposition of frames into square sub-blocks of
 pixels, prediction of such sub-blocks using previously constructed
 blocks, and...
 VP8 is based on decomposition of frames into square sub-blocks of
 pixels known as "macroblocks" (see Section 2 of [RFC6386]).  Prediction
 of such sub-blocks using previously constructed blocks, and...

s2: There are a number of technical terms brought over from RFC 6386.
These should be listed in s2.  At least the following would be in the list:
key frame, interframe, golden frame, altref frame, macroblock.

Also the terms picture selection index (RPSI) and slice loss indication 

(SLI) from RFC 4585.

s3, para 2: Providing a reference to explain what temporal scalability
and the hierarchy selection is all about would be useful.  One possibility
would be:

Lim, J. Yang, and B. Jeon,”Fast Coding Mode Decision for Scalable Video 


10th Int’l Conf. on Advanced Communication Technology, vol. 3, pp. 

1897-1900, 2008.

It would also help to add a pointer to the TL0PICIDX description in 

s4.2, thus:


   Support for temporal scalability is provided by the optional 


fields described in Section 4.2.

ss3 and 4: The discussion of how the frame payload might or might not be 

distributed across

multiple packets is not very clear.   The draft has in mind a 'preferred 

use case' (para 1

of s4.4 says:

In the preferred use case, the S bit and PID fields
    described in Section 4.2 should be used to indicate what the packet

I think it would be helpful to be a bit more positive about this in s3, 

para 3:

   This memo
   allows the partitions to be sent separately or in the same RTP
   packet.  It may be beneficial for decoder error-concealment to send
   the partitions in different packets, even though it is not mandatory
   according to this specification.

Accordingly, this document RECOMMENDS that the frame is packetized by 

the sender

with each data partition in a separate packet or packets.  This may be 


for decoder error concealment and the payload format described in 

Section 4 provides

fields that allow the partitions to be identified even if the first 

partition is

not available.  The sender can, alternatively, aggregate the data 

partitions into

a single data stream and, optionally, split it into several packets without
consideration of the partition boundaries.  The receiver can use the length

information in the first partition to identify the partitions during 



s4/Figure 1: s/bytes/octets/ (2 places)

Figure 1, caption: s/sequel/Sections 4.2 and 4.3/

Figure 1: The format shown is correct only for the first packet in a 

frame.  Subsequent

packets will not contain the payload header and will have later octets 

of the frame payload.

This should be mentioned in drawing and/or in the caption.

s4.1, last two paras:

    Sequence number:  The sequence numbers are monotonically increasing
       and set as packets are sent.

       The remaining RTP header fields are used as specified in

Redefining the Sequence Number field seems gratuitous and potentially 


given that it is *very carefully* defined in RFC 3550 and the overall 

intent does

not seem any different from that in RFC 3550.  OTOH a note about the 

(non?-)use of the X bit

and the Payload Type field (PT) would be useful.  I suggest replacing 

the paras above with:


   X bit: MUST be 0.  The VP8 RTP profile does not use the RTP Header 

Extension capability.

   PT (Payload Type):  In line with the policy in Section 3 of 

[RFC3551], applications

      using the VP8 RTP payload profile MUST assign a dynamic payload 

type number to

      be used in each RTP session and provide a mechanism to indicate 

the mapping.

      See Section 6.2 for the mechanism to be used with the Session 

Description Protocol (SDP)


      The remaining RTP Fixed Header Fields (V, P, CC, sequence number, 


      identfiers) are used as specified in Section 5.1 of [RFC5330].

Note that this may be considered to make the reference to RFC 3551 


s4.2, X bit:  There is potential for confusion between the X bit in the 

fixed header

and the X bit in the VP8 Payload Descriptor.  Perhaps changing this to C 

for control bits

would avoid the problem.

s4.2, M bit: Similarly, it might be better to use B (for BIG) in place 

of M to avoid confusion.

s4.2, para 7 after Fig 2: For consistency:
   When the X bit is set to 1 in the first octet, the extension bit
   field octet MUST be provided as the second octet.  If the X bit is 0,
   the extension bit field octet MUST NOT be present, and no extensions
   (I, L, T, or K) are permitted.

   When the X [or C] bit is set to 1 in the first octet, the Extended 

Control Bits

   field octet MUST be provided as the second octet.  If the X [or C] 

bit is 0,

   the extension bit field octet MUST NOT be present, and no extensions
   (I, L, T, or K) are permitted.

s4.2: The issue of using either 'wrap' or 'restart' but not both for 

restarting number sequences has been raised in various comments by IESG 


s4.2, TL0PICIDX: I think, given the statements in s3 about not defining 

the hierarchy, that more explanation is needed of what the 'lowest or 

base layer' actually means.

s4.2, TL0PICIDX:

  If TID is larger than 0, TL0PICIDX
          indicates which base layer frame the current image depends on.
          TL0PICIDX MUST be incremented when TID is 0.

What happens if L=1 but both T=0 and K=0 so that there is no TID value 

present? Or indeed if T=0 but K=1 so that the TID field is there but 

'MUST be ignored by the receiver'  (definition of TID field)?

s4.3: It would be useful to include the section number (9.1) where the 

uncompressed data chunk specification is found.  I was somewhat 

surprised that the ordering is reversed in the protocol spec.

s4.3, VER: The receiver should validate this field - currently only 

values 0-3 are valid.

s4.3, SizeN:  Make it clear these are integer fields: s/in Size0,/in 

integer fields Size0,/

s4.3 and s4.4:  It would be helpful to implementers to explain what the 

offsets in the  first partition payload are for the additional partition 

count and additional partition sizes.  Having rummaged around in RFC 

6386, I have to say that I am unclear whether the values are placed 

after the full chunk of data indicated by 1stPartitionSize or are part 

of this data. And if the latter is the case, how to work out where the 

partition sizes are.  I guess that they ought to be at offset 

(1stPartitionSize+10 - key frame - or 1stPartitionSize+3 - interframe) 

in the VP8 Payload field as it is difficult to work out where they are 

without decoding the partition otherwise.  Also exactly how many 

partition sizes to expect - I think I read in RFC 6386 that the last 

partition size is implicit. Help!

In the course of clearing this up, it might be appropriate to move the 

last sentence of the first para of s4.4 and the second para of s4.4 into 

s4.3 as part of the explanation.  This is the relevant text:

    Note that the length of the first partition can
    always be obtained from the first partition size parameter in the VP8
    payload header.

    The VP8 bitstream format [RFC6386] specifies that if multiple DCT/WHT
    partitions are produced, the location of each partition start is
    found at the end of the first (prediction/mode) partition.  In this
    RTP payload specification, the location offsets are considered to be
    part of the first partition.

s4.4:  I found this section very difficult to parse.  It is a bit 

'stream of consciousness' and would benefit from a reordering and 

rewrite.  Suggestion (assuming the second para gets moved to s4.3):

   An encoded VP8 frame can be divided into two or more partitions, as
   described in Section 1.  It is OPTIONAL for a packetizer implementing
   this RTP specification
   to pay attention to the partition boundaries within an encoded frame.
   If packetization of a frame is done without considering the partition
   boundaries, the PID field MAY be set to zero for all packets, and the
   S bit MUST NOT be set to one for any other packet than the first.

   If the preferred usage suggested in Section 3 is followed, with each 


   carrying data from exactly one partition, the S bit and PID fields
   described in Section 4.2 SHOULD be used to indicate what the packet
   contains.  The PID field should indicate which partition the first
   octet of the payload belongs to, and the S bit indicates that the
   packet starts on a new partition.

   If the packetizer does not pay attention to the partition 

boundaries, one

   packet can contain a fragment of a  partition, a complete partition, 

or an

   aggregate of fragments and partitions.  There is no explicit 

signaling of

   partition boundaries in the payload and the partition lengths at the 

end of

   the first partition have to be used to identify the boundaries. 


   MUST be aggregated in decoding order.  Two fragments from different
   partitions MAY be aggregated into the same packet along with one or more
   complete partitions.

  In all cases, the payload of a packet MUST contain data from only one 


  frame.  Consequently the set of packets carrying the data from a 


  frame will contain exactly one VP8 Payload Header (see Section 4.3) 


  in the first packet of the frame.  The last, or only, packet carrying 

data for the

  frame MUST have the M bit set in the RTP header.

s4.5.1:  Th algorithm only applies for the RECOMMENDED use case with 

partitions in separate packets.

This should be noted.

s5.2, last para: s/only parts of the frame is corrupted./only part of 

the frame is corrupted./

s6: s/This payload format has two required parameters./This payload 

format has two optional parameters./

[I think this is probably a hangover from the previous version.]

s7: s/Cryptographic system may also allow/The cryptographic system may 

also allow/


the usage of SRTP [RFC3711] is recommended.


s10.2: I suspect that RFC 3551 is normative.