Skip to main content

Telechat Review of draft-ietf-payload-vp8-16
review-ietf-payload-vp8-16-genart-telechat-davies-2013-09-11-00

Request Review of draft-ietf-payload-vp8
Requested revision No specific revision (document currently at 17)
Type Telechat Review
Team General Area Review Team (Gen-ART) (genart)
Deadline 2013-07-16
Requested 2013-07-18
Authors Patrik Westin , Henrik Lundin , Michael Glover , Justin Uberti , Frank Galligan
I-D last updated 2013-09-11
Completed reviews Genart Last Call review of -08 by Elwyn B. Davies (diff)
Genart Telechat review of -09 by Elwyn B. Davies (diff)
Genart Telechat review of -16 by Elwyn B. Davies (diff)
Genart Last Call review of -17 by Elwyn B. Davies
Secdir Last Call review of -08 by Brian Weis (diff)
Assignment Reviewer Elwyn B. Davies
State Completed
Request Telechat review on draft-ietf-payload-vp8 by General Area Review Team (Gen-ART) Assigned
Reviewed revision 16 (document currently at 17)
Result Ready w/nits
Completed 2013-09-11
review-ietf-payload-vp8-16-genart-telechat-davies-2013-09-11-00
I am the assigned Gen-ART reviewer for this draft. For background on
Gen-ART, please see the FAQ at

<

http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Please resolve these comments along with any other Last Call comments
you may receive.

Document: draft-ietf-payload-vp8-16.txt
Reviewer: Elwyn Davies
Review Date: 2015/07/10
IETF LC End Date: 2015/07/13
IESG Telechat date: (if known) -



Summary: Gosh, it has been a long time since the last review! Almost 


ready.  Still needs various minor nits fixed and some clarifications.




Major issues:

Minor issues:
s4.2: Various of the frame indices either SHOULD or, in the case of
KEYIDX, MAY start at random values.  The rationale(s) for these
requirements are not explained - nor is the reason why it it is only a
MAY for KEYIDX whereas it is SHOULD for PictureID and what might happen
were the MAY/SHOULD to be ignored.

Is the rationale something that should be mentioned in the security
considerations?



s10.1:  The downref to RFC 6386 identified by idnits was duly noted in 


the last call



announcement.  I don't have a problem with the downref.

Nits/editorial comments:



General: Consistent usage of "prediction and mode" vs "prediction/mode" 


would be desirable.






 s1, para 2: Needs to introduce the 'macroblock' jargon defined in RFC 


6386. Suggest:



OLD:
 VP8 is based on decomposition of frames into square sub-blocks of
 pixels, prediction of such sub-blocks using previously constructed
 blocks, and...
NEW:
 VP8 is based on decomposition of frames into square sub-blocks of
 pixels known as "macroblocks" (see Section 2 of [RFC6386]).  Prediction
 of such sub-blocks using previously constructed blocks, and...
END

s2: There are a number of technical terms brought over from RFC 6386.
These should be listed in s2.  At least the following would be in the list:
key frame, interframe, golden frame, altref frame, macroblock.


Also the terms picture selection index (RPSI) and slice loss indication 


(SLI) from RFC 4585.




s3, para 2: Providing a reference to explain what temporal scalability
and the hierarchy selection is all about would be useful.  One possibility
would be:



Lim, J. Yang, and B. Jeon,”Fast Coding Mode Decision for Scalable Video 


Coding”,


10th Int’l Conf. on Advanced Communication Technology, vol. 3, pp. 


1897-1900, 2008.






It would also help to add a pointer to the TL0PICIDX description in 


s4.2, thus:



ADD:


   Support for temporal scalability is provided by the optional 


TL0PICIDX and TID/Y/KEYIDX



fields described in Section 4.2.
END



ss3 and 4: The discussion of how the frame payload might or might not be 


distributed across


multiple packets is not very clear.   The draft has in mind a 'preferred 


use case' (para 1



of s4.4 says:




In the preferred use case, the S bit and PID fields
    described in Section 4.2 should be used to indicate what the packet
    contains.






I think it would be helpful to be a bit more positive about this in s3, 


para 3:



OLD:
   This memo
   allows the partitions to be sent separately or in the same RTP
   packet.  It may be beneficial for decoder error-concealment to send
   the partitions in different packets, even though it is not mandatory
   according to this specification.
NEW:


Accordingly, this document RECOMMENDS that the frame is packetized by 


the sender


with each data partition in a separate packet or packets.  This may be 


beneficial


for decoder error concealment and the payload format described in 


Section 4 provides


fields that allow the partitions to be identified even if the first 


partition is


not available.  The sender can, alternatively, aggregate the data 


partitions into



a single data stream and, optionally, split it into several packets without
consideration of the partition boundaries.  The receiver can use the length


information in the first partition to identify the partitions during 


decoding.



END

s4/Figure 1: s/bytes/octets/ (2 places)

Figure 1, caption: s/sequel/Sections 4.2 and 4.3/



Figure 1: The format shown is correct only for the first packet in a 


frame.  Subsequent


packets will not contain the payload header and will have later octets 


of the frame payload.



This should be mentioned in drawing and/or in the caption.

s4.1, last two paras:



    Sequence number:  The sequence numbers are monotonically increasing
       and set as packets are sent.

       The remaining RTP header fields are used as specified in
       [RFC3550].


Redefining the Sequence Number field seems gratuitous and potentially 


dangerous,


given that it is *very carefully* defined in RFC 3550 and the overall 


intent does


not seem any different from that in RFC 3550.  OTOH a note about the 


(non?-)use of the X bit


and the Payload Type field (PT) would be useful.  I suggest replacing 


the paras above with:



NEW:


   X bit: MUST be 0.  The VP8 RTP profile does not use the RTP Header 


Extension capability.






   PT (Payload Type):  In line with the policy in Section 3 of 


[RFC3551], applications


      using the VP8 RTP payload profile MUST assign a dynamic payload 


type number to


      be used in each RTP session and provide a mechanism to indicate 


the mapping.


      See Section 6.2 for the mechanism to be used with the Session 


Description Protocol (SDP)



      [RFC4566].



      The remaining RTP Fixed Header Fields (V, P, CC, sequence number, 


SSRC and CSRC



      identfiers) are used as specified in Section 5.1 of [RFC5330].
END


Note that this may be considered to make the reference to RFC 3551 


normative.






s4.2, X bit:  There is potential for confusion between the X bit in the 


fixed header


and the X bit in the VP8 Payload Descriptor.  Perhaps changing this to C 


for control bits



would avoid the problem.



s4.2, M bit: Similarly, it might be better to use B (for BIG) in place 


of M to avoid confusion.




s4.2, para 7 after Fig 2: For consistency:
OLD:
   When the X bit is set to 1 in the first octet, the extension bit
   field octet MUST be provided as the second octet.  If the X bit is 0,
   the extension bit field octet MUST NOT be present, and no extensions
   (I, L, T, or K) are permitted.
NEW:


   When the X [or C] bit is set to 1 in the first octet, the Extended 


Control Bits


   field octet MUST be provided as the second octet.  If the X [or C] 


bit is 0,



   the extension bit field octet MUST NOT be present, and no extensions
   (I, L, T, or K) are permitted.
END



s4.2: The issue of using either 'wrap' or 'restart' but not both for 


restarting number sequences has been raised in various comments by IESG 


members.






s4.2, TL0PICIDX: I think, given the statements in s3 about not defining 


the hierarchy, that more explanation is needed of what the 'lowest or 


base layer' actually means.




s4.2, TL0PICIDX:



  If TID is larger than 0, TL0PICIDX
          indicates which base layer frame the current image depends on.
          TL0PICIDX MUST be incremented when TID is 0.


What happens if L=1 but both T=0 and K=0 so that there is no TID value 


present? Or indeed if T=0 but K=1 so that the TID field is there but 


'MUST be ignored by the receiver'  (definition of TID field)?






s4.3: It would be useful to include the section number (9.1) where the 


uncompressed data chunk specification is found.  I was somewhat 


surprised that the ordering is reversed in the protocol spec.






s4.3, VER: The receiver should validate this field - currently only 


values 0-3 are valid.






s4.3, SizeN:  Make it clear these are integer fields: s/in Size0,/in 


integer fields Size0,/






s4.3 and s4.4:  It would be helpful to implementers to explain what the 


offsets in the  first partition payload are for the additional partition 


count and additional partition sizes.  Having rummaged around in RFC 


6386, I have to say that I am unclear whether the values are placed 


after the full chunk of data indicated by 1stPartitionSize or are part 


of this data. And if the latter is the case, how to work out where the 


partition sizes are.  I guess that they ought to be at offset 


(1stPartitionSize+10 - key frame - or 1stPartitionSize+3 - interframe) 


in the VP8 Payload field as it is difficult to work out where they are 


without decoding the partition otherwise.  Also exactly how many 


partition sizes to expect - I think I read in RFC 6386 that the last 


partition size is implicit. Help!






In the course of clearing this up, it might be appropriate to move the 


last sentence of the first para of s4.4 and the second para of s4.4 into 


s4.3 as part of the explanation.  This is the relevant text:



    Note that the length of the first partition can
    always be obtained from the first partition size parameter in the VP8
    payload header.

    The VP8 bitstream format [RFC6386] specifies that if multiple DCT/WHT
    partitions are produced, the location of each partition start is
    found at the end of the first (prediction/mode) partition.  In this
    RTP payload specification, the location offsets are considered to be
    part of the first partition.







s4.4:  I found this section very difficult to parse.  It is a bit 


'stream of consciousness' and would benefit from a reordering and 


rewrite.  Suggestion (assuming the second para gets moved to s4.3):



NEW SECTION 4.4:
   An encoded VP8 frame can be divided into two or more partitions, as
   described in Section 1.  It is OPTIONAL for a packetizer implementing
   this RTP specification
   to pay attention to the partition boundaries within an encoded frame.
   If packetization of a frame is done without considering the partition
   boundaries, the PID field MAY be set to zero for all packets, and the
   S bit MUST NOT be set to one for any other packet than the first.



   If the preferred usage suggested in Section 3 is followed, with each 


packet



   carrying data from exactly one partition, the S bit and PID fields
   described in Section 4.2 SHOULD be used to indicate what the packet
   contains.  The PID field should indicate which partition the first
   octet of the payload belongs to, and the S bit indicates that the
   packet starts on a new partition.



   If the packetizer does not pay attention to the partition 


boundaries, one


   packet can contain a fragment of a  partition, a complete partition, 


or an


   aggregate of fragments and partitions.  There is no explicit 


signaling of


   partition boundaries in the payload and the partition lengths at the 


end of


   the first partition have to be used to identify the boundaries. 


Partitions



   MUST be aggregated in decoding order.  Two fragments from different
   partitions MAY be aggregated into the same packet along with one or more
   complete partitions.



  In all cases, the payload of a packet MUST contain data from only one 


video


  frame.  Consequently the set of packets carrying the data from a 


particular


  frame will contain exactly one VP8 Payload Header (see Section 4.3) 


carried


  in the first packet of the frame.  The last, or only, packet carrying 


data for the



  frame MUST have the M bit set in the RTP header.



s4.5.1:  Th algorithm only applies for the RECOMMENDED use case with 


partitions in separate packets.



This should be noted.



s5.2, last para: s/only parts of the frame is corrupted./only part of 


the frame is corrupted./






s6: s/This payload format has two required parameters./This payload 


format has two optional parameters./



[I think this is probably a hangover from the previous version.]



s7: s/Cryptographic system may also allow/The cryptographic system may 


also allow/




s7:



the usage of SRTP [RFC3711] is recommended.



RECOMMENDED?

s10.2: I suspect that RFC 3551 is normative.