Note: This ballot was opened for revision 06 and is now closed.
Thanks for the clear Security Considerations section. I am a little uncertain about the privacy properties of the padding data, though, largely due to my uncertainty about the details of how the padding works. (This is, perhaps, in a similar vein to Eric's general concerns on interoperability.) In particular, Section 4.2 says that the Data Length for a Padding Data Unit "may have any value" and "indicates the size of the recommended padding". There is also an "Optional Payload Data" in Figure 6, and I failed to find a description of what its contents are for padding data units. Section 4.5.1's coverage of the Padding Data Parse Info Header suggests that the "native VC-2" and RTP padding elements are essentially distinct, with the RTP one being essentially a recommendation to add a VC-2 one, but giving no mandatory guidance on how much padding to apply. In this scenario I don't know what the purpose of the "optional payload data" in Figure 6 be, though. Padding is of course ignored by the actual VC-2 decoder, so the concern would mostly be if the (RTP) bits on the wire include a side channel or "stegangraphic channel" (not exactly the normal meaning of that one) where identifying information could be inserted, unbeknownst to the recipient. This could come into play if media encryption is not used, or when a middlebox/mixer is used, or probably in other scenarios as well. The specification of all-zeros padding along with the Padding Data Parse Info Header of course removes any such channel at that point, but I didn't see a real confirmation that there was no channel in the RTP bits on the wire. Some additional section-by-section comments follow. Section 4.1 Timestamp: 32 bits If the packet contains transform parameters or coded slice data for a coded picture then the timestamp corresponds to the sampling instant of the coded picture. A 90kHz clock SHOULD be used. A single RTP packet MUST NOT contain coded data for more than one coded picture, so there is no ambiguity here. Is this a new requirement or quoting a preexisting one? If a new requirement, I suggest replacing "so there is no ambiguity here" with "in order to eliminate any chance for ambiguity". Section 4.2 If the receiver does not receive a transform parameters packet for a picture then it MAY assume that the parameters are unchanged since the last picture, or MAY discard the picture. Those seem like two very different options! How would I choose between them? We only get the starting x- and y-coordinates of a slice for the first slice in a packet; it sounds like the main VC-2 spec specifies the order in which the following slices are laid out? (Do we need to say anything about what "x coordinate" and "y coordinate" mean? I seem to recall that over history there have existed pixel identifying schemes that put the origin at both the top left and bottom left of the display.) Section 4.5.1 There is some text here and elsewhere that seems to imply reusing a Parse Info Header for various data received in different RTP packets, potentially even from different coded pictures. The Parse Info Header contains "next" and "prev parse offset"s, though -- when would those offsets need to be updated? o A receiver MAY combine all fragment data units (with parse code 0xEC) and the same picture number into a single picture data unit with parse code 0xE8. If the stream is required to comply with major versions 1 or 2 of the VC-2 Spec then this MUST be done. The "and" in "and the same picture number" seems to be an editing error; maybe "with" is better? o Once a data unit has been assembled, whether a Sequence Header, Coded Picture Fragment, Coded Picture, or Auxiliary Data Unit, the next parse offset and previous parse offset values in its Parse Info Header should be filled with the offset between the start of the header and the start of the next or previous. This text could probably be tightened up with respect to which next/previous fields are updated when, and what values go in them. E.g., do we have enough information to fill in the "previous" field when we start assembling a data unit, and the "next" when we finish assembling that data unit? Section 6.1 Perhaps "RFC XXXX" makes more sense as the "published specification" than ST 2042-1? That is, this is where the (mandatory) RTP framing is required, so it may be a better starting point for an implementor.
Rich version of this review at: https://mozphab-ietf.devsvcdev.mozaws.net/D4183 I am not sure that this specification can be interoperably implemented. I have noted a number of points below. I believe these are largely minor and so have not made this a DISCUSS, but it is important they be resolved. This document would also benefit from significant editorial work. Based on S 4.5.1, I take the structure to be that you start with the VC-2 stream and then use RTP headers to contain the information in Parse Info blocks. If that is correct, it would be much clearer if stated upfront. IMPORTANT S 4.2. > > The fields of the extended headers are defined as follows: > > Extended Sequence Number: 16 bits MUST Contain the high-order > 16-bits of the 32-bit packet sequence number, a number which > increments with each packet. This is needed since the high Increments by one? S 4.2. > data rates of VC2 Sequences mean that it is highly likely that > the 16-bit sequence number will roll-over too frequently to be > of use for stream synchronisation. > > B: 1 bit MUST be set to 1 if the packet contains the first byte of > an Auxiliary Data or Padded Data Unit. And otherwise must be 0? S 4.2. > > B: 1 bit MUST be set to 1 if the packet contains the first byte of > an Auxiliary Data or Padded Data Unit. > > E: 1 bit MUST be set to 1 if the packet contains the final byte of > an Auxiliary Data or Padded Data Unit. And otherwise must be 0? S 4.2. > from a new picture until all the coded data from the current > picture has been sent. > > If the receiver does not receive a transform parameters packet > for a picture then it MAY assume that the parameters are > unchanged since the last picture, or MAY discard the picture. How does this interact with packet loss? COMMENTS S 3. > the decoder. > > Each Sequence consists of a series of 13-octet Parse Info headers and > variable length Data Units. The Sequence begins and ends with a > Parse Info header and each Data Unit is preceded by a Parse Info > Header. Data Units come in a variety of types, the most important This text isn't very clear to me. Is the following valid: PI | Data | PI? How about PI | PI | Data | PI. PI | PI | PI | Data | PI? S 3. > should not be assumed. > > The High Quality (HQ) profile for VC-2 restricts the types of Parse > Info Headers which may appear in the Sequence to only: > > o Sequence Headers, The text above says that Sequence Headers are a type of Data Unit. So I'm confused by this text. S 4. > > 4. Payload format > > This specification only covers the transport of Sequence Headers, > High Quality Fragments, Auxiliary Data, and (optionally) End of > Sequence Headers and Padding Data. So it doesn't include Parse Info? S 4. > Picture (Figure 2), > > o A Picture Fragment containing VC-2 Coded Slices (Figure 3) for a > picture, > > o The end of a VC-2 Sequence (Figure 4) It would be helpful if you cited specific sections of VC-2 for these. S 4. > . . > +---------------------------------------------------------------+ > > Figure 6: RTP Payload Format For Padding Data > > All fields in the headers longer than a single bit are interprted as Nit: interpreted. S 4.2. > > Data Length: 32 bits For an auxiliary data unit this contains the > number of bytes of data contained in the uncoded payload > section of this packet. For a Padding Data Unit this field may > have any value and simply indicates the size of the recommended > padding. This seems like a very large field given that RTP datagrams are almost never this large, so I am suspecting the "uncoded payload" means pre- compressed? Can you be clearer..
Thanks for the work on this document. Given that the underlying format doesn't appear to be resilient to loss, I'm a little surprised to see no discussion of FEC; and, in particular, no treatment of the allocation of unequal error protection to the various packet types. For example, it sounds like the transform parameters packet is significantly more important than, e.g., a picture fragment that contains slices. I suspect there is a general prioritization among the various types that would useful to call out for implementors.