RTP Payload Format for ISO/IEC 21122 (JPEG XS)
draft-ietf-payload-rtp-jpegxs-18
Yes
(Murray Kucherawy)
No Objection
Erik Kline
Éric Vyncke
(Alvaro Retana)
(Robert Wilton)
Note: This ballot was opened for revision 15 and is now closed.
Erik Kline
No Objection
Roman Danyliw
No Objection
Comment
(2021-06-10 for -16)
Sent for earlier
Thank you for addressing my COMMENTs.
Éric Vyncke
No Objection
Murray Kucherawy Former IESG member
Yes
(for -15)
Unknown
Alvaro Retana Former IESG member
No Objection
(for -16)
Not sent
Benjamin Kaduk Former IESG member
No Objection
(2021-06-17 for -16)
Sent
I'll echo the sentiment of other reviewers that the scope of review possible is limited without access to the underlying ISO specification. I further note that in the recent case of https://datatracker.ietf.org/doc/draft-ietf-payload-vp9/ (for which the underlying specification is freely available), there was an error in replicating the chroma subsampling details from the underlying reference to the internet-draft. Any such errors are undetectable for this draft.

Section 4.3

Does the value of the T and K bits need to be identical for all packets of a given RTP stream?

Section 4.4

It's perhaps needlessly confusing to have the human-readable slice labels in Figures 8 and 9 start at 1 but the SEP counter start at 0.

nit: if SLH is an acronym it should be expanded somewhere (it only appears in the figures, at present).

In the slice packetization modes, do we have reasonable guarantees that the JPEG XS header (including all markers and marker segments) will fit into a single RTP packet?

Section 7.1

   Applications that use this media type: For example: SMPTE ST 2110, Video over IP, Video conferencing, Broadcast applications.

I think bland declarative statements like "applications that transmit video over RTP" tend to be more common than longer "for example" listings, in this type of registration.

Section 8

nit: s/SPD/SDP in the section heading.
Francesca Palombini Former IESG member
No Objection
(2021-06-16 for -16)
Sent
Thank you for the work on this document. I have some non-blocking comments and observations.

Francesca

1. -----

   A JPEG XS codestream header, starting with an SOC marker, followed by one or more slices, and terminated by an EOC marker form a JPEG XS codestream.

FP: I understand from the terminology what this is meant to specify; however, the way it is expressed is slightly confusing: it is not clear that the subject of "followed" is "A JPEG XS codestream header" and not "an SOC marker".

2. -----

FP: I agree with John that without access to ISO21122-{1,2,3}, it's not possible to do a complete review; in particular, the media type registration contains parameters that are inherited from the ISO standards, with normative text that I cannot review. Like John, I trust the responsible AD that the doc has had sufficient reviews in the WG, from people with access to the ISO specifications.

3. -----

FP: I couldn't find that the media type registration has been posted to the media-type mailing list. Was that done? This was also highlighted in the shepherd write-up, which I found helpful, so thank you Bernard.
John Scudder Former IESG member
No Objection
(2021-06-16 for -16)
Sent
Thanks, I found this spec very readable -- modulo the fact that I have no expertise in the subject area! Below are some questions and comments I hope may be useful.

I'm concerned that since the underlying ISO21122-{1,2,3} normative references are not readily available, it's not possible to do a complete review. I take it on faith that the document has received review within the WG by subject matter experts who are conversant with, and have access to, the relevant ISO specifications.

1. Section 4.1

   In the case of an interlaced frame, the JPEG XS header segment of the second field SHALL be in its own packetization unit.

I’m confused why the second field even needs its own header segment, considering you earlier told us (§3.4) that

   Both picture segments SHALL contain identical boxes (i.e. concatenation of the video support box and the colour specification box is byte exact the same for both picture segments of the frame).

Surely this means the VS and CS boxes could have been elided from the second field? (Probably they’re left in for uniformity, but I thought it worth asking.)

2. Section 4.1

   Due to the constant bit-rate of JPEG XS, the codestream packetization mode guarantees that a JPEG XS RTP stream will produce a constant number of bytes per frame, and a constant number of RTP packets per frame. To reach the same guarantee with the slice packetization mode, an additional mechanism is required. This can involve a constraint at the rate allocation stage in the JPEG XS encoder to impose a constant bit-rate at the slice level, the usage of padding data, or the insertion of empty RTP packets (i.e. a RTP packet whose payload data is empty).

The “… additional mechanism is required” text is ambiguous. Does this mean to say that an implementation MUST use an (implementation-specific!) method that makes its output CBR? That’s insinuated by the use of the word “required”. Or, does it mean that if an implementation wishes to render a CBR stream instead of a VBR one, it will need to adopt one of these strategies? Assuming your intent is the latter, I think the text should be clarified, for example

OLD
   To reach the same guarantee with the slice packetization mode, an additional mechanism is required.

NEW
   If an implementation wishes to provide the same guarantee with the slice packetization mode, it will need to use an additional mechanism.

3. Section 4.3

   In the case that the Transmission mode (T) is set to 0, the slice packetization mode SHALL be used and K SHALL be set to 1.

Presumably the reason for this is evident to someone conversant with JPEG XS?

4. Section 7.1

   level: The JPEG XS level [ISO21122-2] in use. Any white space in the level name SHALL be omitted. Examples of valid levels names are '2k-1' or '4k-2'.

Nit: s/levels/level/ (alternately, delete “names”).

   width: Determines the number of pixels per line. This is an integer between 1 and 32767.

   height: Determines the number of lines per frame. This is an integer between 1 and 32767.

It would be less ambiguous to say “between 1 and 32767 inclusive”.
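The constraints quoted in items 3 and 4 above can be sketched as simple checks. This is a minimal illustration only, not text or code from the draft: the function names are hypothetical, the width and height bounds are read as inclusive (the reading the comment suggests), and K=0 is taken to mean codestream packetization mode with K=1 meaning slice packetization mode.

```python
def normalize_level(name: str) -> str:
    """Drop all white space from a level name, as the quoted text requires."""
    return "".join(name.split())


def dimension_ok(value: int) -> bool:
    """Check a width or height value, reading the bounds as inclusive (assumption)."""
    return 1 <= value <= 32767


def header_bits_ok(t: int, k: int) -> bool:
    """Check the quoted rule: when T is 0, K has to be 1 (slice packetization mode)."""
    if t not in (0, 1) or k not in (0, 1):
        return False  # T and K are single bits
    return not (t == 0 and k != 1)


if __name__ == "__main__":
    print(normalize_level("4k - 2"))
    print(dimension_ok(32767), dimension_ok(32768))
    print(header_bits_ok(0, 1), header_bits_ok(0, 0))
```

Under the inclusive reading, both 1 and 32767 pass the dimension check while 0 and 32768 do not; the exclusive reading would reject the two boundary values, which is the ambiguity the comment points out.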
Lars Eggert Former IESG member
No Objection
(2021-06-11 for -16)
Sent
All comments below are about very minor potential issues that you may choose to address in some way - or ignore - as you see fit. Some were flagged by automated tools (via https://github.com/larseggert/ietf-reviewtool), so there will likely be some false positives. There is no need to let me know what you did with these suggestions.

Section 2. , paragraph 8, nit:
> nit is the first (resp. last) byte of a RTP packet payload (excluding its pay
>                                       ^
Use "an" instead of "a" if the following word starts with a vowel sound, e.g. "an article", "an hour". (Also elsewhere in the document.)

Section 2. , paragraph 17, nit:
> ferent slices can be decoded independently from each other. Note, however, t
>                              ^^^^^^^^^^^^^^^^^^
The usual collocation for "independently" is "of", not "from". Did you mean "independently of"?
Martin Duke Former IESG member
No Objection
(2021-06-16 for -16)
Sent
In the abstract and intro, it promises "end-to-end latency confined to a fraction of a frame". I am not sure what to make of this guarantee. Latency is a measure of time and a frame is measured in ... bytes? Moreover, end-to-end latency is mostly a property of the path, and not something an encoding format can promise.
Robert Wilton Former IESG member
No Objection
(for -16)
Not sent
Zaheduzzaman Sarker Former IESG member
(was Discuss)
No Objection
(2021-07-20 for -17)
Sent
Thanks to the authors and Stephan Wenger for prompt action to make the ISO specification available to us. I have removed the discuss as the main reason for the discuss was resolved. I however have one major issue which I think needs to be addressed.

* Section 4.1: the assertion here is that JPEG XS produces a constant bitrate. However, I now know that this codec can operate in both constant and variable bitrate modes. This section should clarify whether the RTP payload format still holds when VBR mode is used. It might also be helpful to discuss the two modes of operation somewhere in the introduction and state whether the focus is only on constant bitrate mode, with reasoning. This will level out the scope of the payload definition and also the impact on Section 6.

And more comments:

* I can agree with Martin Duke's comment that the polymorphic use of "end-to-end latency" needs to be explained a bit.

* Section 3: a statement that this section describes terminology or naming for this specification, as Section 4 has, would help the reader understand the context a bit more.

* Section 3.3: I would suggest adding a reference for Ppih and Plev at their first use.

* Section 4.3 says "If codestream packetization mode is used, L bit and M bit are equivalent." Does this mean it is enough to set only the M bit in the codestream packetization mode?

* Section 4.3 says "In the case of codestream packetization mode (K=0), this counter resets whenever the Packet counter resets (see hereunder)". Hereunder? Can we give a more specific reference instead?

* Section 6: Usually when RTP is used, congestion control and the corresponding required rate control are done by the RTP applications. The RTP AVPF profile is the recommended profile for real-time communication when efficient rate control (nope, not the video encoder rate control :-)) is needed. Hence, I think we should recommend the use of the AVPF profile here and also refer to RFC8888. The inclusion of the circuit breaker makes a lot of sense here. I also got to know that JPEG XS is designed to be used in a controlled network environment. Hence, there should be a warning about use of this in the best-effort Internet prior to the requirement on packet loss observation. If there is any acceptable parameter defined somewhere for packet loss, then that should also be referenced here.
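To make the Section 4.1 concern above concrete, the following is a rough sketch of why a constant number of packets per frame follows from a constant number of bytes per frame, and why it no longer holds once frame sizes vary. The payload size and the frame sizes are made-up illustrative values, the function name is hypothetical, and the arithmetic ignores the payload header and any packetization-unit boundaries.

```python
def packets_per_frame(frame_bytes: int, payload_bytes: int) -> int:
    """Number of RTP packets needed to carry frame_bytes at payload_bytes per packet."""
    if frame_bytes < 0 or payload_bytes <= 0:
        raise ValueError("frame_bytes must be >= 0 and payload_bytes must be > 0")
    # Integer ceiling division: a partially filled last packet still counts.
    return -(-frame_bytes // payload_bytes)


PAYLOAD_BYTES = 1400  # hypothetical usable payload per RTP packet

# Hypothetical per-frame sizes in bytes: constant for CBR, varying for VBR.
cbr_frames = [250_000, 250_000, 250_000, 250_000]
vbr_frames = [250_000, 180_000, 310_000, 240_000]

cbr_counts = [packets_per_frame(n, PAYLOAD_BYTES) for n in cbr_frames]
vbr_counts = [packets_per_frame(n, PAYLOAD_BYTES) for n in vbr_frames]

if __name__ == "__main__":
    print("CBR packets per frame:", cbr_counts)
    print("VBR packets per frame:", vbr_counts)
```

With equal frame sizes every frame maps to the same packet count; with varying frame sizes the count changes from frame to frame unless padding or empty packets are added, which is the kind of additional mechanism the draft text mentions for slice packetization mode.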