NETVC IETF 93

Contents

Agenda and Note Well

Requirements

Test and Evaluation Criteria

Tools and Techniques

Thor

Lapped Transforms

Screensharing Considerations

Hackathon Results

Appendix: Raw Minutes

Date: Monday, July 20th, 1300 - 1500; Wednesday, July 22nd, 1550 - 1700
Location: Prague, Czech Republic
Chairs: Adam Roach, Mo Zanaty
Minutes: Martin Thomson, Brian Rosen

Agenda and Note Well

Slides: https://www.ietf.org/proceedings/93/slides/slides-93-netvc-0.pdf

The chairs made a more explicit appeal than usual to pay attention to the Note Well statement.

Requirements

Presenter: Alexey Filippov
Slides: https://www.ietf.org/proceedings/93/slides/slides-93-netvc-1.pdf
Draft: draft-filippov-netvc-requirements

Randell Jesup: 350ms is way too high a latency for interactive; even 250ms is bad

Eric Rescorla: error robustness is bit errors or packet loss?

Alexey Filippov: both

Eric Rescorla: encrypted transports prevent the former

Mo Zanaty: Jack Moffitt stated that bit error robustness was a non-goal for this reason; when Jack and Alexey discuss merging requirements, we might address that concern

Mo Zanaty: regarding delay: most codecs have a zero-delay form, we should operate codecs at zero-delay in testing configurations so that we don't design a codec that isn't good for real-time

Timothy Terriberry: clarifying that this is a single frame, but an entire frame is made available to the encoder

(confirmed)

Patrick: on the whole, the lower bounds on these requirements don't go low enough; numbers need to be lower on mobile devices and in adverse network requirements

Adam Roach: would prefer to see the testing/measurement stuff in the testing document

Adam Roach: what process would you recommend for getting subjective testing done, isn't it expensive?

Alexey Filippov: yes, we should collect MOS scores using a formal procedure

Thomas Davies: multiple bit depths and colour formats might be nice; but YUV 4:2:0 at 8 bits might be a better place to start. 4:2:2 might have IPR issues because you have to scale blocks

Alexey Filippov: experts elsewhere are recommending focus on 4:2:2 and 4:4:4.

Thomas Davies: making wide colour gamut mandatory might be risky

Alexey Filippov: 4:2:0 is more important, but for screencasting and for an internet codec in general, it's hard to support these features

Mo Zanaty: all aspects of the inputs (resolution, frame rates, gamut, ...) these are advisory, and we should avoid choosing tools that prevent support of these things; but we don't want to

Alexey Filippov: these are largely optional

Mo Zanaty: we might need to consider the simplest thing first and to maybe entertain the idea of profiles

Stephen Botzko: are interlace formats a mandatory thing? they should not be required in my opinion

Alexey Filippov: these are merely examples

Adam Roach: there is a conversation that is ongoing on this point

Timothy Terriberry: 4:2:2 is crazy; 4:4:4 is important; 4:2:0 @8bit is definitely a first step, but we do want to get to 4:4:4 and these other things eventually; we don't ultimately have to meet the requirements we document

Test and Evaluation Criteria

Presenter: Thomas Daede
Slides: https://www.ietf.org/proceedings/93/slides/slides-93-netvc-2.pdf
Draft: draft-daede-netvc-testing

Mo Zanaty: regarding CBR and buffers; other standardization efforts remove rate control from the comparisons; codec design sometimes doesn't involve rate control; how do you avoid bias from the rate control?

Thomas Daede: it's impossible to configure two codecs with rate control off; some sort of rate control mode is needed to test the codec features that rely on rate control

Mo Zanaty: it seems like we're adding more complexity to the analysis, but that might be necessary

Harald Alvestrand: http://comparecodecs.appspot.com/ plug; comparing codecs without rate control is pointless; proponents of a particular codec simply need to implement good rate control

Thomas Davies: this adds complexity; this was a big problem with the webrtc codec shoot-out; and the results for x264 and vp8 were unsuitable for our purposes; you have to do some analysis; we should test with rate control and make it possible, but we shouldn't make this a condition of entry

Harald Alvestrand: agree on when we do these evaluations; this is not needed for small tweaks, but in determining whether we are better than something else, we need this

Stephen Botzko: in codec development, it might be good to have rate control off; when you are done, you need rate control; proposing tools will be complicated if rate control potentially ... CBR might be a bad idea

Eric Rescorla: I hope that we don't need to have a shoot-out; and when we are done, we can exercise discretion in determining; ultimately, these tests just represent more software and more information, more tests aren't ultimately harmful

Randell Jesup: can we find the bits of rate control that matter?

Adam Roach: not advocating for CBR, but increased use of IP surveillance means that motion can appear as large changes in bitrate, which might be bad

Timothy Terriberry: rmcat might be a better place for the rate control

Jonathan Lennox: rmcat provides you the constraints

Thomas Daede: there will be something added for rate control-less modes for testing of specific tools

Mo Zanaty: for chroma, we should test it; we might need to break new ground for temporal effects

Thomas Davies: my favourite metric is whether I like it or not; we need to cover a lot of things: motion, text, streaming, etc...

Thomas Daede: subjective testing is important for validating the metrics

Mo Zanaty: BD-rate... we need both simple tools and complex ones

Harald Alvestrand: if one of the codecs doesn't cover the entirety of the range, it's hard to calculate BD-rate correctly
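To illustrate the range problem Harald raised, here is a minimal sketch of a BD-rate style calculation. It assumes the common formulation (a cubic fit of log-bitrate against PSNR, averaged over the quality interval that both curves cover); the function name `bd_rate` and its arguments are hypothetical and are not taken from the testing draft. The comparison is only defined on the overlapping interval, so the sketch refuses to produce a number when the curves do not overlap.

```python
import numpy as np


def bd_rate(rate_a, psnr_a, rate_b, psnr_b):
    """Average bitrate change (%) of codec B relative to codec A at equal PSNR.

    Hypothetical helper: each codec is described by a few (bitrate, PSNR)
    points, at least four per codec for the cubic fit used here.
    """
    log_a = np.log(np.asarray(rate_a, dtype=float))
    log_b = np.log(np.asarray(rate_b, dtype=float))
    psnr_a = np.asarray(psnr_a, dtype=float)
    psnr_b = np.asarray(psnr_b, dtype=float)

    # Only the quality interval covered by BOTH codecs can be compared.
    lo = max(psnr_a.min(), psnr_b.min())
    hi = min(psnr_a.max(), psnr_b.max())
    if hi <= lo:
        raise ValueError("rate-distortion curves do not overlap in quality")

    # Fit log-rate as a cubic polynomial in PSNR for each codec.
    fit_a = np.polyfit(psnr_a, log_a, 3)
    fit_b = np.polyfit(psnr_b, log_b, 3)

    # Average each fitted curve over the shared interval via its integral.
    int_a = np.polyint(fit_a)
    int_b = np.polyint(fit_b)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_b = (np.polyval(int_b, hi) - np.polyval(int_b, lo)) / (hi - lo)

    # Convert the mean log-rate difference back to a percentage.
    return (np.exp(avg_b - avg_a) - 1.0) * 100.0
```

A negative result means codec B needs fewer bits than codec A for the same PSNR. When one codec covers only a small part of the other's quality range, the average is taken over that small part alone, which is one reason the resulting figure can be misleading.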

Eric Rescorla: do you need metrics that can run occasionally?

Thomas: more small clips are good; some longer clips for real-time

Patrick: numbers greater than 15s and less than hours for video conferencing; an hour would be great; aws bills by the hour, so use the whole hour; check mechanical turk for subjective

Timothy Terriberry: if you have more CPU time, use it on the small videos; metrics are generally only valid over relatively short sequences; content in a short sequence correlates well with itself, which means that you lose information

Thomas Davies: look for the worst frame in the test set
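A minimal sketch of the "worst frame" idea, assuming per-frame PSNR as the metric (the discussion did not name one) and frames supplied as 8-bit arrays. The helper names `frame_psnr` and `worst_frame` are hypothetical.

```python
import numpy as np


def frame_psnr(ref, test, peak=255.0):
    """PSNR in dB between one reference frame and one decoded frame."""
    diff = ref.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak * peak / mse)


def worst_frame(ref_frames, test_frames):
    """Return (index, psnr) of the lowest-scoring frame in a sequence."""
    scores = [frame_psnr(r, t) for r, t in zip(ref_frames, test_frames)]
    idx = int(np.argmin(scores))
    return idx, scores[idx]
```

Reporting the minimum alongside the sequence average keeps a single badly coded frame from being hidden by many good ones.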

Patrick: testing cuts in video conferencing is important (scene transitions esp.) so that you can test for error propagation over transitions

Tools and Techniques

Presenters: Timothy Terriberry, Jean-Marc Valin
Slides: https://www.ietf.org/proceedings/93/slides/slides-93-netvc-3.pdf
Draft: draft-terriberry-netvc-codingtools, draft-terriberry-netvc-obmc, draft-valin-netvc-pvq

Mo Zanaty: are we concerned about losing correlation between components? is this extensible so that you might do RGB coding and predict R and B from G?

Timothy Terriberry: not tested; might need better entropy coding; RGB tends to see consistent gain across the components

Nathan Egge: chroma from luma assumes linear correlation; RGB might need a different model
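To make the "linear correlation" assumption concrete, here is a generic pixel-domain sketch: chroma is predicted from co-located luma as alpha * luma + beta, with alpha and beta fitted by least squares over a block. This is only an illustration of the modelling assumption, not the chroma-from-luma tool described in the Daala drafts, and the function names are hypothetical.

```python
import numpy as np


def fit_linear_cfl(luma, chroma):
    """Least-squares fit of chroma ~= alpha * luma + beta over one block."""
    l = np.asarray(luma, dtype=float).ravel()
    c = np.asarray(chroma, dtype=float).ravel()
    l_mean, c_mean = l.mean(), c.mean()
    var = np.sum((l - l_mean) ** 2)
    # A flat luma block carries no slope information; fall back to the mean.
    alpha = 0.0 if var == 0 else np.sum((l - l_mean) * (c - c_mean)) / var
    beta = c_mean - alpha * l_mean
    return alpha, beta


def predict_chroma(luma, alpha, beta):
    """Predict a chroma block from a luma block with the fitted line."""
    return alpha * np.asarray(luma, dtype=float) + beta
```

When the relationship between the two planes really is linear, the prediction residual is zero; when it is not, a residual remains that still has to be coded, which is the concern behind wanting a different model for RGB.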

Chair: thanks for finishing on time

Thor

Presenters: Arild Fuldseth, Cullen Jennings
Slides: https://www.ietf.org/proceedings/93/slides/slides-93-netvc-6.pdf
Slides: https://www.ietf.org/proceedings/93/slides/slides-93-netvc-4.pdf
Draft: draft-fuldseth-netvc-thor

Timothy Terriberry: How did you choose filters?

Arild Fuldseth: Same as H.265, no work on better choices yet

Jean-Marc Valin: Low-pass filters have a rate component

Arild Fuldseth: One bit, means really only distortion??? (Not sure I followed that discussion)

Harald Alvestrand: Your numbers don't match my comparisons between H.265 and VP9

Arild Fuldseth: we'll discuss

Mo Zanaty: You got settings for VP9 from ??

Mo Zanaty: Need better ways to do apples to apples comparisons

Randell Jesup: The interesting WebRTC case is "head and shoulder"; why is there so much difference?

Arild Fuldseth: Thor biased towards more static images

Shaoshin: can you have a combination of low delay and use of B frames?

Arild Fuldseth: Yes

Timothy Terriberry: Entropy coding - steal entropy coding from Daala, convert symbol by symbol to arithmetic coding

Steve: Great to have another candidate

Chairs: Output of WG is a single codec, so we need to understand the tradeoffs and benefits of each candidate

Steve: I agree

Lapped Transforms

Presenter: Nathan Egge
Slides: https://www.ietf.org/proceedings/93/slides/slides-93-netvc-7.pdf
Draft: draft-egge-netvc-tdlt

This is a coding tool in Daala

Mo Zanaty: What is the motivation for exploring this?

Nathan Egge: Reduce blocking artifacts, hoped to get around IPR on deblocking filter

Mo Zanaty: Where do you not get good results?

Nathan Egge: 16 x 16

Jean-Marc Valin: Coding gains somewhat misleading, but reduced blocking artifacts on larger blocks, not as simple as looking at coding gains

Jean-Marc Valin: Have a test image that helps. Need many metrics

Mo Zanaty: Do we need a better metric for blocking artifacts?

Screensharing Considerations

Presenter: Jean-Marc Valin
Slides: https://www.ietf.org/proceedings/93/slides/slides-93-netvc-8.pdf
Draft: draft-valin-netvc-l1tw

Jonathan Lennox: Another difference is that screen sharing has different temporal properties than live video -- long stretches of little to no change, followed by windows moving or everything changing all at once

Mo Zanaty: Could be wireless display, which could be anything

Jean-Marc Valin: May have to control by window, e.g. video in a window

Mo Zanaty: Is this constrained to the exact same number of bits?

Jean-Marc Valin: Yes

Mo Zanaty: Any JPEG extensions?

Jean-Marc Valin: No

Shaoshin: need to show mixtures of components, x265 shows better

Jean-Marc Valin: Yeah, x265 does better on everything except text

Shaoshin: do you plan to find multiple ways to encode portions of the screen?

Jean-Marc Valin: Yes

Shaoshin: Metrics need to match image characteristics

Jean-Marc Valin: yeah, suggest better metrics

Mo Zanaty: Why anti-aliasing? It should work better on sharp lines and non-antialiased text

Jean-Marc Valin: spreadsheets work poorly, horizontal and vertical lines do poorly, x265 does the best on these

Hackathon Results

Presenter: Timothy Terriberry
Slides: https://www.ietf.org/proceedings/93/slides/slides-93-netvc-5.pdf

See slides for information.

Appendix: Raw Minutes

# Note Well

# Alexey on requirements

Randell Jesup: 350ms is way too high a latency for interactive, even 250ms is bad

Eric Rescorla: error robustness is bit errors or packet loss?

Alexey: both

Eric Rescorla: encrypted transports prevent the former

Mo: jack moffitt stated that bit error robustness was a non-goal for this reason; when Jack and Alexey discuss merging requirements, we might address that concern

mo: regarding delay: most codecs have a zero-delay form, we should operate codecs at zero-delay in testing configurations so that we don't design a codec that isn't good for real-time

tim: clarifying that this is a single frame, but an entire frame is made available to the encoder

(confirmed)

patrick: on the whole, the lower bounds on these requirements don't go low enough; numbers need to be lower on mobile devices and in adverse network requirements

Adam Roach: would prefer to see the testing/measurement stuff in the testing document

Adam Roach: what process would you recommend for getting subjective testing done, isn't it expensive?

Alexey Filippov: yes, we should collect MOS scores using a formal procedure

thomas davies: multiple bit depths and colour formats might be nice; but YUV 4:2:0 at 8 bits might be a better place to start. 4:2:2 might have IPR issues because you have to scale blocks

Alexey Filippov: experts elsewhere are recommending focus on 4:2:2 and 4:4:4.

thomas: making wide colour gamut mandatory might be risky

Alexey Filippov: 4:2:0 is more important, but for screencasting and for an internet codec in general, it's hard to support these features

mo: all aspects of the inputs (resolution, frame rates, gamut, ...) these are advisory, and we should avoid choosing tools that prevent support of these things; but we don't want to

Alexey Filippov: these are largely optional

mo: we might need to consider the simplest thing first and to maybe entertain the idea of profiles

stephen b: are interlace formats a mandatory thing? they should not be required in my opinion

Alexey Filippov: these are merely examples

Adam Roach: there is a conversation that is ongoing on this point

tim: 4:2:2 is crazy; 4:4:4 is important; 4:2:0 @8bit is definitely a first step, but we do want to get to 4:4:4 and these other things eventually; we don't ultimately have to meet the requirements we document

# Thomas Daede on testing

mo: re cbr and buffers; other standardization efforts remove rate control from the comparisons; codec design sometimes doesn't involve rate control; how do you avoid bias from the rate control?

thomas: it's impossible to configure two codecs with rate control off; some sort of rate control mode is needed to test the codec features that rely on rate control

mo: it seems like we're adding more complexity to the analysis, but that might be necessary

harald: comparecodecs.appspot.com plug; comparing codecs without rate control is pointless; proponents of a particular codec simply need to implement good rate control

Thomas Davies: this adds complexity; this was a big problem with the webrtc codec shoot-out; and the results for x264 and vp8 were unsuitable for our purposes; you have to do some analysis; we should test with rate control and make it possible, but we shouldn't make this a condition of entry

harald: agree on when we do these evaluations; this is not needed for small tweaks, but in determining whether we are better than something else, we need this

Stephen B: in codec development, it might be good to have rate control off; when you are done, you need rate control; proposing tools will be complicated if rate control potentially

... CBR might be a bad idea

Eric Rescorla: I hope that we don't need to have a shoot-out; and when we are done, we can exercise discretion in determining; ultimately, these tests just represent more software and more information, more tests aren't ultimately harmful

Randell Jesup: can we find the bits of rate control that matter?

Adam Roach: not advocating for CBR, but increased use of IP surveillance means that motion can appear as large changes in bitrate, which might be bad

tim: rmcat might be a better place for the rate control

jonathan L: rmcat provides you the constraints

thomas daede: there will be something added for rate control-less modes for testing of specific tools

mo: for chroma, we should test it; we might need to break new ground for temporal effects

thomas davies: my favourite metric is whether I like it or not; we need to cover a lot of things: motion, text, streaming, etc...

thomas daede: subjective testing is important for validating the metrics

mo: bd-rate... we need both simple tools and complex ones

harald: if one of the codecs doesn't cover the entirety of the range, it's hard to calculate bd-rate correctly

Eric Rescorla: do you need metrics that can run occasionally?

thomas: more small clips are good; some longer clips for real-time

patrick: numbers greater than 15s and less than hours for video conferencing; an hour would be great; aws bills by the hour, so use the whole hour; check mechanical turk for subjective

tim: if you have more CPU time, use it on the small videos; metrics are generally only valid over relatively short sequences; content in a short sequence correlates well with itself, which means that you lose information

thomas davies: look for the worst frame in the test set

patrick: testing cuts in video conferencing is important (scene transitions esp.) so that you can test for error propagation over transitions

# Tim/Jean-Marc on Daala

mo: are we concerned about losing correlation between components? is this extensible so that you might do rgb coding and predict r and b from g?

tim: not tested; might need better entropy coding; rgb tends to see consistent gain across the components

nathan egge: chroma from luma assumes linear correlation; rgb might need a different model

Adam Roach: thanks for finishing on time

IETF93 netvc meeting, Wednesday July 22, raw Minutes by Brian Rosen

Arild Fuldseth, with Cullen Jennings on IPR Thor draft-fuldseth-netvc-thor

Tim: How did you choose filters?

Arild: Same as H.265, no work on better choices yet

Jean-Marc: Low Pass Filter have a rate component

Arild: One bit, means really only distortion??? (Not sure I followed that discussion)

Harald: Your numbers don't match my comparisons between H.265 and VP9

Arild: we'll discuss

Moz: You got settings for VP9 from ??

Moz: Need better ways to do apples to apples comparisons

Randall: WebRTC interesting case is "head and shoulder" why is there so much difference?

Arild: Thor biased towards more static images

shaoshin: can you have a combination of low delay and use of B frames?

Arild: Yes

Tim: Entropy coding - steal entropy coding from Daala, convert symbol by symbol to arithmetic coding

Steve: Great to have another candidate

Chairs: Output of WG is a single codec, so we need understand the tradeoffs and benefits of each candidate

Steve: I agree

Nathan Egge Lapped Transforms draft-egge-netvc-tdlt

This is a coding tool in Daala

Moz: What is the motivation for exploring this?

Nathan: Reduce blocking artifacts, hoped to get around ipr on deblocking filter

Moz: where do you not get good results

Nathan: 16 x 16

Jean-Marc: Coding gains somewhat misleading, but reduced blocking artifacts on larger blocks, not as simple as looking at coding gains

Jean-Marc: Have a test image that helps. Need many metrics

Moz: Do we need a better metric for blocking artifacts

Jean-Marc Valin Screensharing Consideration draft-valin-netvc-l1tw

<missed Jonathan's comment on temporal behavior>

Moz: Could be wireless display, which could be anything

Jean-Marc: May have to control by window, e.g. video in a window

Moz: Is this constrained to the exact same number of bits?

Jean-Marc: Yes

Moz: Any jpeg extensions

Jean-Marc: No

shaoshin: need to show mixtures of components, x265 shows better

Jean-Marc: Yeah, x265 does better on everything except text

shaoshin: do you plan to find multiple ways to encode portions of the screen?

Jean-Marc: Yes

shaoshin: Metrics need to match images characteristics

Jean-Marc: yeah, suggest better metrics

Moz: Why anti-aliasing, it should work better on sharp lines and non-antialiased text

Jean-Marc: spreadsheets work poorly, horizontal and vertical lines do poorly, x265 does the best on these

Tim Terriberry Hackathon Results