Minutes for AVTEXT at IETF-93
minutes-93-avtext-1

Meeting Minutes Audio/Video Transport Extensions (avtext) WG
Date and time 2015-07-23 15:40
Title Minutes for AVTEXT at IETF-93
State Active
Last updated 2015-08-13

AVTEXT Audio/Video Transport Extensions

Thursday, 23 July, 2015 17:40 - 19:10 CEST (Room: Berlin/Brussels)

Chairs: Jonathan Lennox and Magnus Westerlund (filling in for Keith Drage)
Responsible Area Director: Ben Campbell
Notetakers: Emil Ivov, Varun Singh

Agenda bash and status update
=====

WG status:
    grouping taxonomy approved
    stream pause requires a few changes after AD review
    splicing notification: ready for pub req, will proceed once pause is
    advanced

Action Item: Ben to confirm that Bo responded to all his AD review
items for stream pause, send to IETF last call

==========================================
RTP Header Extension for source description
Magnus Westerlund, presenting
==========================================

Status: Ready for WGLC

Action Items:
Simon Perreault will review the draft
Magnus and/or Roni will submit 5285bis to AVTCORE

Discussion:

Issue: switching between the one-byte and two-byte header extension formats is
not clearly specified in RFC 5285.
Mo Zanaty: the issue is even worse because two-byte header support is not
mandatory.
Magnus: we suggest clarifying 5285.
Roni Even: +1
Colin Perkins: +1
Stephan Wenger: Deprecate the one-byte headers? Is there much deployment of it?
Many people: yes.
Jonathan: Most current deployment is one-byte; not clear there is deployment of
two-byte.
Mo: Suggest mandating support for both extension formats and requiring that
two-byte header extensions use IDs bigger than 15 so that they can be
negotiated (see the sketch below).
Magnus or Roni will submit 5285bis to AVTCORE.
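
For context on Mo's point about IDs greater than 15: RFC 5285's one-byte
element header carries only a 4-bit ID (usable values 1-14), while the
two-byte form carries an 8-bit ID, so IDs above 15 can only appear in two-byte
elements. A rough sketch of the two element layouts (illustrative only, not
from the draft under discussion):

    def pack_one_byte_element(ext_id, data):
        # One-byte form: 4-bit ID (1-14) plus 4-bit length field holding
        # len(data) - 1, so each element carries 1-16 bytes of data.
        assert 1 <= ext_id <= 14 and 1 <= len(data) <= 16
        return bytes([(ext_id << 4) | (len(data) - 1)]) + data

    def pack_two_byte_element(ext_id, data):
        # Two-byte form: 8-bit ID (1-255) and 8-bit length (0-255 bytes of
        # data), so IDs above 15 only fit here.
        assert 1 <= ext_id <= 255 and len(data) <= 255
        return bytes([ext_id, len(data)]) + data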

Jonathan: Who read this?
5/6 hands
Jonathan: Who will review?
Simon Perreault

Jonathan: many things would likely want to use this.
Roni: there's also a dependency in CLUE.
Colin: We should check if something in BUNDLE would also need to be fixed to
address the issues Magnus pointed out.

Jonathan: OK, we are ready for last call!

==========================================
Layer Refresh Request (LRR) RTCP feedback message
Jonathan Lennox
==========================================

Action Items:
Bernard Aboba and Mo Zanaty to review H.264-SVC layer refresh text.
Stephan to work with JCT if a new SEI message is needed with H.265 for temporal
layer refresh. Jonathan will work with co-authors for VP8 text.

Jonathan to send text to the list about what FIR means for MRST/MRMT.

Discussion:

Jonathan: this is now a WG document
Deltas: not many. main one: recognizing layer refresh point
H.264/SVC is complete
ACTION: Bernard A. and Mo Z. are volunteering for reviews of this part
H.265
Stephan Wenger: is anyone interested at all in non-scalable nested coding
structures?
Jonathan: yes, for VP8 and VP9.
Stephan: given the current pace of SDOs, that's likely going to exist in H.265
as well, so I volunteer to do that.
ACTION ITEM: Stephan will take care of this for JCT.
ACTION ITEM: Jonathan will nudge colleagues to do reviews for VP8.
ACTION ITEM: review the description of how you recognize a layer refresh.

Mo: not clear the VP8 Y bit is sufficient to recognize all temporal sync points.
Jonathan: false negatives aren't the end of the world, false positives are bad.

Next slide: Temporal switch points are complicated!
Justin gets up to explain … says he’s confused
Mo explains: these are just three layers of reference frames

Open issue: what does FIR mean for spatial multi-stream scalability?
QUESTION to WG:
Refresh layer or refresh the whole source? Only relevant for H.264-SVC.
Question: do we want to define this? Do we want to do this here?

Stephan Wenger: wasn't there some kind of consensus somewhere (payload) that
new payload formats should have some form of FIR mapping?
Jonathan: shakes head
Stephan: an issue for the SHVC payload format. I will talk to Ye-Kui.
Jonathan: Two issues: what should the semantic be, and how is it instantiated
for any given codec? My recommendation: FIR means refresh all layers, and we
need to define something else for per-layer refresh.
Mo: I sort of thought that's what it meant already. After all, it's a FULL
intra refresh.
Bernard: For IMTC structures it doesn't make a difference.

Jonathan: If no one is currently doing anything where it makes a difference,
not an issue.

CONSENSUS: FIR is a full refresh.

Jonathan: Question: where do we say that?
Mo: let’s just have this doc update 6190
Jonathan: probably update CCM, not 6190.

Some more discussions showing there’s actually no consensus.

Justin: Does this mean that FIR has different behavior for MRST than for
simulcast?
Jonathan: Yes.
Justin: I think that's confusing and awkward.
Mo: Maybe have FIR refresh all simulcast streams? [Unhappiness]
Stephan: Multiple prediction coding is coming.
Jonathan: My inclination is to say that MRST and simulcast are different,
despite the asymmetry.
Stephan: Possibly recommend that FIR be sent for all layers simultaneously.
Justin: Can we just say that LRR is per-layer and FIR is per-SSRC?
Jonathan: For MRST/MRMT those are the same thing. Stephan's suggestion would be
straightforward for MRST, but for MRMT the multiple FIRs will not be in the
same packet, so there might be sync issues.

Magnus: write it up and send this to the list (ACTION ITEM)
Stephan agreed
Jonathan agreed: will do!

==========================================
RTCP feedback message for image control
Roni Even
==========================================

Action items:
Roni: consider adding a timestamp indicating which picture the request is
relative to
Roni will draft a liaison statement for 3GPP

Discussion:

Motivations: a picture of Matt Damon;
also: getting a detailed image of part of a picture, camera zoom and move,
zooming in on a participant in an MCU.

Proposal: have an RTCP message that describes the area they want zoomed or
moved. Requires consent of the sender. The reference is based on the current
picture/quality rate.

There’s also a notification/ack message from the sender

Mo: question. You shouldn't do this with absolutes but with relatives, because
resolutions change.
Roni: idea is to have pixels based on the current view.
Mo: but there's no way to know which frame this relates to.
Randell J (through Jabber): same question, pointing out the race conditions.

Ben: coordinates are unsigned?
Roni: can be negative so that you can move outside of your view

Stephen: using reference timestamps should fix most of the race conditions
Roni: Agreed, we will consider this (ACTION ITEM)

Randell: offsets could be made floats (0-1 of the source width) and be relative
to the absolute source
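
A minimal sketch of how a sender might apply Randell's normalized offsets,
mapping them to pixels at whatever resolution it is currently encoding (names
and layout here are illustrative, not from any draft):

    def region_to_pixels(region, cur_width, cur_height):
        # region = (x, y, w, h), each a fraction (0.0-1.0) of the source
        # picture, so the request stays valid even if the sender changes
        # its encoded resolution between request and response.
        x, y, w, h = region
        return (round(x * cur_width), round(y * cur_height),
                round(w * cur_width), round(h * cur_height))

    # e.g. the top-right quarter of a source currently sent at 1280x720:
    # region_to_pixels((0.5, 0.0, 0.5, 0.5), 1280, 720) -> (640, 0, 640, 360)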

Peter T. question: Is this a feedback or a control message?
Roni: it’s both
Peter: is this weird?
Roni: no
Peter: how do you do reliability?
Roni: we have an ICN response message. also
Colin: also there’s a standard convention for doing this
Roni: yes, that’s what we are using here

Peter T: how are you referring to the stream that you mean?
Roni: by SSRC
Peter: what if the SSRC changes?
Roni: then you're receiving a new stream so you have a new view
Roni: Relatedly, in multipoint BFCP allows you to say who controls the camera

Stephan Wenger: I think the document is underspecified. You need to
say whether your source window is with respect to decoded bits or only
with respect to bits that are meant for human consumption

Ben C.: clarification on the IANA issue - they registered parameters with IANA,
but they're held up because we don't have expert reviewers.
Roni: yes, I was just saying this is why I didn't know they had done it.
Ben C.: We also need to think about whether we want to continue work that's
divergent from what they've done, whether there's harm done by having two ways
of doing things.

Magnus: we need to feed back our criticism to the original authors before we
start this work

Ben: Is this in a release?
Magnus: Yes, but they have a process for fixing it.

Mo: suggestion about semantics on the ICN message: would be useful if sender
gave current viewport.
Roni: makes sense.

Randell (via Jabber): we already do dynamic resolution scale in Firefox, so
using non-pixel-based values is a win. Also means you don't have to have memory
of sizes, and avoids confusion when using layered encodings.

Jonathan: I think we are better off helping 3GPP fix their own message than
doing our own competing message, if possible.
Cullen: +1
Jonathan: Rachel and Roni, go to a 3GPP meeting and help them fix this. Also,
please draft the text of a liaison statement.

Roni: can we ask if people care about this?
Magnus: ok, WG QUESTION: WHO CARES ABOUT THIS AT ALL?
Cullen: maybe some of our video surveillance people would.
Bo: maybe I care too. But I think we should try to work it out in 3GPP.
(PROMISE)
Jonathan: you could be the person who catches the liaison statement in 3GPP?
Bo: Myself or a close colleague.

Stephan Wenger: Let's stop using the poor design of H.281 as an excuse. No one
has given valid reasons for replacing it.
Jonathan: The 3GPP spec recommends both H.281 and an RTCP feedback message,
presumably for different circumstances.
Stephen Botzko: we tried to replace this with H.282 but no one implemented it.
So I agree with Stephan.

ACTION ITEM AND CONSENSUS: Roni will write a liaison to 3GPP and we’ll see what
happens there

==========================================
Video Frame Info (RTP Header Extension)
Mo Zanaty
==========================================

Status: Consensus to adopt as WG document.
Action Item: Chairs to work with ADs to add milestone.

Discussion:

Motivation: we need payload-agnostic RTP switching (as in SFUs). The reason is
that the payload is often encrypted, and even if it isn't, it could be in an
unknown format.
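
A rough illustration of the payload-agnostic decision an SFU could make if the
needed frame info were carried in a header extension (the field name is
hypothetical, not the draft's actual format):

    def should_forward(frame_info, target_temporal_layer):
        # frame_info is assumed to be read from a per-packet header
        # extension (e.g. a temporal layer id), so the SFU never parses
        # the possibly encrypted codec payload; it simply drops layers
        # above the target.
        return frame_info["temporal_id"] <= target_temporal_layer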

Peter Thatcher: original idea was to be codec-agnostic, but there is
codec-specific information?
Mo: If you need spatial/quality information, yes. H.265's combined layerId
forces this.
Peter: Would it be possible to have some way to specify the semantic
explicitly?
Mo: I'm not sure how that's easier than doing it per payload type. Perhaps we
could write recommendations for how future codecs do things.

Stephan Wenger: Some stuff is still missing in this document.
Design suggestion: let's just say you grab the first N bits of your codec
packet and copy them here. This is not future proof, but we can make it future
proof.

Jonathan: you would certainly need the TL0 pic index. For spatial scalability
you would also need something equivalent to the VP9 scalability structure /
scalability update, which tells you what layers you can expect to receive.
Also a suggestion: you could restrict this to temporal-only, as that behavior
is much better defined and understood. If we do keep these layer IDs, we could
arrange things so they are the same as in the LRR spec.

Bernard Aboba: +1 on the TL0 pic index.
Mo: OK, I’ll take this up on the list

Jonathan: how many people know what this is about? MANY
          how many people think we should work on it? MANY
          anyone thinking we shouldn’t?

Stephan: We need to understand the Selective Forwarding Unit architecture
better before we work on it.
Bernard: We need this today, we have multiple proprietary implementations of
it, and we can learn more by working on it.
Mo: Bernard's document outlines a problem; this document can be viewed as a
solution to that problem.
Stephan: we shouldn't work on this until we study it well and know exactly what
we want to do.
Cullen: I am on the other end. Let's just get it done to address today's
problems.
Bernard: The proprietary architectures only solve the temporal case. This draft
addresses today's problems and even goes a bit beyond. So that's good enough.

Magnus: Who supports this as a WG doc? 10 PLUS HANDS
Magnus: Against? none

Magnus: will work with ADs to add milestone