Minutes IETF117: mlcodec: Tue 17:00
minutes-117-mlcodec-202307260000-00

Meeting Minutes Machine Learning for Audio Coding (mlcodec) WG
Date and time 2023-07-26 00:00
State Active
Last updated 2023-08-05

ML CODEC WG Meeting Minutes

IETF 117 - San Francisco
Tuesday, July 25, 2023 17:00-18:00 PDT
Chairs: Greg Maxwell, Tim Terriberry
Responsible Area Director: Murray Kucherawy
Note Taker: Mo Zanaty
Chat Scribe: Jonathan Lennox

Agenda

5 min Administrivia
25 min Opus extension mechanism (draft-valin-opus-extension)
15 min Speech coding enhancements
(draft-buethe-opus-speech-coding-enhancement)
15 min Deep REDundancy (draft-valin-opus-dred)

Opus Extension Mechanism - Jean-Marc Valin

Mo Zanaty: Can we exhaust the 7-bit extension ID space?
Jean-Marc Valin: We can reserve the last extension ID to signal more if
needed. Only 3 planned right now.
Harald Alvestrand: Reserve extension ID 127 for future expansion.
Mo Zanaty: SDP extensions are purely declarative with no offer/answer
semantics?
Jean-Marc Valin: Yes, declarative.
Mo Zanaty: This draft can specify purely declarative semantics, but
individual extensions may need to specify rules, such as incompatibility
with other extensions.
Jonathan Lennox: SFU may not be able to blindly forward all extensions,
but may need to aggregate / merge.
Nils Ohlmeier: Should we have offer/answer semantics to negotiate
extensions?
Jean-Marc Valin: Most implementations would take the intersection of
sender and receiver support, unless they have reasons to do otherwise.
Jonathan Lennox: Extensions should specify whether they can be repeated
or not.
Francois Nguyen: Can the order of extensions impact the meaning of later
extensions?
Jean-Marc Valin: Yes, the separator extension changes the state of
subsequent extensions.
Chairs: Poll for who read: 10. Poll for useful starting point: 12 yes, 1
no.
ACTION: Adopt draft after confirming on the list.
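
The points raised above (a 7-bit extension ID space, a reserved last ID,
and taking the intersection of sender and receiver support) can be
sketched as follows. This is an illustrative Python sketch only and is
not taken from draft-valin-opus-extension: the names RESERVED_ESCAPE_ID,
MAX_EXTENSION_ID, and usable_extensions are hypothetical, and reserving
ID 127 reflects the suggestion made in the discussion, not a confirmed
decision.

```python
# Hypothetical constants: a 7-bit ID can hold values 0..127.
MAX_EXTENSION_ID = 127
# Assumption: the last ID is set aside for future expansion, as suggested
# in the discussion. The draft may choose differently.
RESERVED_ESCAPE_ID = 127


def usable_extensions(sender_ids, receiver_ids):
    """Return the sorted extension IDs that both sides declare support for.

    The reserved ID is excluded because, under the assumption above, it
    would signal future expansion instead of naming an extension.
    """
    for ext_id in set(sender_ids) | set(receiver_ids):
        if not 0 <= ext_id <= MAX_EXTENSION_ID:
            raise ValueError(f"extension ID {ext_id} does not fit in 7 bits")
    common = set(sender_ids) & set(receiver_ids)
    common.discard(RESERVED_ESCAPE_ID)
    return sorted(common)
```

With purely declarative signaling, each side would compute this locally
from what the other side declared; no offer/answer exchange is implied.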

Speech Coding Enhancements - Jan Buethe

Mo Zanaty: This precludes enhancing non-voice signals?
Jan Buethe: Yes, this only supports speech.
Francois Nguyen: Decoding is more complex than encoding? Is that ok?
Jan Buethe: Yes.
Harald Alvestrand: Why should standardization care about a decoder-side
only enhancement? The wire signal input is not changed.
Jean-Marc Valin: Test vectors specify input to output signals, which
need to be expanded in the context of such enhancements. We may also
add side information for further enhancement, which will impact the wire
signal input.
Mo Zanaty: How to standardize this is the hardest part of this effort.
If we only specify quality requirements, does that mean closed models
can be used as long as they meet the quality requirements?
Jean-Marc Valin: When we add side info, we must standardize the wire
format and encoder design to send it. But we also expect to share open
models for decoder-side models with no side info. However, we don't want
to standardize models directly since they can change quickly with future
advancements.
Francois Nguyen: In evaluation results, Baseline is very close to LACE,
so is it really enhancing?
Jan Buethe: Enhancement is statistically significant.

Deep REDundancy - Jean-Marc Valin

Jonathan Lennox: How much does this increase Opus library binary size?
Jean-Marc Valin: Currently 17MB, goal to get down to 4MB.
Cullen Jennings: 17MB does not seem like an issue. Smaller is always
better, but it is already acceptable.
Jonathan Lennox: ML models can suffer from adversarial input. Do we need
some security considerations to discuss that?
Jean-Marc Valin: We hope to avoid that with some input-output comparison
and validation.
Cullen Jennings: I don't see the attack vector.
Jean-Marc Valin: Far-fetched, but can packet loss turn a "No" into a
"Yes" with adversarial input? Maybe, but we're hoping to somehow validate
that the output is close enough to the input, so rogue enhancements are
not allowed.