Machine Learning for Audio Coding (mlcodec) Working Group
MLCODEC @ IETF 119 in Brisbane, Australia
15:00-16:30 AEST Wednesday, March 20, 2024 in room M1
22:00-23:30 PDT Tuesday, March 19, 2024
Chairs: Greg Maxwell, Mo Zanaty
Area Director: Murray Kucherawy
Note Taker: Victor Pascual
Join Meeting: https://meetings.conf.meetecho.com/ietf119/?session=32088
Onsite Tool : https://meetings.conf.meetecho.com/onsite119/?session=32088
Take Notes : https://notes.ietf.org/notes-ietf-119-mlcodec
Add Calendar: https://datatracker.ietf.org/meeting/119/session/32088.ics
Help Guide : https://www.ietf.org/how/meetings/technology/meetecho-guide-participant/
Agenda
Administrivia - Chairs, 5 min
Note well, agenda bash, draft status
Opus extension mechanism - Timothy Terriberry, 15 min
draft-ietf-mlcodec-opus-extension
Discussion focused on developing an extension numbering scheme for
the Opus extension draft. The main topics discussed were:
Proposing a system that splits extensions into short and long
categories, with different numbering approaches for each.
Debating the priorities and use cases of potential future extensions
to help inform the design, like enhancements that may send side
information.
Considering proposals to more efficiently allocate the extension
identifier space to reduce overhead for things like variable length
extensions.
Discussing current Opus packet sizes and how short extensions could
benefit certain types of data payloads.
The need for more review of the draft and further development of
enhancement proposals before finalizing the numbering scheme.
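As a rough illustration of the short/long split discussed above, the
sketch below parses a buffer of extensions whose header byte carries a
7-bit ID and a 1-bit length flag. The boundary value SHORT_ID_LIMIT,
the single-byte length field, and the function name are assumptions
made for this example only; they are not taken from these minutes and
should not be read as the numbering scheme the group will finalize.

```python
from typing import List, Tuple

# Hypothetical boundary: IDs below this are "short", the rest "long".
SHORT_ID_LIMIT = 32


def parse_extensions(data: bytes) -> List[Tuple[int, bytes]]:
    """Split a buffer into (extension_id, payload) pairs.

    Assumed layout (illustrative only):
      header byte = (id << 1) | L
      short ID: payload is L bytes long (0 or 1)
      long ID, L = 1: one length byte follows, then the payload
      long ID, L = 0: payload runs to the end of the buffer
    """
    out: List[Tuple[int, bytes]] = []
    pos = 0
    while pos < len(data):
        header = data[pos]
        ext_id, l_flag = header >> 1, header & 1
        pos += 1
        if ext_id < SHORT_ID_LIMIT:
            size = l_flag  # short: zero or one payload byte
        elif l_flag:
            if pos >= len(data):
                raise ValueError("missing length byte")
            size = data[pos]  # long: explicit one-byte length
            pos += 1
        else:
            size = len(data) - pos  # long: consumes the remainder
        if pos + size > len(data):
            raise ValueError("truncated extension payload")
        out.append((ext_id, data[pos:pos + size]))
        pos += size
    return out
```

Under these assumptions a short extension costs one header byte plus
at most one payload byte, while a variable-length extension that is
not last in the buffer pays an extra length byte. That extra byte is
the kind of overhead the allocation proposals aim to reduce.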
Changes made to DRED since the last meeting, including adding an
optional extended offset byte to allow encoding backwards in time
and more efficiently trimming silence.
Questions about how DRED extensions affect audio level
calculations and speaker-switching decisions in SFU systems.
Clarification provided that the encoder will not encode DRED for
periods of silence.
Discussion of publishing the normalization aspects of DRED, such as
defining acoustic features for speech recognition in a standardized
way.
Details provided on the implementation status of DRED and available
WebRTC patches for testing.
Results reported from WebRTC experiments showing quality
improvements under packet loss when extended offsets are used.
Open parameters such as maximum DRED duration and lowest/highest
supported bitrates.
Open questions raised on potential truncation support and solutions
for publishing binary data.
Comments on the timing for an adoption call for the draft and any
remaining changes needed.
Adoption vote: 10 yes, 2 no opinion, 20 in the room.
Clarification was provided that the hybrid mode in DRED is
compatible across channels and improves quality in stereo and group
stereo use cases.
A question was asked about how the hybrid mode physically separates
the processed and unprocessed components.
Next steps discussed for algorithm development include supporting
higher bandwidths and more frames of redundancy.
Proposed next steps for standardization include taking the full
implementation into account and defining quality requirements.
Signaling of enhancement techniques was discussed, as well as
potential naming schemes to identify versions.
A point was made that the proposed flag to disable enhancements for
non-speech content is not technique-specific.
Clarification was provided that the single-speaker training of
current speech models has also been tested on real recordings.
The conversation wrapped up by confirming there were no other topics
or questions to discuss.