
Minutes IETF119: mlcodec: Wed 05:00
minutes-119-mlcodec-202403200500-00

Meeting Minutes: Machine Learning for Audio Coding (mlcodec) WG
Date and time: 2024-03-20 05:00
Title: Minutes IETF119: mlcodec: Wed 05:00
State: Active
Last updated: 2024-03-25


Machine Learning for Audio Coding (mlcodec) Working Group

MLCODEC @ IETF 119 in Brisbane, Australia

15:00-16:30 AEST Wednesday, March 20, 2024 in room M1
22:00-23:30 PDT Tuesday, March 19, 2024

Chairs: Greg Maxwell, Mo Zanaty
Area Director: Murray Kucherawy
Note Taker: Victor Pascual

Join Meeting: https://meetings.conf.meetecho.com/ietf119/?session=32088

Onsite Tool: https://meetings.conf.meetecho.com/onsite119/?session=32088
Take Notes: https://notes.ietf.org/notes-ietf-119-mlcodec
Add Calendar: https://datatracker.ietf.org/meeting/119/session/32088.ics
Help Guide: https://www.ietf.org/how/meetings/technology/meetecho-guide-participant/

Agenda

  1. Administrivia - Chairs, 5 min
    Note well, agenda bash, draft status

  2. Opus extension mechanism - Timothy Terriberry, 15 min
    draft-ietf-mlcodec-opus-extension

Conversation so far has focused on developing an extension numbering
scheme for the Opus audio codec. The main topics discussed include:

  • Proposing a system that splits extensions into short and long
    categories, with different numbering approaches for each (see the
    sketch after this list).

  • Debating the priorities and use cases of potential future extensions
    to help inform the design, like enhancements that may send side
    information.

  • Considering proposals to more efficiently allocate the extension
    identifier space to reduce overhead for things like variable length
    extensions.

  • Discussing current Opus packet sizes and how short extensions could
    benefit certain types of data payloads.

  • The need for more review of the draft and further development of
    enhancement proposals before finalizing the numbering scheme.
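
As a concrete illustration of the short/long split under discussion,
the following C sketch parses a hypothetical one-byte extension
header. The ID boundary, the field layout, and the fixed short-payload
size are assumptions for illustration only; the draft has not settled
on a numbering scheme.

```c
/* Hypothetical sketch of a short/long extension ID split; the
 * numbering scheme in draft-ietf-mlcodec-opus-extension is not final.
 * Assumed layout: IDs 0..31 are "short" extensions carrying a fixed
 * 1-byte payload; IDs 32..127 are "long" extensions followed by an
 * explicit length byte. */
#include <stddef.h>
#include <stdint.h>

#define EXT_SHORT_MAX 31  /* assumed boundary, not from the draft */

/* Parses one extension starting at p.  Returns the payload length in
 * bytes and sets *id and *payload, or returns -1 on truncated input. */
static int parse_extension(const uint8_t *p, size_t avail,
                           int *id, const uint8_t **payload)
{
    if (avail < 1)
        return -1;
    *id = p[0] & 0x7F;               /* top bit assumed reserved */
    if (*id <= EXT_SHORT_MAX) {
        /* Short extension: length implied, no length byte needed. */
        if (avail < 2)
            return -1;
        *payload = p + 1;
        return 1;
    }
    /* Long extension: explicit length byte follows the ID byte. */
    if (avail < 2 || avail < (size_t)2 + p[1])
        return -1;
    *payload = p + 2;
    return p[1];
}
```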

  3. Deep REDundancy - Jean-Marc Valin, 25 min
    draft-valin-opus-dred
  • Changes made to DRED since the last meeting, including adding an
    optional extended offset byte to allow encoding backwards in time
    and more efficiently trimming silence (see the sketch at the end of
    this item).

  • Questions around how the DRED extension impacts audio level
    calculations and speaker-switching decisions in SFU systems.

  • Clarification provided that the encoder will not encode DRED during
    periods of silence.

  • Discussion of publishing the normative aspects of DRED, such as
    defining acoustic features for speech recognition in a standardized
    way.

  • Details provided on the implementation status of Dred and available
    WebRTC patches for testing.

  • Results reported from WebRTC experiments showing quality
    improvements with extended offsets handling packet loss situations.

  • Open parameters, such as the maximum DRED duration and the
    lowest/highest supported bitrates.

  • Open questions raised on potential truncation support and solutions
    for publishing binary data.

  • Comments on the timing for an adoption call for the draft and any
    remaining changes needed.

    Adoption call: 10 yes, 2 with no opinion, 20 in the room.
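
A rough sketch of the extended-offset idea mentioned above: an
optional extra byte extends how far backwards in time the redundancy
reaches. The field widths and the 10 ms granularity are assumptions,
not values taken from draft-valin-opus-dred.

```c
/* Rough illustration of the optional extended offset byte discussed
 * for DRED.  The field widths and the 10 ms granularity are
 * assumptions for illustration, not values from the draft. */
#include <stdint.h>

#define DRED_FRAME_MS 10  /* assumed redundancy granularity */

/* Offset, in milliseconds before the current packet, at which the
 * encoded redundancy begins. */
static int dred_offset_ms(uint8_t base_offset,
                          int has_ext, uint8_t ext_offset)
{
    int frames = base_offset;
    if (has_ext)
        frames += (int)ext_offset << 8;  /* extended byte adds the
                                            high-order bits */
    return frames * DRED_FRAME_MS;
}
```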

  4. Speech coding enhancements - Jan Buethe, 25 min
    draft-buethe-opus-speech-coding-enhancement
  • Clarification was provided that the hybrid mode in DRED is
    compatible across channels and improves quality in stereo and group
    stereo use cases.

  • A question was asked about how the hybrid mode physically separates
    the processed and unprocessed components (see the sketch following
    this list).

  • Next steps discussed for algorithm development include supporting
    higher bandwidths and more frames of redundancy.

  • Standardization next steps were proposed, taking the full
    implementation into account and defining quality requirements.

  • Signaling of enhancement techniques was discussed, as well as
    potential naming schemes to identify versions.

  • A point was made that the proposed flag to disable enhancements for
    non-speech content is not technique-specific.

  • Clarification was provided that the single-speaker training of
    current speech models has also been tested on real recordings.

  • The conversation wrapped up by confirming there were no other topics
    or questions to discuss.
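
One way to picture the hybrid mode's combination of processed and
unprocessed components, as asked about above, is a simple per-frame
crossfade. This is purely illustrative; the blend factor and mixing
scheme are assumptions, not taken from
draft-buethe-opus-speech-coding-enhancement.

```c
/* Purely illustrative crossfade between the enhanced (processed) and
 * plain decoded (unprocessed) signals.  The per-frame blend factor is
 * an assumption, not taken from the draft. */
static void hybrid_mix(float *out, const float *enhanced,
                       const float *decoded, int n, float blend)
{
    /* blend = 1.0f: fully enhanced; blend = 0.0f: plain decoder out. */
    for (int i = 0; i < n; i++)
        out[i] = blend * enhanced[i] + (1.0f - blend) * decoded[i];
}
```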