# Machine Learning for Audio Coding (mlcodec) Working Group {#machine-learning-for-audio-coding-mlcodec-working-group}

- IETF 118, Prague, CZ
- Tuesday, November 7, 2023, 17:00 - 18:00 Prague time (08:00 - 09:00 Pacific Time)
- Room: Berlin 3/4
- Meeting link: https://meetecho.ietf.org/client/?session=31749
- Notes: https://notes.ietf.org/notes-ietf-118-mlcodec
- Chairs: Greg Maxwell, Mo Zanaty
- Notetaker: Emily Heron
- Chat scribe: Jonathan Lennox

## Agenda {#agenda}

**Administrivia (Chairs), 5 min**

- Note Well
- Agenda bash: no agenda bashing

**Deep REDundancy, draft-valin-opus-dred (Jean-Marc Valin), 15 min**

Status update, started at 5:02.

- Recap from last session
  - Goal: make Opus robust to long bursts of packet loss
  - Proposal: code large amounts of redundant audio
- Proposed format
  - Use extension code 32, so the extension can include more than just one byte
  - Offset: position of the redundancy in the packet
  - Decode until fewer than 8 bits remain

- Mo (individual comment): Is there any coupling between Opus frame sizes and this redundancy, or are they independent?
- A: Completely independent. The redundancy is encoded in chunks of 40 milliseconds.
- Mo: So no coupling?
- A: Not simple, but no.

- Normative aspects: have a normative specification for the part that converts bits into features.
- Implementation update
  - Improved quality from the vocoder
  - Complexity reduced from 10% to 3% CPU under high loss
  - Weights down from 17 MB to 4 MB

- Tim Terriberry: Of the 4 megabytes, what fraction is the bits-to-features decoder?
- A: About one megabyte. It needs to be smaller.

- Open questions
  1. Should there be a maximum duration allowed?
     1. Technically we could do up to approximately 10 minutes.
     2. Proposal: no hard limit.
  2. What are the lowest and highest useful bitrates?
     1. Currently support 10 to 100 kb/s for 1 second of redundancy.

- Jonathan Lennox: General question: I am concerned about what happens if I splice two streams. Is there a way to splice the redundancy history?
- A: Interesting use case; had not thought of this before. JM will consider how splicing will work.
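
A minimal Python sketch of two framing facts from this update: the decoder keeps consuming redundancy chunks until fewer than 8 bits remain, and each chunk covers 40 ms of audio. `BitReader`, `decode_redundancy`, the 4-bit header and the fixed 8-bit chunk size are all hypothetical, chosen only to make the stopping rule visible. The actual DRED bitstream and its normative bits-to-features decoding are defined by the draft, not by this sketch.

```python
CHUNK_MS = 40                  # each redundancy chunk covers 40 ms (from the notes)
HYPOTHETICAL_HEADER_BITS = 4   # assumed header size, illustration only
HYPOTHETICAL_CHUNK_BITS = 8    # assumed fixed chunk size, illustration only


class BitReader:
    """Minimal MSB-first bit reader (hypothetical helper, not from the draft)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current bit position

    def remaining(self) -> int:
        return 8 * len(self.data) - self.pos

    def read(self, nbits: int) -> int:
        if nbits > self.remaining():
            raise ValueError("read past end of payload")
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def decode_redundancy(payload: bytes) -> list[int]:
    """Read hypothetical fixed-size chunks until fewer than 8 bits remain."""
    reader = BitReader(payload)
    reader.read(HYPOTHETICAL_HEADER_BITS)  # skip the assumed header
    chunks = []
    while reader.remaining() >= 8:  # stopping rule from the proposed format
        chunks.append(reader.read(HYPOTHETICAL_CHUNK_BITS))
    return chunks


def redundancy_ms(num_chunks: int) -> int:
    """Audio duration covered by the decoded chunks."""
    return num_chunks * CHUNK_MS
```

With the 3-byte payload `A1 23 45`, the 4-bit header leaves 20 bits. Two 8-bit chunks are read, the last 4 bits are left over, and the sketch recovers 80 ms of redundancy. At 40 ms per chunk, the roughly 10-minute ceiling mentioned under the open questions corresponds to 15,000 chunks.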
On splicing, JM's open question: can we actually merge the redundancy and get redundancy for both talkers A and B? This is what he needs to think through.

- Jonathan Lennox: You said that the frame sizes don't matter. If you are encoding at a smaller frame size, then there is a small gap in time between the redundancy block and the current block. That gap is indicated by the offset, and PLC should conceal it.
- Mark Harris (17:22): The Security Considerations should highlight the danger of potentially including earlier audio that was intended to be cut out (perhaps confidential information) that can be decoded from DRED.
- A: If it was in the redundancy, it was already included in the Opus packet.
- Tim: For the quantizer slope, would it be useful to specify a floor higher than the minimum?
- A: Open to that; it would take more bits. Would welcome feedback.
- Tim: At 10 minutes of redundancy at high bitrates, you are going to hit the minimum with any non-zero slope. Somewhere between 1 second and 10 minutes there is a point where you might want to stop at a bitrate higher than the minimum.

**Speech coding enhancements, draft-buethe-opus-speech-coding-enhancement (Jan Buethe), 20 min**

Status update, started at 5:28.

- Opus Speech Coding Enhancement: focus on quality today
- Gold standard for evaluation: subjective listening test, which is very costly
- Metrics under consideration: PESQ, WARP-Q, MOC, NOMAD
  - Compared to listening test results (MOS): not perfect, but reasonable
- Detecting degradation
  - Goal: distinguish good from bad enhancement models
  - All four metrics seem capable of separating good models from bad models
  - NOMAD seems favorable compared to the other metrics but is difficult to standardize
  - WARP-Q and MOC are easier to standardize
- Next steps: algorithm development and standardization. Happy to take opinions.

Questions:

- Tim Terriberry: On the table slide, do the metrics disagree about whether there is an improvement at high bitrates?
- A: Yes. I believe the metrics are incorrect there; it is best to have a listening test at higher bitrates. Led to believe it is a shortcoming of the metric.
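
Whether a metric agrees with listeners, as in the MOS comparison and Tim's question above, can be checked numerically once per-condition metric scores and listening-test MOS values are available. This sketch computes a Pearson correlation against MOS and a pairwise agreement rate, meaning how often a metric orders two models the same way listeners did. The function names are hypothetical, and the notes do not say which statistic the presenters used. This is one common way to make such a comparison, not the method from the talk.

```python
from itertools import combinations
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between metric scores and MOS values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def pairwise_agreement(metric: list[float], mos: list[float]) -> float:
    """Fraction of model pairs that the metric orders the same way as MOS.

    Pairs tied in MOS are skipped because listeners expressed no preference.
    A metric tie on a pair that listeners separated counts as a disagreement.
    """
    if len(metric) != len(mos):
        raise ValueError("metric and MOS lists must have equal length")
    agree = total = 0
    for i, j in combinations(range(len(mos)), 2):
        d_mos = mos[i] - mos[j]
        if d_mos == 0:
            continue
        total += 1
        if (metric[i] - metric[j]) * d_mos > 0:
            agree += 1
    return agree / total if total else float("nan")
```

Pairwise agreement maps most directly onto the stated goal of separating good models from bad ones. It ignores the metric's scale and only asks whether the ranking matches the listening test.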
- Mo: Some people have created composite metrics. Do you see that here?
- A: No, these are standalone. Something to look into.
- Jean-Marc Valin: The signal is a high-passed version that has gone through the filter Opus uses internally. That is not the only change the Opus encoder makes to the original signal, so take some care as to what the correct reference signal should be.
- A: The reason for taking this signal is that you get phase shifts. Will re-run.
- Tim Terriberry: Are you degrading the original input? If the Opus encoder is enhancing the speech, are these methods, which get closer to the original input, undoing those enhancements?
- A: Have to check.
- Tim Terriberry: Should there be, though? Will you be penalized if you don't undo what the Opus encoder did?
- JM: The only exception should be the high-pass filter.

**Opus extension mechanism, draft-ietf-mlcodec-opus-extension (Timothy Terriberry), 15 min**

Status update, started at 5:44.

- Draft status: published as a WG draft
- Updates since SF
  - Reserved ID 127
  - Quoted text from RFC 5576: "media-level format parameters MUST NOT be carried over blindly"
    - Jonathan Lennox: Which frame is the DRED associated with?
    - A: It would be useful for it to be on the first frame.
    - Jean-Marc: The goal of the frame separator is not DRED, but some of the extensions we are planning.
  - Clarified that support for extension IDs 0 and 1 does not need to be explicitly signaled via a=fmtp
- Two future extension mechanisms?
  - ID=0, L=0
  - ID=127
- Changes not made
  - Did not split out IANA registration for L=0 and L=1 modes for IDs 2...31
  - Did not switch to QUIC varint for extension IDs
  - Did not reserve "unsafe" extension IDs
- Mo: Not specifically enamored with QUIC varints; it is just that, in general, there are many popular variable-length codings. Worth considering other use case proposals regarding QUIC varints for extension IDs.

Questions? Waiting for feedback; nothing currently in the queue. The milestones currently say this is going to the IESG in December, which seems soon. The document needs readers; feedback needed.

Done at 6:00 pm.
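
For reference on the QUIC varint option that the extension draft did not adopt: RFC 9000 (Section 16) uses the top two bits of the first byte to select a 1-, 2-, 4- or 8-byte encoding, leaving 6, 14, 30 or 62 usable bits. This Python sketch implements that encoding to show what it would look like if applied to extension IDs. The function names are illustrative. Using this encoding for Opus extension IDs is only the alternative raised in the discussion, not what the draft specifies.

```python
def encode_varint(value: int) -> bytes:
    """Encode an integer as a QUIC variable-length integer (RFC 9000, Section 16)."""
    if value < 0:
        raise ValueError("QUIC varints are unsigned")
    for length, prefix in ((1, 0b00), (2, 0b01), (4, 0b10), (8, 0b11)):
        if value < 1 << (8 * length - 2):
            encoded = bytearray(value.to_bytes(length, "big"))
            encoded[0] |= prefix << 6  # two-bit length prefix in the top bits
            return bytes(encoded)
    raise ValueError("value exceeds 2**62 - 1")


def decode_varint(data: bytes) -> tuple[int, int]:
    """Return (value, bytes consumed) for the varint at the start of data."""
    if not data:
        raise ValueError("empty input")
    length = 1 << (data[0] >> 6)  # prefix 0..3 selects 1, 2, 4 or 8 bytes
    if len(data) < length:
        raise ValueError("truncated varint")
    value = int.from_bytes(data[:length], "big") & ((1 << (8 * length - 2)) - 1)
    return value, length
```

Under this scheme, IDs 0 to 63 fit in one byte and the reserved ID 127 takes two (`40 7f`).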