# Machine Learning for Audio Coding (mlcodec) Working Group   {#machine-learning-for-audio-coding-mlcodec-working-group}

IETF 118 Prague, CZ  
Tuesday, November 7, 2023  
17:00 - 18:00 Prague time  
08:00 - 09:00 Pacific Time  
Room: Berlin 3/4

Meeting link: https://meetecho.ietf.org/client/?session=31749  
Notes: https://notes.ietf.org/notes-ietf-118-mlcodec

Chairs: Greg Maxwell, Mo Zanaty  
Notetakers: Emily Heron  
Chat Scribes: Jonathan Lennox

## Agenda   {#agenda}

Administrivia (Chairs) 5 min  
 Note Well  
 Agenda Bash

No agenda bashing

Deep REDundancy draft-valin-opus-dred (Jean-Marc Valin) 15 min  
 Status update

Start at 5:02

Recap from last session  
 Goal: Make Opus robust to long bursts of packet loss  
 Proposal: code large amounts of redundant audio

Proposed Format:   
 Use extension code 32 - include more than just one byte  
 Offset: position of redundancy in packet  
 Decode until fewer than 8 bits remain

Mo (individual comment): Is there any coupling between Opus frame sizes and the redundancy, or are they independent?  
A: Completely independent. Redundancy is encoded in chunks of 40 milliseconds.  
Mo: So no coupling?  
A: Not simple, but no.

Normative Aspects:  
 Have a normative specification for the part that converts bits into
features.

Implementation Update:  
 Improved quality from vocoder  
 Complexity reduced from 10% to 3% CPU for high loss  
 Weights down from 17 MB to 4 MB

Tim Terriberry: Of the 4 megabytes, what fraction is the
bits-to-features decoder?  
A: About one megabyte. Needs to be smaller.

Open Questions

1.  Should there be a maximum duration allowed?
    1.  Technically we could do up to approximately 10 minutes
    2.  Proposal: no hard limit

2.  What are the lowest and highest useful bitrates?
    1.  Currently support 10 to 100 kb/s for 1 second of redundancy

Jonathan Lennox: General question: I am concerned about what happens if
I splice two streams. Is there a way to splice redundancy history?  
A: Interesting use case, had not thought of this before

JM will consider how splicing will work. Can we actually merge the
redundancy and get redundancy for talkers A & B? This is what he needs
to think through.

Jonathan Lennox: You said that the frame sizes don't matter. If you are
encoding at a smaller frame size, then there is a small gap in time
between the redundancy block and the current block, which is indicated
by the offset, and PLC should conceal it.

Mark Harris 17:22  
The Security Considerations should highlight the danger of potentially
including earlier audio that was intended to be cut out, perhaps
confidential information, that can be decoded from DRED.

A: If it was in the redundancy, it was already included in the Opus
packet.

Tim: For quantizer slope, would it be useful to specify a floor higher
than the minimum?  
A: Open to that, it would be more bits. Would welcome feedback.   
T: At 10 minutes of redundancy at high bitrates, you are going to hit
the minimum with any non-zero slope. There will be some period of time
between 1 second and 10 minutes where you might want to stop higher in
bit rate than the minimum.
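
Tim's point can be illustrated with a small sketch. It assumes a hypothetical linear model in which a per-chunk quality level falls by a fixed slope and is clamped at a floor. The function names, units, and numbers are illustrative assumptions, not taken from the draft. Only the 40 ms chunk size comes from the discussion above.

```python
import math

CHUNK_MS = 40  # chunk duration mentioned earlier in the session


def chunks_for(duration_s):
    """Number of 40 ms redundancy chunks needed to cover duration_s seconds."""
    return round(duration_s * 1000 / CHUNK_MS)


def level_at(chunk, start, slope, floor):
    """Hypothetical quality level of a chunk: linear decay clamped at floor."""
    return max(floor, start - slope * chunk)


def first_chunk_at_floor(start, slope, floor):
    """Index of the first chunk that sits at the floor (None for zero slope)."""
    if slope <= 0:
        return None
    return math.ceil((start - floor) / slope)


if __name__ == "__main__":
    # Illustrative values only: start level 100, slope 0.5 per chunk.
    for floor in (0, 20):
        idx = first_chunk_at_floor(100, 0.5, floor)
        print(f"floor={floor}: reached at chunk {idx} "
              f"({idx * CHUNK_MS / 1000:.1f} s into the redundancy)")
    print("chunks in 1 s:", chunks_for(1))
    print("chunks in 10 min:", chunks_for(600))
```

One second of redundancy is 25 chunks and ten minutes is 15000 chunks, so under this assumed model any non-zero slope large enough to matter over one second reaches the clamp long before ten minutes. A configurable floor would decide where the decay stops.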

Speech coding enhancements draft-buethe-opus-speech-coding-enhancement
(Jan Buethe) 20 min  
 Status update

Start 5:28

Opus Speech Coding Enhancement  
 Focus on quality today  
 Gold standard for evaluation: subjective listening test  
 Very costly  
 Metrics under consideration:  
 PESQ  
 WARP-Q  
 MOC  
 NOMAD  
 Comparison to listening test results (MOS)  
 Not perfect, but reasonable.   
 Detecting Degradation  
 Goal: Distinguish good from bad enhancement models  
 All four metrics seem capable of separating good models from bad
models.  
 NOMAD seems preferable to the other metrics but is difficult to standardize.  
 WARP-Q and MOC easier to standardize  
 Next Steps:  
 Algorithm Development  
 Standardization

Happy to take opinions

Questions?

Tim Terriberry: Table slide: do the metrics disagree about if there is
an improvement at high bit rates?  
 A: Yes. I believe the metrics are incorrect. Best to have a listening
test at higher bit rates. Led to believe it is a shortcoming of the
metric.

Mo: Some people created composite metrics. Do you see that here?  
 A: No, these are standalone. Something to look into.

Jean-Marc Valin: The reference signal is a high-passed version that has
gone through the filter Opus uses internally. That is not the only
change the Opus encoder makes to the original signal. Take some care as
to what the correct reference signal should be.  
 A: The reason for taking this signal is that you get phase shifts. Will
re-run.  
 Tim Terriberry: Are you degrading the original input? If the Opus
encoder is doing enhancement to speech, are these methods that get
closer to the original input undoing the enhancements?  
 A: Have to check.  
 Tim Terriberry: Should there be though? Will you be penalized if you
don't undo what the Opus encoder did?

JM: Only exception should be the high pass filter.

Opus extension mechanism draft-ietf-mlcodec-opus-extension (Timothy
Terriberry) 15 min  
 Status update

Start: 5:44

Draft Status: Published as WG draft

Updates since SF  
 Reserved ID 127  
 Quoted text from RFC 5576 "media-level format parameters MUST NOT be
carried over blindly"  
 Q: Jonathan Lennox: What frame is the DRED associated with?  
 A: It would be useful for it to be on the first frame

Q: Jean-Marc: The goal of the frame separator is not for DRED, but for
some of the extensions we are planning.

Clarified that support for extension IDs 0 and 1 does not need to be
explicitly signaled via a=fmtp

Two Future Extension Mechanisms?  
 ID=0, L=0  
 ID=127

Changes Not Made  
 Did not split out IANA registration for L=0 and L=1 modes for IDs
2...31  
 Did not switch to QUIC varint for extension IDs  
 Did not reserve "unsafe" extension IDs

Mo: Not specifically enamored with QUIC varints; in general there are
many variable-length codings that are popular.

Worth considering other use case proposals re: QUIC varint for extension
IDs
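
For readers unfamiliar with the encoding under discussion, here is a minimal sketch of the QUIC variable-length integer format from RFC 9000, Section 16. The two most significant bits of the first byte give the total length (1, 2, 4, or 8 bytes), and the remaining bits hold the value in network byte order. The function names are illustrative, and nothing here reflects how the extension draft encodes IDs, since the draft did not adopt this format.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode value using the shortest QUIC varint form (RFC 9000, Section 16)."""
    if value < 0 or value >= 1 << 62:
        raise ValueError("QUIC varints hold values in [0, 2**62 - 1]")
    for prefix, length in enumerate((1, 2, 4, 8)):
        # A form of `length` bytes has 8 * length - 2 usable bits.
        if value < 1 << (8 * length - 2):
            raw = value.to_bytes(length, "big")
            # Put the 2-bit length prefix in the top bits of the first byte.
            return bytes([raw[0] | (prefix << 6)]) + raw[1:]
    raise AssertionError("unreachable")


def decode_varint(data: bytes) -> Tuple[int, int]:
    """Decode one varint from the start of data; return (value, bytes_used)."""
    if not data:
        raise ValueError("empty input")
    length = 1 << (data[0] >> 6)  # prefix 0..3 -> 1, 2, 4, 8 bytes
    if len(data) < length:
        raise ValueError("truncated varint")
    value = data[0] & 0x3F  # drop the 2-bit prefix
    for byte in data[1:length]:
        value = (value << 8) | byte
    return value, length


if __name__ == "__main__":
    for v in (37, 15293, 494878333):
        enc = encode_varint(v)
        print(v, enc.hex(), decode_varint(enc))
```

The sample values are the ones used in the worked examples of RFC 9000, Appendix A.1.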

Questions?

Waiting for feedback, nothing currently in queue. Currently milestones
say that this is going to the IESG in Dec, which seems soon. Need
readers for document. Feedback needed.

Done 6:00pm