13:00-14:30 UTC+7 Thursday, March 20, 2025, 3rd floor room Boromphimarn 3
Chairs: Greg Maxwell, Mo Zanaty
Area Director: Orie Steele
Note Taker(s): Jonathan Lennox
Agenda
Note well, agenda bash, draft status
Jean-Marc's presentation on Opus simplifications was dropped from the
agenda, but he's still working on it.
draft-ietf-mlcodec-opus-extension
Jonathan: Do you repeat a frame separator with a non-zero increment?
Tim: If you have an increment, then you're in a new frame, so "repeat
these instructions" only repeats instructions in the current frame.
Mo: I suggest adding a sentence clarifying this.
Jonathan: Is this ready for WGLC?
Tim: We never came to a good conclusion on extension numbering for long
and short extensions. I wasn't happy with my own proposal along those
lines; it had too much complexity. I'm not proposing anything unless
there's a compelling use case.
Mo: We'll ask the list and make sure everyone's ok with that direction.
Is that the only open issue?
Tim: There are possible wordsmithing improvements; Mark Harris has given
some. "Extension" vs. "extension payload" vs. "extension ID" is
confusing, so it's worth doing some work on that, but it won't change
the substance of the document.
Mo: Do you think we have enough extensions to flesh out the mechanism
that we need?
Tim: I haven't thought of any other use cases.
Jean-Marc: I can't think of anything else we need
Action: After Tim publishes the update for the repeat-these-extensions
tweak and other editorial changes, we'll submit this for WGLC.
Mo: How long will you need for this?
Tim: A month
Mo: So mid-April
Tim: Sounds good
Mo: We'll submit for WGLC mid-April, well before Madrid
Mo: You're documenting the training procedures outside of the spec?
Jean-Marc: Currently it's in the repo, I think eventually it should go
in the draft, but I'm not exactly sure where
Mo: For an IETF standard, I think we need something in the draft that
defines this, a pointer to an external repo probably won't fly
Jonathan: This sounds like an Appendix to me
Mo: Implementors will need this?
Jean-Marc: No, this is just documenting how we generated the weights.
Once they're generated the standard is the weight. Using different
weights won't be interoperable.
Mo: Does the spec already say you must use the exact binary weights
provided, and not train your own model?
Greg Maxwell: Quantizations of the official weights will work, right?
Jean-Marc: Yes. There are a few values that can't be changed at all;
they're in the draft, not the binary. The rest can be quantized.
Mo: Does the spec say that?
Jean-Marc: The spec doesn't talk about the threshold for quantization,
we haven't discussed that, but it does say that the weights are the spec
Mo: What format are the weights?
Jean-Marc: They were trained as fp32 but truncated to int8
Mo: It'd be good to say that they were truncated with no loss of
quality.
Jean-Marc: An implementor could probably truncate to 7 bits if they
wanted, if they had specific hardware or something.
Mo: I think both the fp32 and int8 weights should be provided to
implementors. We still need to decide where they're archived and
referenced
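For readers unfamiliar with the fp32-to-int8 truncation being discussed, here is a minimal sketch of symmetric quantization. It is illustrative only: the function names are hypothetical, and the actual Opus/DRED weight format and quantization procedure are defined by the draft and its tooling.

```python
def quantize_int8(weights, scale=None):
    """Symmetric quantization of fp32 weights to int8.

    Illustrative only: the real weight format is defined by the
    draft's own tooling, not by this sketch.
    """
    if scale is None:
        peak = max(abs(w) for w in weights)
        scale = peak / 127.0 if peak else 1.0
    # Round to the nearest quantization step and clamp to the
    # symmetric int8 range [-127, 127].
    return [max(-127, min(127, round(w / scale))) for w in weights], scale

def dequantize_int8(quantized, scale):
    """Reconstruct approximate fp32 weights from int8 values."""
    return [q * scale for q in quantized]
```

With this scheme the round-trip error is bounded by half a quantization step, which is the kind of property an implementor could check when verifying that a truncation preserves quality.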
Mo: Would the test vectors be independent of the vocoder?
Jean-Marc: Yes, this is the input to the vocoder
Mo: Where would you get the test vectors from?
Jean-Marc: We'll just make them up
Mo: I see a few nods in the room, sounds like another appendix
Jean-Marc: Should the tests be loose or tight?
Greg: I'd like to see what the effects of quantization are, how it
sounds -- if the quantization sounds good we should be wide enough to
accommodate it; if not, we shouldn't.
Jean-Marc: I'll run the experiment
Mo: Can you also do fp4?
Jean-Marc: I think we'll break down at that point
Greg: You might be able to partially quantize some parameters at fp4
but that's a lot of research
Jean-Marc: That's why I'm debating whether you MUST use the weights and
also pass the test vectors, or just pass the test vectors
Greg: We can guess what implementors will do - lower precision or zero
them out. We should see if it still sounds good
Mo: Is this an informative appendix, or part of being compliant?
Jean-Marc: The test vectors should definitely be normative, but I'm
still debating whether to require these weights. But you could train on
just the test vectors, so we need more requirements than just passing
the vectors. So I think both should be MUST.
Mo: You'll do this update soon?
Jean-Marc: I think I can come up with some vectors soon. If you have to
use the weights, the test vectors aren't as crucial
Mo: So we'll add this, and it'll be a normative part of the spec, but
the binary weights will still be normative as well
Mo: What's your preferred path on defining the vocoder?
Jean-Marc: Probably saying that features computed on the vocoder output
must match the input features, but I haven't tried it yet.
Mo: I suspect we won't get many other implementors providing their own
input; do you think you'll go with your preferred path?
Jean-Marc: If it works
Benson Muite: I'm curious whether you've considered different quality
levels? Speech or music?
Jean-Marc: There's no music support in DRED, it's only speech
Tim: Feature analysis on the vocoder output sounds hard to me, I'll be
surprised if it works
Jean-Marc: I'll try it and see if it works, and report back at the next
meeting
Tim: I think we should be taking advantage of the fact that we have two
vocoders, LPCNet and FARGAN. That's probably more diversity than we'll
get from anywhere else before publishing. I'm dubious whether you can
swap one in for the other and get the same features. It'll be hard to
define what "match" means.
Jean-Marc: I'll try to come up with something
Mo: Is there any existing body of work on vocoder performance
assessment?
Jean-Marc: What I'm trying to standardize is not so much the quality of
the vocoder as the features, so that we don't end up with different
vocoders producing different features, and people encoding different
features.
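As an illustration of the kind of criterion under discussion, one simple possibility is a maximum-absolute-error bound on the feature vectors. This is purely hypothetical: the draft has not yet defined what "match" means, and the eventual criterion may be per-feature or perceptual rather than a single tolerance.

```python
def features_match(reference, candidate, tolerance):
    """One hypothetical definition of "match": same length and
    maximum absolute per-feature error within a tolerance.

    Not from the draft; the matching criterion is still an open
    question in the working group.
    """
    if len(reference) != len(candidate):
        return False
    if not reference:
        return True
    return max(abs(r - c) for r, c in zip(reference, candidate)) <= tolerance
```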
Mo: What is your thought on timeline to decide this, document it, and
close it?
Jean-Marc: I can't guarantee it but I'm hoping by next IETF
Mo: Do we definitely need to specify this?
Jean-Marc: I think we need to specify features at least
Ahmed Mustafa: Is it possible that because of improvements in machine
learning that new features will come up?
Jean-Marc: We could come up with better encodings or better features,
but we need to standardize something, to ensure interoperability
Ahmed: So improvements will be better training and models, but features
will be the same?
Jean-Marc: Exactly.
Mo: Offline Jean-Marc and I have discussed whether we need to evaluate
the quality of what's output by DRED
Jean-Marc: If people have the resources to do listening tests, it'd be
good to see if we're improving quality
Mo: That'd be good input from the working group. If you're not working
closely on the codec it's hard to contribute to these weights and
features, but anyone with audio expertise should be able to evaluate
speech quality.
Mo: So go with your preferred option, or heed Tim's warning and go with
one of the others, and please try to get this in before the next
meeting.
draft-ietf-mlcodec-opus-speech-coding-enhancement
No presentation about updates there; the author had a conflict and
couldn't attend even remotely, but he reported there haven't been any
substantive changes in the WG version of the document. There has been
ongoing work that isn't in the draft yet; hopefully we'll get that into
an -01.
draft-valin-opus-scalable-quality-extension
Mo: Are you doing 2kHz linear bands all the way up just for simplicity?
Wouldn't it make more sense to have wider bands further up?
Jean-Marc: The reason for wider bands is psychoacoustic, and there's no
psychoacoustic justification at these frequencies; encoding at these
rates already means very high bitrates, so the savings from wider bands
won't help much.
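For concreteness, a sketch of the uniform 2 kHz band layout being discussed. The helper is hypothetical (not from the draft); it just shows what "linear bands all the way up" means for the band edges.

```python
def band_edges(coded_bandwidth_hz, band_width_hz=2000):
    """Edges of uniform (linear) analysis bands, e.g. 2 kHz wide,
    covering the coded bandwidth. Hypothetical illustration only."""
    return list(range(0, coded_bandwidth_hz + 1, band_width_hz))
```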
Jonathan: Have you tested the quality of the base encoder?
Jean-Marc: We're not specifying the encoder but our implementation
works. The decoder implementations are the same for the existing test
vectors.
Greg: Does the quality closeness here mean that we can do the tests with
accuracy?
Jean-Marc: Yes, this can be tighter than the original opuscompare
Jean-Marc: Wanted to ask the WG whether this should be adopted
Mo: If there are no objections to this work I'd like to start a WG
adoption call
Jonathan: Does the stereo flag in the extension mean you can extend mono
with stereo?
Jean-Marc: No, but stereo can be encoded differently