[opus] Antw: Re: OPUS vs MP3

Sat Nov 4 22:05:41 UTC 2017

On 2017-11-01, Jean-Marc Valin wrote:

> I'm not sure, but my best guess would be "because MP3's window is very 
> leaky and MP3 has to waste a lot of bits in the LF because of that". 
> It could also be just the MP3 encoder being silly, or other things.

Was the original poster speaking about the SILK-derived or the 
CELT-derived mode? Because at least with respect to SILK (and the rest 
of the LPC-derived codecs) there is an additional explanation: 
perceptually matched synthesis.

If you look at things like the GSM/UMTS series codecs, they can seem 
*really* bad in a naive noise floor comparison, while sounding better at 
low bitrates and when applied to speech-like signals. That's pretty much 
the reason LPC derivatives are used in the first place, and in OPUS as 
well: the algorithm itself somewhat models human speech production, so 
that it encodes salient information not easily measured by 
simple-minded, even if highly developed and complex, spectral and/or 
statistical coding methodologies.

> Most logical explanations would be related to MP3 being bad rather 
> than anything else.

One obvious problem at the lowest end is its filter bank. It simply 
hasn't the resolution to model what's happening at the lowest of the low 
end, perceptually speaking. After all, at LF we get the heavy attenuation 
of the cochlea, combined with *extremely* compressed pitch sensitivity. 
No sane MDCT-based coder allocates many bits there, and LPC-based ones 
can often do even better because of the pitch sensitivity side of 
things.

> Most signals have more LF energy than HF, so it's normal for the noise 
> to look like that as well. If the noise is flat, then you have too 
> much HF noise and you're wasting bits in the LF. In fact, that's 
> exactly what I'm noticing in the spectrograms that are posted.

Yes. And of course many of the fundamentals live in the LF range. It'd 
be more useful to post spectrograms that normalize the residual noise 
by the utility signal reconstructed by the codec, i.e. time-varying S/N 
ratios per band.

Of course that's still rather naïve as well. But it'd be a start at 
least.
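
As a rough illustration of the per-band, time-varying S/N idea, here is a 
minimal sketch in plain Python. Everything in it is an assumption made for 
the example, not something from the thread: the function names, the frame 
length, the Hann window, the band edges (given as DFT bin indices), and the 
requirement that the two signals are already time-aligned. The "signal" is 
the codec's decoded output and the "noise" is decoded minus original.

```python
import cmath
import math


def power_spectrum(frame):
    """Power |X[k]|^2 for bins 0..N/2 of a Hann-windowed real frame.

    Naive O(N^2) DFT to stay dependency-free; use an FFT for real material.
    """
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))) ** 2
        for k in range(n // 2 + 1)
    ]


def band_snr_db(decoded, original, frame_len=64, band_edges=(0, 4, 8, 16, 33)):
    """Return one row per frame, each row holding the S/N in dB per band.

    band_edges are hypothetical bin boundaries: band j covers bins
    band_edges[j] .. band_edges[j + 1] - 1.
    """
    eps = 1e-12  # keeps the ratio finite when a band is silent
    rows = []
    usable = min(len(decoded), len(original))
    for start in range(0, usable - frame_len + 1, frame_len):
        dec = decoded[start:start + frame_len]
        ref = original[start:start + frame_len]
        err = [d - o for d, o in zip(dec, ref)]  # residual noise
        sig_pow = power_spectrum(dec)
        err_pow = power_spectrum(err)
        rows.append([
            10 * math.log10((sum(sig_pow[lo:hi]) + eps) /
                            (sum(err_pow[lo:hi]) + eps))
            for lo, hi in zip(band_edges[:-1], band_edges[1:])
        ])
    return rows
```

Plotting the rows over time gives a picture in which a loud LF band with 
proportionally loud noise no longer looks worse than a quiet HF band. It 
still ignores masking across bands and any delay or phase shift introduced 
by the codec.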

>> When your own ears are no longer in their best possible condition, 
>> you may try a spectrogram, just to make sure you don't miss anything.
>
> Actually, that's the wrong way. Especially when the spectrogram is
> computed on a signal difference. For example, some codecs can alter the
> phase (or add a small delay) in a way that's imperceptible, and yet
> causes a large difference signal.

My favourite is what happens with ambisonics, and especially NFC-HOA. 
You're simply not *allowed* to do time-coherent detection with that 
stuff.
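
The quoted point about delay is easy to check numerically. The sketch below 
is only an illustration under assumed values (a pure sine, a 48 kHz sample 
rate, a one-sample delay; the function names are made up for the example). 
It measures the level of the difference signal relative to the original.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def residual_db(freq_hz, delay_samples, rate=48000, n=4800):
    """Level of (delayed - original) relative to the original sine, in dB."""
    step = 2 * math.pi * freq_hz / rate
    original = [math.sin(step * i) for i in range(n)]
    delayed = [math.sin(step * (i - delay_samples)) for i in range(n)]
    diff = [d - o for d, o in zip(delayed, original)]
    return 20 * math.log10(rms(diff) / rms(original))


# One sample of delay at 48 kHz is about 21 microseconds.
print(round(residual_db(100, 1), 1))   # low frequency: small residual
print(round(residual_db(8000, 1), 1))  # 8 kHz: residual as loud as the tone
```

For a sine the residual amplitude is 2*sin(pi*f*d/rate) times the original, 
so at 8 kHz a single sample of delay already yields a difference signal at 
0 dB relative to the tone, even though nothing audible has changed.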

> The least bad way of estimating how good a codec is at very high 
> bitrate is to just measure the point where you can't ABX and assume 
> that all codecs improve by about as much per kb/s once that point is 
> reached. And that's mostly true.

That's also the reason why you can't, as of now at least, make do 
with just one single coding concept at all bitrates and with all utility 
signals: doing it the empirical way leads to things like OPUS which just 
happen to work much better in practice.
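
On the "point where you can't ABX" in the quote above: one common way to 
decide that is a one-sided binomial test against guessing. The sketch below 
assumes independent trials and a 50% chance level; the function name and 
the trial counts are only examples.

```python
import math


def abx_p_value(correct, trials):
    """Probability of at least `correct` hits in `trials` by pure guessing."""
    hits = sum(math.comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


print(round(abx_p_value(12, 16), 4))  # 12 of 16 correct
print(round(abx_p_value(9, 16), 4))   # 9 of 16 correct
```

With 16 trials, 12 correct gives p of roughly 0.038, while 9 correct gives 
roughly 0.40, which is indistinguishable from guessing. The bitrate at which 
a listener drops to the latter kind of score is the transparency point the 
quote refers to.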
-- 
Sampo Syreeni, aka decoy - decoy at iki.fi, http://decoy.iki.fi/front
+358-40-3255353, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2