[Speex-dev] added background noise problem?

Monty xiphmont at xiph.org
Mon Sep 6 10:25:39 PDT 2004

On Mon, Sep 06, 2004 at 10:04:46AM -0500, Scott Roberts wrote:
> Using narrow, wideband, and ultra-wideband encoding on a short 16 kHz wav
> gave .spx's of 3,789 ... 2,935 ... and 1,875 bytes. Even after reading the
> manual, smaller files for the higher frequency encoding seem
> counter-intuitive.
> My mp3 at 32 kbps on the original 22 kHz wav is 3,866 bytes with a quality
> comparable to speex wideband on the converted 16 kHz wav, so speex is a 24%
> improvement in size. The mp3 had a little more (but acceptable) hiss, but
> it had better sharpness in the higher frequencies (hence the hiss).
> Overall I rank the quality even.  I tried LAME MPEG-2 Layer 3 on the
> converted 16 kHz wav, but I still had to go up to 32 kbps (same file size)
> to get the same quality as speex wideband.  So all in all, I estimate speex
> to have a 25% better quality/size ratio than mp3 on high quality speech.
> Maybe a speex expert could do better settings, but maybe so too could mp3.
> But maybe implementation by a novice user is part of a valid comparison
> even if my methodology isn't perfect.

The type of artifacts introduced by speex and mp3 will be quite
different.  MP3 will lose harmonic precision and thus voice will get
overly 'crisp' when what's really happening is that it's badly
smearing impulse coherency.  Speex will always render the foreground
speech nearly flawlessly, but anything in the background will
decompose into garble.  That's what any speech codec does: render
a single foreground voice and nothing else.  Ever wonder why hold music
falls apart on a cellphone? :-)

This is why most speech encoders will also include a filter stage that
tries to suppress as much background noise as possible.  Try a gentle
but deep multiband expander on your input WAVs; I expect speex's
performance will pick up markedly.  If you don't have access to such a
tool, I can demo this for you using the Postfish.
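
For anyone who wants to experiment without a dedicated tool, here is a
rough sketch of the idea in Python.  It is a single-band downward
expander, not a true multiband one (a multiband version would split the
signal into frequency bands and run something like this on each band),
and it is not how the Postfish does it.  The function name, parameter
names, and default values are all assumptions picked for illustration:
a low ratio keeps it "gentle", a large range_db lets it go "deep".

```python
import math

def downward_expand(samples, rate, threshold_db=-40.0, ratio=2.0,
                    range_db=40.0, attack_ms=5.0, release_ms=100.0):
    """Single-band downward expander (illustrative sketch only).

    samples: floats in [-1.0, 1.0]; rate: sample rate in Hz.
    Signal whose smoothed level falls below threshold_db is turned
    down by (ratio - 1) dB per dB, limited to range_db of attenuation.
    """
    # One-pole smoothing coefficients for the level detector.
    attack = math.exp(-1.0 / (rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (rate * release_ms / 1000.0))
    envelope = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # Follow rising levels quickly and falling levels slowly.
        coeff = attack if level > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * level
        level_db = 20.0 * math.log10(max(envelope, 1e-9))
        if level_db < threshold_db:
            gain_db = (level_db - threshold_db) * (ratio - 1.0)
        else:
            gain_db = 0.0
        # Never attenuate by more than range_db.
        gain_db = max(gain_db, -range_db)
        out.append(x * 10.0 ** (gain_db / 20.0))
    return out
```

Foreground speech that sits above the threshold passes through
untouched, while low-level background between words is pushed further
down before the encoder sees it.  The threshold would need tuning per
recording.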

In any case, in your testing, it may simply be the case that you're
particularly sensitive to one form of artifact or the other.  Many
folks who have used mp3 for years have become completely deaf to mp3's
flaws because they're second-nature; others (like me) become overly
sensitive to them.  When using a speech codec for the first time, it's
not surprising to be overly sensitive to novel artifacts.

