[vorbis-dev] Psycho-acoustics research

David Willmore davidwillmore at iamanidiot.com
Mon Apr 1 08:14:39 PST 2002



Sorry for the late reply, all, I'm a bit lagged these days.

> My idea so far is to record several speakers producing minimal pairs
> (such as 'zip' and 'sip') and to compress the sound under different
> compression schemes (Ogg, MP3, GSM, etc) at varying levels of
> compression, then play them back on decent audio equipment for
> listening tests to see if listeners can still distinguish important
> parts of the sounds in the recordings. In particular, I'm interested
> in looking at the nature of degradation when the compression ratio is
> particularly high: what phonemes become more difficult to distinguish
> soonest, as the compression ratio goes up?  And then, if possible, I'd
> like to come up with an analysis of *why* those particular sounds are
> poorly recreated, as opposed to others.  My guess is that fricative
> sounds (/f/, /v/, /s/, /z/) will "degrade" first because they contain
> larger amounts of white noise, which is often poorly handled by
> compression.

You're going to be hitting this issue from an angle, then--expect
to have some problems.  Let me clarify.  All compression systems
that you mentioned, except GSM, are music codecs and are not tuned
for speech.  Expect them to break down on isolated speech.  I would
even suggest that your time with them may be better spent elsewhere.

If you do limit your work to speech codecs, you're going to run into
the LPC-10 kind of family where, at low bit rates, the speaker
variations are stripped off and the output starts to sound like
a robot.  These codecs were designed to work at very low bit
rates (normally encrypted phone links) and are just intended to
get the data across, not sound nice. :)

So, if I were you, that would lead me into refining my thesis
question a bit.  Maybe take your samples, degrade them in controlled
ways--add noise, quantize frequency, frequency shift, etc.  And
test how well they're recognized.  That could be done with some
normal .WAV editing tools on a PC without much problem.
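
As a rough illustration of one such controlled degradation, here is a
minimal Python sketch that adds white noise to a signal at a chosen
signal-to-noise ratio.  It is only a sketch under stated assumptions: the
function names (add_white_noise, measured_snr_db) are hypothetical, the
samples are assumed to be floats already decoded from a .WAV file, and a
synthetic tone stands in for a recorded word.

```python
import math
import random


def add_white_noise(samples, snr_db, seed=0):
    """Return a copy of `samples` with Gaussian white noise added.

    The noise level is chosen so that the signal-to-noise ratio is
    approximately `snr_db` decibels. Assumes a non-silent input.
    """
    rng = random.Random(seed)  # fixed seed so a test condition is repeatable
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def measured_snr_db(clean, noisy):
    """Estimate the SNR in dB actually achieved, given clean and noisy signals."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Stand-in for a recorded word: one second of a 440 Hz tone at 8 kHz.
# (Assumption: real recordings would be loaded from .WAV files instead.)
RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * t / RATE) for t in range(RATE)]

if __name__ == "__main__":
    for snr in (20, 10, 0):
        noisy = add_white_noise(tone, snr)
        print(snr, round(measured_snr_db(tone, noisy), 1))
```

Stepping the SNR down in fixed increments gives a series of stimuli whose
degradation is known exactly, which is harder to arrange with a codec's
bit-rate setting.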

> A couple other people in my classes are interested in working with me
> on the project - one has more math background, one is willing to
> administer perception tests on a group of people.  I have the
> linguistics background.  We have until the end of April to do the
> project, and the exact idea should be decided on by the end of this
> week.

Oops, looks like I was too late.  Well, maybe next semester. :)

> I'd like to know if there's something I could do that would be more
> helpful than academic. So... got any ideas for a project within this
> scope that would directly benefit Ogg Vorbis development? Changing
> topics is possible, though it would be preferable for it to have a
> strong linguistic element so I can use the project for both classes.

Vorbis is a music codec, so maybe run your tests with the words sung
and music in the background.  Test how the intelligibility of the
voice degrades with different backgrounds?  Just a thought.

Cheers,
David



