[vorbis-dev] Psycho-acoustics research

Wed Mar 20 18:11:58 PST 2002

> My idea so far is to record several speakers producing minimal pairs
> (such as 'zip' and 'sip') and to compress the sound under different
> compression schemes (Ogg, MP3, GSM, etc) at varying levels of
> compression, then play them back on decent audio equipment for
> listening tests to see if listeners can still distinguish important
> parts of the sounds in the recordings.

I did a related paper last semester on MP3 compression effects.  I put
the paper up at http://poplar.seitz.com/~ross/mp3compression.ps.gz.
It's not a mathematical or extremely technical paper.  It was written
from the tack of doing basic frequency analysis on post-encoding speech
samples.

The goal of my project was to determine if speech sounds that had been
lossily compressed should be considered rigorous data.  My results imply
the answer is clearly "YES!" down to at least 64kbps CBR.  Interestingly
enough, if you read my paper, you'll see that 64kbps will sometimes
outperform 256kbps on my raw frequency analysis tests.
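
For anyone who wants to try this kind of comparison, here is a minimal
sketch of one possible raw frequency test.  It is not the method from the
paper: the function names, the choice of log-spectral distance as the
metric, and the synthetic signals are all assumptions made for
illustration.  The "degraded" signal just halves the amplitude of a
high-frequency component as a stand-in for a real encode/decode cycle.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive O(n^2) DFT; returns magnitudes for bins 0..n//2."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def log_spectral_distance(reference, degraded, floor=1e-9):
    """RMS difference, in dB, between two magnitude spectra.

    Magnitudes below `floor` are clamped so that numerical noise in
    empty bins does not dominate the result.
    """
    ref = magnitude_spectrum(reference)
    deg = magnitude_spectrum(degraded)
    diffs = [
        20 * math.log10(max(r, floor) / max(d, floor))
        for r, d in zip(ref, deg)
    ]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical test signals: 64 samples, a low tone (4 cycles) plus a
# high tone (20 cycles).  The degraded copy attenuates the high tone.
N = 64
reference = [
    math.sin(2 * math.pi * 4 * i / N) + 0.5 * math.sin(2 * math.pi * 20 * i / N)
    for i in range(N)
]
degraded = [
    math.sin(2 * math.pi * 4 * i / N) + 0.25 * math.sin(2 * math.pi * 20 * i / N)
    for i in range(N)
]

print("distance to itself:  ", log_spectral_distance(reference, reference))
print("distance to degraded:", log_spectral_distance(reference, degraded))
```

With real data, the two inputs would be the original recording and the
decoded output of each codec at each bitrate, time-aligned and cut to the
same length.  A naive DFT is only practical for very short frames; an FFT
library would be the usual choice for anything longer.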

Unfortunately, it was an undergrad-level class, and as such, some of the
software issues I ran into couldn't be solved (time constraints are a
bitch - as are the other four classes I had to pass.... :-).  This
included the oggenc crash with short files, so no Vorbis testing could
happen either.  I think if my other ideas for analysis could be worked
out, a lot more raw data on frequency distortion could be obtained in the
domain of speech sounds.

I originally intended to revisit the research as a self-study credit -
my professor was very excited to have a student interested in working
in this area.  What's the scope and time frame for your research?  Your
findings may induce me to continue my analysis.

When you're finished, please post a copy of your paper on the web - I'd
really love to read it, as would my prof.

Feel free to contact me at any time!

Thanks,
Ross Vandegrift
ross at willow.seitz.com

> In particular, I'm interested
> in looking at the nature of degradation when the compression ratio is
> particularly high: what phonemes become more difficult to distinguish
> soonest, as the compression ratio goes up?  And then, if possible, I'd
> like to come up with an analysis of *why* those particular sounds are
> poorly recreated, as opposed to others.  My guess is that fricative
> sounds (/f/, /v/, /s/, /z/) will "degrade" first because they contain
> larger amounts of white noise, which is often poorly handled by
> compression.
> 
> A couple other people in my classes are interested in working with me
> on the project - one has more math background, one is willing to
> administer perception tests on a group of people.  I have the
> linguistics background.  We have until the end of April to do the
> project, and the exact idea should be decided on by the end of this
> week.
> 
> I'd like to know if there's something I could do that would be more
> helpful than academic. So... got any ideas for a project within this
> scope that would directly benefit Ogg Vorbis development? Changing
> topics is possible, though it would be preferable for it to have a
> strong linguistic element so I can use the project for both classes.
> 
> -- 
> Chris Riddoch       | epistemological
> socket at peakpeak.com | humility
> 
> --- >8 ----
> List archives:  http://www.xiph.org/archives/
> Ogg project homepage: http://www.xiph.org/ogg/
> To unsubscribe from this list, send a message to 'vorbis-dev-request at xiph.org'
> containing only the word 'unsubscribe' in the body.  No subject is needed.
> Unsubscribe messages sent to the list will be ignored/filtered.
