[Speex-dev] Speex quality estimation in lossless media

Sun Jun 7 11:16:53 PDT 2009

Hi,

There are many Speex quality estimations. One such comparative
estimation is even available on the official site
<http://speex.org/comparison/>.

I'd like to present yet another one, and I thought the best place
for it would be the Speex-dev mailing list. Feedback and criticism
are welcome. If the Speex authors would like to make some parts of
this work publicly available on the official site or elsewhere, I'll
be happy.

I'm ready to answer any questions about this experiment on the
mailing list or personally.

Below is a more or less detailed description of the work.

Motivation
--------------

We are currently doing research whose main purpose is to develop an
adaptive algorithm. This algorithm tunes speech encoder parameters
depending on the state of the network (Speex was chosen for this
work because of its wide range of tuning options).  To implement its
logic correctly we need reliable speech quality estimations. For ITU
(G.729, etc.) and GSM codecs such estimations have been performed and
can be obtained from the official ITU-T page.

Although there are some experiments that allow an objective
comparison between Speex and other codecs, unfortunately we could
not find anything reliable enough for our purposes.

That is why we performed yet another comparative experiment, whose
results contain quality estimations for Speex along with the other
most popular codecs.

Not only the results but also the source data and the source code of
all testing tools are available in a public repository [1].  Because
of this "open source nature" of the experiments, we believe the
results are sufficiently reliable, reproducible and thus objective.

Experiment source data and experiment description
----------------------------------------------------

During the experiment, a set of source speech samples is passed
through a simulation model that reproduces the voice distortion
introduced by encoding and decoding (in effect, through the codec).
The source speech samples are then compared with the degraded ones
using the PESQ algorithm as defined in ITU-T Recommendation P.862.
The comparison is performed with the ITU pesq utility.
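
As a rough illustration of the encode / decode / compare chain described
above, here is a minimal sketch. It is not the tooling from repository
[1]: it assumes the stock speexenc and speexdec command-line tools and
the ITU-T P.862 reference pesq binary are on PATH, and the function names
and file naming scheme are hypothetical.

```python
import os
import subprocess


def build_commands(reference_wav, quality, workdir="."):
    """Return the three commands for one Speex quality setting (0-10).

    Assumption: 'speexenc', 'speexdec' and 'pesq' are installed; the
    output file names are made up for this sketch.
    """
    encoded = os.path.join(workdir, "degraded_q%d.spx" % quality)
    degraded = os.path.join(workdir, "degraded_q%d.wav" % quality)
    return [
        # Encode the reference sample with the given quality setting.
        ["speexenc", "--quality", str(quality), reference_wav, encoded],
        # Decode it back to WAV; this is the degraded sample.
        ["speexdec", encoded, degraded],
        # Compare reference and degraded samples at 8 kHz.
        ["pesq", "+8000", reference_wav, degraded],
    ]


def run_pipeline(reference_wav, quality, workdir="."):
    """Run the commands in order, stopping at the first failure."""
    for command in build_commands(reference_wav, quality, workdir):
        subprocess.run(command, check=True)
```

Calling run_pipeline("sample.wav", 8) would encode, decode and score a
single sample; the score itself still has to be read from the output of
the pesq utility.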

The source speech samples are about 8-15 seconds long. They contain
male and female voices, and all sentences are spoken in English. All
speech samples are taken from Internet podcast interviews; some of
them have insignificant noise artifacts. Every speech sample has at
least 0.5 seconds of silence at the beginning and at the end. Most of
the speech samples have almost no pauses inside.

PESQ estimation is performed on 8 kHz samples; the resulting value is
MOS-LQO as defined in P.862.1.
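
For readers unfamiliar with P.862.1: it maps the raw P.862 score (roughly
-0.5 to 4.5) onto the MOS-LQO scale with a fixed logistic function. The
sketch below shows that mapping; the function name is hypothetical, and
the pesq utility may already report the mapped value, in which case no
extra conversion is needed.

```python
import math


def raw_pesq_to_mos_lqo(raw_score):
    """Map a raw P.862 PESQ score to MOS-LQO (P.862.1 mapping).

    The constants are those of the P.862.1 mapping function.
    """
    return 0.999 + (4.999 - 0.999) / (
        1.0 + math.exp(-1.4945 * raw_score + 4.6607)
    )
```

The mapping is monotonic, so it changes the scale of the results but not
the ranking of the codecs.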

Experiment results and further work
-------------------------------------

Anyone should be able to reproduce the experiment results using the
given source data (see link [1]). However, because deploying the
environment is nontrivial, plots with the main results are attached.
These plots show the mean value over a set of experiments with a
given codec, with a 95% confidence interval. Note that the bitrate
("X") scale is logarithmic.  We currently offer no interpretation of
these data.
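
To make the "mean value with a 95% confidence interval" concrete, here
is a minimal sketch of one common way to compute it for the scores of a
single codec setting. It assumes a normal approximation (z quantile) and
is not necessarily the method used to produce the attached plots; for a
small number of samples a Student's t quantile would give a somewhat
wider interval. The function name is hypothetical.

```python
import math
import statistics


def mean_with_ci(scores, confidence=0.95):
    """Return (mean, half_width) of a normal-approximation interval."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    mean = statistics.fmean(scores)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    # Two-sided quantile, about 1.96 for a 95% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean, z * sem
```

The interval for one point on a plot is then mean - half_width to
mean + half_width.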

We plan to complement these experiments with ones describing how
voice quality depends on network losses for different codecs.

[1] : https://github.com/imankulov/speex_quality_evaluation/

--
Roman Imankulov
roman at netangels.ru

-------------- next part --------------
A non-text attachment was scrubbed...
Name: english_male_gsm.eps
Type: image/x-eps
Size: 22515 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/speex-dev/attachments/20090608/77e56ff8/attachment-0002.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: english_male_itu.eps
Type: image/x-eps
Size: 23134 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/speex-dev/attachments/20090608/77e56ff8/attachment-0003.bin