[speex-dev] de-essing into speex?
Jean-Marc Valin
Jean-Marc.Valin at USherbrooke.ca
Fri Dec 5 10:22:53 PST 2003
Hi,
I think I see what you mean, though I haven't been able to listen to
your wma file (not everyone has a wma decoder). The problem probably
only lies in the VBR tuning for wideband which hasn't received much work
yet. One way to check that is to encode in constant bit-rate and see
what the results are. I'm pretty sure you'll notice the problem appears
only at (CBR) quality 5 or below.
Jean-Marc
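
The CBR-versus-VBR check suggested above could be scripted roughly as
follows. This is only a sketch: the input name s.wav and the output
naming scheme are assumptions, not files from the thread. It uses only
the speexenc options already mentioned (--vbr and --quality) and prints
the commands rather than running them.

```python
import shlex


def speexenc_command(wav_path, quality, vbr=False):
    """Build the argument list for one speexenc run (nothing is executed)."""
    args = ["speexenc"]
    if vbr:
        args.append("--vbr")  # without this flag speexenc encodes at constant bit-rate
    args += ["--quality", str(quality)]
    mode = "vbr" if vbr else "cbr"
    out_path = f"s-{mode}-{quality}.spx"  # hypothetical output naming scheme
    args += [wav_path, out_path]
    return args


def comparison_commands(wav_path, qualities=range(2, 10)):
    """One CBR and one VBR command per quality, for side-by-side listening."""
    commands = []
    for quality in qualities:
        commands.append(speexenc_command(wav_path, quality, vbr=False))
        commands.append(speexenc_command(wav_path, quality, vbr=True))
    return commands


if __name__ == "__main__":
    # "s.wav" is an assumed name for the wav decoded from the mp3.
    for command in comparison_commands("s.wav"):
        print(shlex.join(command))
```

Listening to each CBR file next to the VBR file of the same quality
should show whether the artifacts come from the VBR tuning or from the
bit-rate itself.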
On Fri 05/12/2003 at 12:56, Olav wrote:
> thanks for getting back to me,
>
> i have uploaded a zip file containing some sound files that
> demonstrate the issue.
>
> http://www.bogus.net/~olav/ess.zip
>
> this contains
>
>   s.mp3    original wav file (mono) converted to top-quality mp3  (370K)
>   s.wma    windows media encoder with 19khz voice compression     ( 62K)
>   s-2.spx  speexenc --vbr --quality 2 on the wav file             ( 63K)
>   s-9.spx  --quality 9                                            (197K)
>
> plus quality 3, 4, 5, 6, 7 and 8.
>
> the content of the file is a norwegian sentence from a record
> containing a lot of ess sounds, repeated 10 times or so, just to get
> some file size so that a file size comparison makes sense.
>
> one may argue about the compression level at which the ess sounds
> become acceptable. after listening MANY times to the original and the
> spx files, i decided that going under quality 9 means you start to hear
> "computerish" ess sounds.
>
> as for the speex VS windows media encoder issue, compare speex quality
> 2 with the wma file. they are equally sized and should therefore be of
> equal quality, but to my ears the wma file is quite a lot better. it
> may have less treble, but the spx file sounds very synthetic.
>
> note: if i have used speexenc incorrectly please let me know.
>
> the wav file was 2MB so i didn't want to include that, but simply use
> lameenc etc to decode the mp3 file into wav if you want to do testing.
>
> i hope to hear from you soon. i find this issue very interesting.
>
> olav
>
> > From: "Tony & Amanda Benik" <benikajal at mcihispeed.net>
> > Date: Thu, 4 Dec 2003 23:47:39 -0600
> >
> > Representative of Olav,
> >
> > >like if you say "someone said the sun is shining", there is a lot of
> > >ess sounds, and these will sound "computer-ish" at vbr qualities below
> > >9.
> >
> > I don't mean to be rude, but what bit rate is windows media encoder
> > encoding at, and what encoder (type) are you using? Unless it's low
> > (32kbps-8kbps) it doesn't compare to speex (spx). The "ess" sounds
> > you are hearing are most likely generated because the entire frame
> > (bit of sound) has been stripped of all but its most mathematically
> > pure and simplest (smallest) representation.
> >
> > I know a bit about text2speech and speech2text, and though a de-ess
> > filter on the speex decoder would be 'pleasant' to the human ear
> > (if one finds pure tones unpleasant rather than unhuman), it would
> > make subsequent mixing and encoding of speex streams (VoIP phone
> > lines) less effective and more costly in a resource sense.
> >
> > It is a good idea, though I would consider it a luxury filter; that's
> > just me being overly assertive.
> > ||
> > \/
> >
> > If anyone is interested: from my knowledge of speech recognition, all
> > human phonemes, when converted from power vs. time to power vs. freq,
> > exhibit 2 characteristic spikes. The primary spike defines the base
> > for recognizing the phoneme, and the next highest spike's relative
> > location and power give a program a good probability match as to
> > which phoneme it is.
> >
> > Humanizing spx audio derived solely from pure human voices could be
> > accomplished by reconstructing the secondary peak, but would introduce
> > a minimum latency far larger than several frame sizes (i.e. the length
> > of a human phoneme, such as a vowel or consonant).
> >
> > The filter also will most likely foul up the speech a little, because
> > like most voice recognition software it can guess wrong and
> > reconstruct the wrong secondary peak onto the frames. (I'm guessing)
> >
> > The filter also will most likely eat up a lot of cpu power like most
> > voice recognition software. (I'm guessing)
> >
> > ==
> >
> > To conclude:
> > I may be very wrong, so please correct me, but I am diligent about
> > keeping up on these things.
> >
> > -- Benikus Rex
> > --- >8 ----
> > List archives: http://www.xiph.org/archives/
> > Ogg project homepage: http://www.xiph.org/ogg/
> > To unsubscribe from this list, send a message to 'speex-dev-request at xiph.org'
> > containing only the word 'unsubscribe' in the body. No subject is needed.
> > Unsubscribe messages sent to the list will be ignored/filtered.
> >
--
Jean-Marc Valin, M.Sc.A., ing. jr.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada