[speex-dev] de-essing into speex?

Jean-Marc Valin Jean-Marc.Valin at USherbrooke.ca
Fri Dec 5 10:22:53 PST 2003


Hi,

I think I see what you mean, though I haven't been able to listen to
your wma file (not everyone has a wma decoder). The problem probably
lies only in the VBR tuning for wideband, which hasn't received much
work yet. One way to check that is to encode in constant bit-rate (CBR)
and see what the results are. I'm pretty sure you'll notice the problem
appears only at CBR quality 5 or below.
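
A minimal sketch of that check, in Python: it builds one speexenc command
line per quality level, once without --vbr and once with it, so the two
sets of files can be compared by ear. It assumes speexenc runs in constant
bit-rate mode when --vbr is omitted. The helper name speexenc_commands and
the output file names are hypothetical, chosen only for illustration, and
s.wav stands for the original wav decoded from the mp3.

```python
import os
import shutil
import subprocess


def speexenc_commands(wav_path, qualities, vbr=False):
    """Build one speexenc argv list per quality level (hypothetical helper)."""
    mode = "vbr" if vbr else "cbr"
    commands = []
    for q in qualities:
        out_path = f"s-{mode}-{q}.spx"  # illustrative output name
        argv = ["speexenc"]
        if vbr:
            argv.append("--vbr")  # omitted for the constant bit-rate run
        argv += ["--quality", str(q), wav_path, out_path]
        commands.append(argv)
    return commands


if __name__ == "__main__":
    wav = "s.wav"  # assumed input file name
    for use_vbr in (False, True):
        for argv in speexenc_commands(wav, range(2, 10), vbr=use_vbr):
            print(" ".join(argv))
            # Only run the encoder when both the tool and the input exist.
            if shutil.which("speexenc") and os.path.isfile(wav):
                subprocess.run(argv, check=False)
```

If the synthetic ess sounds show up in the s-vbr-* files at a quality
where the matching s-cbr-* file sounds fine, that points at the VBR tuning
rather than at the codec itself.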

        Jean-Marc

On Fri 05/12/2003 at 12:56, Olav wrote:
> thanks for getting back to me,
> 
> i have uploaded a zip file containing some sound files that
> demonstrates the issue.
> 
>   http://www.bogus.net/~olav/ess.zip
> 
> this contains
> 
>   s.mp3   original wav file (mono) converted to top-quality mp3 (370K)
>   s.wma   windows media encoder with 19khz voice compression    ( 62K)
>   s-2.spx speexenc --vbr --quality 2 on the wav file            ( 63K)
>   s-9.spx --quality 9                                           (197K)
> 
> plus quality 3, 4, 5, 6, 7 and 8.
> 
> the content of the file is a norwegian sentence from a record
> containing a lot of ess sounds, repeated 10 times or so, just to get
> some file size so file size comparison makes sense.
> 
> one may argue on which compression the ess sounds become
> acceptable. after listening MANY times between the original and the
> spx file, i decided that going under quality 9 means you start to hear
> "computerish" ess sounds.
> 
> as for the speex VS windows media encoder issue, compare speex quality
> 2 with the wma file. they are equally sized and should therefore be of
> equal quality, but in my ears the wma file is quite a lot better. it
> may have less treble, but the spx file sounds very synthetic.
> 
> note: if i have used speexenc incorrectly please let me know.
> 
> the wav file was 2MB so i didn't want to include that, but simply use
> lameenc etc to decode the mp3 file into wav if you want to do testing.
> 
> i hope to hear from you soon. i find this issue very interesting.
> 
> olav
> 
> > From: "Tony & Amanda Benik" <benikajal at mcihispeed.net>
> > Date: Thu, 4 Dec 2003 23:47:39 -0600
> > 
> > Representative of Olav,
> > 
> > >like if you say "someone said the sun is shining", there is a lot of
> > >ess sounds, and these will sound "computer-ish" at vbr qualities below
> > >9.
> > 
> >   I don't mean to be rude, but what bit rate is windows media encoder
> > encoding at, and what encoder (type) are you using?  Unless it's low
> > (32kbps-8kbps) it doesn't compare to speex (spx).  The "ess" sounds
> > you are hearing are most likely generated because the entire frame
> > (bit of sound) has been stripped of all but its most mathematically
> > pure and simplest (smallest) representation.
> > 
> >   I know a bit about text2speech and speech2text, and though a de-ess
> > filter on the speex decoder would be 'pleasant' to the human ear
> > (if one finds pure tones unpleasant rather than unhuman), it would
> > make subsequent mixing and encoding of speex streams (VoIP phone
> > lines) less effective and more costly in a resource sense.
> > 
> >   It is a good idea, though I would consider it a luxury filter; that's
> > just me being overly assertive.
> > ||
> > \/
> > 
> >   If anyone is interested: from my knowledge of speech recognition, all
> > human phonemes, when converted from power vs. time to power vs. freq,
> > exhibit 2 characteristic spikes.  The primary spike defines the base
> > for recognizing the phoneme, and the next-highest spike's relative
> > location and power give a program a good probability match as to
> > which phoneme it is.
> > 
> > Humanizing spx audio derived solely from pure human voices could be
> > accomplished by reconstructing the secondary peak, but would introduce
> > a minimum latency far larger than several frame sizes (i.e. the length
> > of a human phoneme, such as a vowel or consonant).
> > 
> > The filter also will most likely foul up the speech a little, because
> > like most voice recognition software it can guess wrong and
> > reconstruct the wrong secondary peak onto the frames.  (I'm guessing)
> > 
> > The filter also will most likely eat up a lot of cpu power like most
> > voice recognition software.  (I'm guessing)
> > 
> > ==
> > 
> > To conclude:
> >   I may be very wrong, so please correct me, but I try to be diligent
> >   in keeping up on these things.
> > 
> > -- Benikus Rex
> > --- >8 ----
> > List archives:  http://www.xiph.org/archives/
> > Ogg project homepage: http://www.xiph.org/ogg/
> > To unsubscribe from this list, send a message to 'speex-dev-request at xiph.org'
> > containing only the word 'unsubscribe' in the body.  No subject is needed.
> > Unsubscribe messages sent to the list will be ignored/filtered.
> > 

-- 
Jean-Marc Valin, M.Sc.A., ing. jr.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada

