[vorbis-dev] Re: Speex: Open-source, patent-free speech coding

Thu Mar 28 15:40:35 PST 2002

Hi Jean-Marc,

> > sampling-rate should be high (48khz), but bandwidth should be less than
> > 16khz (after "extracting" speech-only from the lingual track).
> 
> Currently, Speex only supports sampling at 8 kHz and 16 kHz, so it would
> need to be adapted to work at 32 kHz (and then up-sample to 48 kHz). I'd
> say it's quite feasible.

Speex shouldn't bother dealing with non-standard speech sampling rates.
Encoding tools like mine would downsample the signal before delivering
it to Speex, and decoding tools like Tobias' DirectShowFilter should
take care of upsampling and of summing the different tracks (common + speech).
There's a brilliant, open-source, high-quality sample-rate converter called
SSRC. It's under the LGPL, and I even made a DLL release of this fine tool.
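
To make the downsampling step concrete, here is a minimal sketch of a
48 kHz -> 16 kHz conversion (decimation by 3) in plain Python. It is not
SSRC's algorithm and not part of Speex: the function names, the 63-tap
Hamming-windowed sinc filter and the 0.45/factor cutoff are all assumptions
chosen for illustration, and a real converter such as SSRC will do much
better.

```python
import math

def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass filter.

    cutoff is a fraction of the input sample rate (0 < cutoff < 0.5).
    Illustrative design only; the tap count and window are assumptions.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response (a sinc), handled separately at x == 0
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC

def decimate(samples, factor, num_taps=63):
    """Low-pass below the new Nyquist frequency, then keep every factor-th sample."""
    taps = lowpass_taps(num_taps, 0.45 / factor)
    half = num_taps // 2
    out = []
    for i in range(0, len(samples), factor):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + k - half          # filter is centred on sample i
            if 0 <= j < len(samples): # treat samples outside the signal as zero
                acc += t * samples[j]
        out.append(acc)
    return out
```

Calling decimate(samples_48k, 3) returns one output sample for every three
input samples, so one second of 48 kHz audio becomes 16000 samples that
could then be handed to the encoder.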

> 
> > about bitrate, let me describe something :
> > up until vorbis came, people used to encode their soundtrack of movies
> > at 128kbps to 192kbps MP3. now, with Ogg, we can encode the "common"
> > track at around 100kbps vorbis, and encode each speech track at less
> > than 30kbps with speex. this gives us about 180kbps for a movie with
> > three soundtracks (english/italian/french, for instance).
> > that could make a small revolution :).
> 
> I think 30 kbps is realistic. When we add VBR, the average could easily
> drop to ~16 kbps/track.

Sure thing! the speech track is supposed to have lots of silent moments, so
DTX (AD/CFI) would help to drop the bitrate.
the problem is: can CELP handle multiple speakers (ie, when two people are
talking at the same time), and will sound quality differ when compressing
an English track compared to encoding a Russian track?
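
For reference, the bitrate figures quoted above can be checked with a few
lines of Python. The helper name is made up for this sketch, and the
"three separate 128 kbps MP3 mixes" baseline is my own assumption for
comparison, not a number from the thread. Note that 30 kbps per speech
track is an upper bound, so 190 kbps is consistent with the "about 180"
estimate.

```python
def total_kbps(common_kbps, speech_kbps, num_languages):
    """One shared 'common' track plus one speech track per language."""
    return common_kbps + speech_kbps * num_languages

# Figures from the discussion: ~100 kbps Vorbis common track, three languages
layered_max = total_kbps(100, 30, 3)   # speech tracks at the 30 kbps ceiling
layered_vbr = total_kbps(100, 16, 3)   # speech tracks at the ~16 kbps VBR average

# Assumed baseline: a full 128 kbps MP3 mix for each of the three languages
separate_mp3 = 128 * 3

print(layered_max, layered_vbr, separate_mp3)
```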

> 
> > you can find some info about MBE over at :
> > http://www.dvsinc.com/papers/mbe.htm
> 
> This info seems very biased to me...

most probably.
still, i know of some satellite applications where AMBE is successfully
used. i also know of a few projects where MELP is used over HF.

> 
> So I'd say the first step would be to build a prototype that downsamples
> the 48 kHz stream to 16 kHz and encodes it with the current Speex
> version. Once that works, we can try making Speex work at 32/48 kHz.
> Actually, that *might* not even be necessary, as most of the energy in
> speech is in the 0-8 kHz band - and even the 4-8 kHz band can in some
> cases (speech only) be severely distorted before the ear can tell the
> difference.

the first steps are:
- decide how we extract the 'common' track
- define Speex integration in ogg
- start testing: take a multilingual title, create the 'common'
track, downsample the 'lingual' tracks using ssrc.dll and mux
everything into an ogg stream. then do the reverse process and
compare quality.

Jean-Marc,
you have a lot of knowledge regarding speech models. can you point out
some useful sites/tools which i should check in order to implement the
first stage of 'extracting the common track' ?

keep in mind that i should take advantage of the fact that i have
multiple soundtracks that mostly (only?) differ in the speech content.
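
One way to see what that redundancy buys, as a rough sketch only: if each
language track really is "common + speech" and the tracks are perfectly
sample-aligned (both are strong assumptions, and real mixes will not be
this clean), then subtracting one track from another cancels the common
part, and the samples where the tracks agree are samples where either
track can serve as the common track. The function names below are
hypothetical, and this is not a worked-out extraction method.

```python
def subtract(track_a, track_b):
    """Sample-wise difference; the shared 'common' component cancels out."""
    return [a - b for a, b in zip(track_a, track_b)]

def matching_samples(track_a, track_b, threshold=0):
    """Indices where the two tracks agree to within threshold."""
    return [i for i, (a, b) in enumerate(zip(track_a, track_b))
            if abs(a - b) <= threshold]

# Toy data (made up): one shared background plus two different speech signals
common    = [3, -1, 4, 1, -5, 9]
speech_en = [0, 0, 2, 7, 0, 0]
speech_it = [0, 0, 0, 5, -3, 0]

english = [c + s for c, s in zip(common, speech_en)]
italian = [c + s for c, s in zip(common, speech_it)]

residual = subtract(english, italian)        # equals speech_en - speech_it
shared = matching_samples(english, italian)  # samples with no speech difference
```

In the toy data the residual contains no trace of the common signal, which
is the property an extraction tool would have to build on; separating the
two speech signals from each other is a harder problem that this sketch
does not address.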

Best Regards,
Dg. http://DSPguru.doom9.net
