[Speex-dev] [SpeechIO] Increasing the speed of speex playback
sbahram at nc.rr.com
Wed Oct 20 09:21:54 PDT 2010
The only reason I can understand them is because they are at slow speeds ... The static is completely unacceptable for more than 2
seconds of speech.
From: speechio-bounces at lists.quantummyst.com [mailto:speechio-bounces at lists.quantummyst.com] On Behalf Of Bill Cox
Sent: Wednesday, October 20, 2010 9:20 AM
To: Steve Underwood
Cc: speech input and output; Vinux Development; speex-dev at xiph.org
Subject: Re: [SpeechIO] [Speex-dev] Increasing the speed of speex playback
Hi, Steve. I tried your time_scale_tests program, and it works well! Especially for low speed changes, it's the best I've
heard so far.
For high speed increases, there is what sounds like static added to the sound output. I've attached two sound samples of high speed
speech, which is a 4X speed up of a popular TTS voice in the blind community (voxin/Eloquence). I've sped up the voice with LPC in
one case, and time_scale_tests in the other. Don't worry that you can't understand these speech samples - many blind people can,
and I can understand it, just barely, at this speed.
I guess now I need to learn about the algorithm you've used, and see if I can track down the source of the static. I've copied two
lists that have blind users who may be interested in very high speed playback of voices.
On Tue, Oct 19, 2010 at 9:37 PM, Steve Underwood <steveu at coppice.org> wrote:
> Hi Bill,
> Any attempt to alter speed by simple insertion or dropping produces poor
> results. Even if you can get it to sound smooth, the resulting pitch
> shift is horrible. You really need to use a transform that alters
> speed smoothly, while maintaining the original pitch of the voice. If
> you look in my spandsp library you will find a module which does
> exactly this, using an algorithm called PICOLA. You can speed up or
> slow down a voice in fine speed steps using this module, and the
> resulting voice is almost the same quality as the original. There is a
> test program for it, which should function as an example of how you
> need to call the library to initialise and use it.
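For anyone following along, here is a rough sketch of the general idea behind pitch-preserving speed-up: merge two adjacent pitch periods into one with a cross-fade instead of dropping samples. This is not the spandsp code and not a full PICOLA implementation. The function name merge_pitch_periods is hypothetical, the pitch period is assumed to be known and constant, and the speed-up is fixed at 2X. A real implementation estimates the pitch continuously and supports fine speed steps.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Halve the duration of a voiced signal by merging each pair of adjacent
 * pitch periods into one. The first period fades out while the second
 * fades in, so the waveform stays continuous and the period length (and
 * therefore the pitch) is unchanged.
 *
 * period: pitch period in samples, assumed known and constant here.
 * out:    must have room for at least len / 2 samples.
 * Returns the number of samples written. A trailing partial pair of
 * periods is dropped to keep the sketch short.
 */
size_t merge_pitch_periods(const int16_t *in, size_t len, int period,
                           int16_t *out)
{
    size_t written = 0;
    size_t step;
    size_t pos;

    if (period <= 0)
        return 0;
    step = 2 * (size_t)period;
    for (pos = 0; pos + step <= len; pos += step) {
        const int16_t *a = in + pos;           /* period that fades out */
        const int16_t *b = in + pos + period;  /* period that fades in */
        int i;

        for (i = 0; i < period; i++) {
            /* Linear cross-fade: weights (period - i) and i sum to period. */
            int32_t mixed = (int32_t)(period - i) * a[i] + (int32_t)i * b[i];
            out[written++] = (int16_t)(mixed / period);
        }
    }
    return written;
}
```

On a perfectly periodic input the two periods are identical, so the cross-fade reproduces one period unchanged: half the duration, same pitch. Real speech is only roughly periodic, which is why the cross-fade and a good pitch estimate matter.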
> On 10/20/2010 05:21 AM, Bill Cox wrote:
>> Here's one clue about whatever is causing the low quality speech.
>> Speech sounds terrible at 1.01X faster, and it sounds excellent at
>> normal speed (1.0X). So, the main problem is something that breaks
>> with any change in frame size in the decoder. Any idea what that
>> might be?
>> On Tue, Oct 19, 2010 at 5:14 PM, Bill Cox <waywardgeek at gmail.com> wrote:
>>> I was able to easily hack in an option to play back at different
>>> speeds. For example, using "speexdec --speed 2.0 file.enc file.wav"
>>> plays back encoded file.enc at 2X speed. What I did was divide
>>> st->frameSize and st->subFrameSize by the speedup, and added a
>>> SPEEX_SET_SPEED decoder control for the nb_celp decoder. This
>>> produced speech that was 2X faster than the original.
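To make the arithmetic of that hack concrete, here is a small sketch. It assumes the usual narrowband sizes (160-sample frames, 40-sample subframes, four subframes per frame) and assumes the scaled sizes are truncated to whole samples. The helper names scaled_size and uncovered_samples are hypothetical and are not part of Speex, and the actual patch may round differently.

```c
/* Narrowband Speex sizes: 20 ms frames at 8 kHz, four subframes each. */
enum {
    NB_FRAME_SIZE = 160,
    NB_SUBFRAME_SIZE = 40,
    NB_SUBFRAMES = 4
};

/* Hypothetical helper: divide a size by the speed-up factor and
 * truncate to a whole number of samples (an assumption about rounding). */
static int scaled_size(int size, double speed)
{
    return (int)(size / speed);
}

/* Samples of the scaled frame that the scaled subframes do not cover. */
static int uncovered_samples(double speed)
{
    int frame = scaled_size(NB_FRAME_SIZE, speed);
    int sub = scaled_size(NB_SUBFRAME_SIZE, speed);

    return frame - NB_SUBFRAMES * sub;
}
```

At 2.0 this gives an 80-sample frame and 20-sample subframes, which still tile exactly. At 1.01 it gives 158 and 39, so the four subframes cover only 156 samples and 2 are left over. Whether that mismatch has anything to do with the quality problem I can't say; it is only what the division works out to under these assumptions.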
>>> However, the quality is very poor. This is where it gets harder for
>>> me, as the quality is impacted by so many parts of the code. Can
>>> anyone guess which part of the decoder is leading to such poor
>>> quality when I cut the frame size in half? This hack works very
>>> well in LPC10, and fairly well in MELPe.
>>> I've attached two outputs from speex: the decoded playback at normal
>>> speed, and the 2X speed version.
>> Speex-dev mailing list
>> Speex-dev at xiph.org