[Speex-dev] 2 questions, frame size and SPEEX_GET_LOOKAHEAD

Tue Oct 31 15:52:02 PST 2006

Hi, Andras,

Thanks for the comments. Yes, I am aware of those issues. I probably
should have been more precise in my use of terms. In my project, the
unit collection is actually a mixture of diphones and words.

However, it seems to me that these synthesizer-specific issues are
irrelevant to my question about Speex. As you said, I merely use Speex
as a storage method. All I ask is to get the samples as close to the
original recording as possible after encoding and decoding. Blending,
cross-fading, and pitch adjustment are signal processing issues that
are not a concern at this stage.
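
To make the requirement concrete, here is a minimal sketch of the alignment
I am after. It makes no real Speex calls: fake_codec is a hypothetical
stand-in that only models a fixed delay (the kind of value I would expect
SPEEX_GET_LOOKAHEAD to report) and whole-frame output. The names, the frame
size of 160, and the delay of 40 are assumptions for illustration only.

```python
def fake_codec(samples, frame_size, delay):
    """Hypothetical stand-in for encode + decode.

    Models two effects only: the output is the input shifted later by
    `delay` samples, and only whole frames are processed.
    """
    shifted = [0] * delay + list(samples)
    n_frames = len(samples) // frame_size  # a trailing partial frame is dropped
    return shifted[: n_frames * frame_size]


def roundtrip_aligned(samples, frame_size, delay, codec=fake_codec):
    """Return codec output trimmed so sample i lines up with input sample i."""
    n = len(samples)
    # Pad with silence so that n + delay samples fit in whole frames;
    # otherwise the tail of the unit never comes out of the decoder.
    n_frames = -(-(n + delay) // frame_size)  # ceiling division
    padded = list(samples) + [0] * (n_frames * frame_size - n)
    decoded = codec(padded, frame_size, delay)
    # Skip the leading delay and keep exactly the original length.
    return decoded[delay : delay + n]


unit = list(range(1, 401))                # a 400-sample "unit"
naive = fake_codec(unit, 160, 40)         # shifted and truncated
aligned = roundtrip_aligned(unit, 160, 40)
```

With a real lossy codec the trimmed output would only approximate the
input, but the padding and trimming steps would be the same idea.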

On Oct 31, 2006, at 3:40 PM, Andras Kadinger wrote:

> [At the risk of educating you about something you might already know]
>
> Natural speech in most human languages gradually changes from one  
> phoneme to the next.
>
> Concatenating phonemes together from a fixed, prerecorded,  
> inflexible set would give rise to abrupt changes between them (both
> in phoneme quality and in pitch), and thus make the resulting speech  
> hard to understand and/or uncomfortable to listen to.
>
> Most flexible (unlimited vocabulary), unit (e.g. "phoneme")  
> concatenation speech synthesizers therefore use some strategy to  
> blend the pieces of speech together, usually both in pitch and in  
> phoneme quality. One conceptually very simple and therefore popular
> approach is storing "diphones" - phoneme transitions: e.g. the  
> second half of "a" and the first half of "p" from the hypothetical  
> word "apa". Since phonemes usually tend to reach their "most  
> recognizable" state in the "middle", cutting and splicing them  
> together around that point should minimize the amount of  
> discontinuity.
>
> Obviously, if you concatenate speech from larger units (words,  
> phrases, or even sentences) ensuring acoustical continuity becomes  
> less and less of an issue, but you specifically mention phonemes.
>
> So unless you want to use Speex to (re)implement unit storage for a  
> speech synthesizer that handles these issues, I suggest you take a  
> look at the available literature on speech synthesis.
>
> Wikipedia seems to be a reasonable starting point: http://en.wikipedia.org/wiki/Speech_synthesis
>
>