[Speex-dev] 2 questions, frame size and SPEEX_GET_LOOKAHEAD
jpu at apple.com
Tue Oct 31 15:52:02 PST 2006
Thanks for the comments. Yes, I am aware of those issues. I probably
should have been more precise in my use of terms. Actually, in my
project the unit collection is a mixture of diphones and words.
However, it seems to me that these synthesizer-specific issues are irrelevant to
my question about Speex. As you said, I merely use Speex as a storage
method. All I ask is to get the samples as close to the original
recording as possible after encoding and decoding. Blending, cross-fading,
pitch adjustment and similar signal-processing issues are not a
concern at this stage.
On Oct 31, 2006, at 3:40 PM, Andras Kadinger wrote:
> [At the risk of educating you about something you might already know]
> Natural speech in most human languages gradually changes from one
> phoneme to the next.
> Concatenating phonemes together from a fixed, prerecorded,
> inflexible set would give rise to abrupt changes between them (both
> in phoneme quality and in pitch), and thus make the resulting speech
> hard to understand and/or uncomfortable to listen to.
> Most flexible (unlimited-vocabulary) unit-concatenation speech
> synthesizers (where a unit is, e.g., a "phoneme") therefore use some strategy to
> blend the pieces of speech together, usually both in pitch and in
> phoneme quality. One conceptually simple and therefore popular
> approach is storing "diphones" - phoneme transitions: e.g. the
> second half of "a" and the first half of "p" from the hypothetical
> word "apa". Since phonemes usually tend to reach their "most
> recognizable" state in the "middle", cutting and splicing them
> together around that point should minimize the amount of [...]
>
> Obviously, if you concatenate speech from larger units (words,
> phrases, or even sentences) ensuring acoustical continuity becomes
> less and less of an issue, but you specifically mention phonemes.
> So unless you want to use Speex to (re)implement unit storage for a
> speech synthesizer that handles these issues, I suggest you take a
> look at the available literature on speech synthesis.
> Wikipedia seems to be a reasonable starting point: http://en.wikipedia.org/wiki/Speech_synthesis