[Speex-dev] 2 questions, frame size and SPEEX_GET_LOOKAHEAD
jean-marc.valin at usherbrooke.ca
Wed Nov 1 00:19:24 PST 2006
> However, it seems to me that these synthesizer-specific issues are
> irrelevant to my question about Speex. As you said, I merely use Speex
> as a storage method. All I ask is to get samples as close to the
> original recording as possible after encoding and decoding. Blending,
> cross-fading, pitch adjustment: these signal-processing issues are not
> a concern at this stage.
Well, I hope you're aware that Speex isn't stateless, so you have to be
really careful when you start decoding at a random point in the stream.
> On Oct 31, 2006, at 3:40 PM, Andras Kadinger wrote:
>> [At the risk of educating you about something you might already know]
>> Natural speech in most human languages gradually changes from one
>> phoneme to the next.
>> Concatenating phonemes from a fixed, prerecorded, inflexible set
>> would give rise to abrupt changes between them (in both phoneme
>> quality and pitch), and thus make the resulting speech hard to
>> understand and/or uncomfortable to listen to.
>> Most flexible (unlimited-vocabulary) unit-concatenation speech
>> synthesizers (where a unit is, e.g., a phoneme) therefore use some
>> strategy to blend the pieces of speech together, usually in both
>> pitch and phoneme quality. One conceptually simple and therefore
>> popular approach
>> is storing "diphones" - phoneme transitions: e.g. the second half of
>> "a" and the first half of "p" from the hypothetical word "apa". Since
>> phonemes usually reach their "most recognizable" state in the
>> "middle", cutting and splicing them together around that point should
>> minimize the amount of discontinuity.
>> Obviously, if you concatenate speech from larger units (words,
>> phrases, or even sentences) ensuring acoustical continuity becomes
>> less and less of an issue, but you specifically mention phonemes.
>> So unless you want to use Speex to (re)implement unit storage for a
>> speech synthesizer that handles these issues, I suggest you take a
>> look at the available literature on speech synthesis.
>> Wikipedia seems to be a reasonable starting point: