[Speex-dev] sanity check

Wed Dec 13 02:40:29 PST 2006

A frame size of 320 means 320 samples, which is 640 bytes of data if 
the samples are shorts.  Speex can work with samples as shorts or 
floats; for example, see speex_encode_int vs. speex_encode.  In 
both cases the values should be signed, ranging from -32768 to 32767.
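If the capture side hands you shorts but you call speex_encode, the conversion is a plain cast.  Don't rescale to the -1.0..1.0 range that some audio APIs use, or the encoder sees a signal 32768 times smaller than it expects.  A minimal sketch; frame_bytes and shorts_to_floats are made-up helper names, not part of the Speex API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes in one frame of short samples.
 * 320 * 2 = 640 on the usual platforms where short is 16 bits. */
static size_t frame_bytes(size_t frame_size)
{
    return frame_size * sizeof(short);
}

/* Hypothetical helper: widen captured shorts to floats for speex_encode().
 * No scaling is applied, so the floats keep the same -32768..32767 range
 * that speex_encode_int() would see for the same samples. */
static void shorts_to_floats(const short *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = (float)in[i];
}
```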

I suggest trying the sampleenc.c and sampledec.c programs in the doc 
directory.  If those work, then maybe you can spot what you're doing 
wrong.  You can also try changing one thing at a time to make them 
closer to how your code works, or vice versa, for example encoding 
from shorts vs. floats, or changing how you use the speex_bits_* 
functions.
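Structurally, the loop should encode exactly one frame per call, remember how many bytes each packet took, and hand each packet back to the decoder with that length.  Below is a sketch of just that framing, assuming a frame size of 320.  stub_encode and stub_decode are hypothetical placeholders that copy bytes instead of compressing; the comments note where the real speex_bits_* and encode/decode calls would go, in the style of sampleenc.c and sampledec.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAME_SIZE 320                          /* SPEEX_GET_FRAME_SIZE result here */
#define MAX_PACKET (FRAME_SIZE * sizeof(short)) /* big enough for the stub */

/* Hypothetical stand-in for: speex_bits_reset(&bits);
 * speex_encode_int(enc, pcm, &bits); len = speex_bits_write(&bits, ...); */
static int stub_encode(const short *pcm, unsigned char *packet)
{
    memcpy(packet, pcm, MAX_PACKET);
    return (int)MAX_PACKET;
}

/* Hypothetical stand-in for: speex_bits_read_from(&bits, packet, len);
 * speex_decode_int(dec, &bits, pcm); */
static void stub_decode(const unsigned char *packet, int len, short *pcm)
{
    assert(len == (int)MAX_PACKET);
    memcpy(pcm, packet, (size_t)len);
}

/* Encode whole frames only, storing each packet's length alongside it. */
static size_t encode_buffer(const short *pcm, size_t n,
                            unsigned char packets[][MAX_PACKET], int *lens)
{
    assert(n % FRAME_SIZE == 0); /* a partial frame is a framing bug */
    size_t frames = n / FRAME_SIZE;
    for (size_t f = 0; f < frames; f++)
        lens[f] = stub_encode(pcm + f * FRAME_SIZE, packets[f]);
    return frames;
}

/* Decode packets in order, each with its own length, at the matching
 * offset of the output buffer. */
static void decode_packets(unsigned char packets[][MAX_PACKET],
                           const int *lens, size_t frames, short *out)
{
    for (size_t f = 0; f < frames; f++)
        stub_decode(packets[f], lens[f], out + f * FRAME_SIZE);
}
```

Keeping the packet lengths explicit means the decoder never has to guess where one frame's bits end and the next begin.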

Tom

<khaynes at kirkgames.com> wrote:
> 
> It's working, and it's marginally intelligible, but not usable, so I 
> thought I'd post a message to make sure I'm still sane.
> 
> I have the app capturing from the mic at 16 kHz. I'm using 3200-byte 
> buffers to read the captured data, which is 100 ms of 16-bit sample 
> data at 16 kHz (1600 samples).
> When I pass this data unaltered to the playback stream, it plays fine, 
> with maybe 200 ms of lag, which is expected.
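Those capture figures check out: 16000 samples/s for 100 ms is 1600 samples, or 3200 bytes at 2 bytes each, which is exactly 5 frames of 320.  The same arithmetic as a quick sketch, assuming mono 16-bit PCM (the helper names are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

#define BYTES_PER_SAMPLE 2 /* mono, 16-bit PCM assumed */

/* Samples in a span of audio: rate (Hz) * duration (ms) / 1000. */
static size_t samples_for_ms(size_t rate_hz, size_t ms)
{
    return rate_hz * ms / 1000;
}

/* Bytes needed to hold that span of samples. */
static size_t bytes_for_ms(size_t rate_hz, size_t ms)
{
    return samples_for_ms(rate_hz, ms) * BYTES_PER_SAMPLE;
}

/* Whole codec frames in the span; a leftover partial frame is a bug. */
static size_t frames_for_ms(size_t rate_hz, size_t ms, size_t frame_size)
{
    size_t samples = samples_for_ms(rate_hz, ms);
    assert(samples % frame_size == 0);
    return samples / frame_size;
}
```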
> 
> 
> Encoding: Quality 6, VBR
> 
> mResult = speex_encoder_ctl(mEncodeState, SPEEX_GET_FRAME_SIZE, &mFrameSize);
> 
> The encoder returns a frame size of 320. Sanity check 1: this means 
> frames of 320 16-bit samples, or 640 bytes of data, right?
> 
> So encoding my 100 ms, 3200-byte buffer of mic-captured samples means 
> compressing 5 frames of source at 640 bytes (320 samples) each.
> You were correct that each frame of speech compresses to around 70 
> bytes or less, which is roughly a 10:1 compression ratio, or around 
> 320 bytes of compressed data per 100 ms (3200 bytes) of capture data.
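Taking the 70-byte figure literally: 640/70 is about 9.1:1, five such packets come to about 350 bytes per 100 ms, and 70 bytes every 20 ms (320 samples at 16 kHz) works out to 28 kbit/s.  A small sketch of that arithmetic, with hypothetical helper names:

```c
#include <assert.h>

#define FRAME_MS 20 /* 320 samples at 16 kHz */

/* Raw-to-encoded size ratio for one frame. */
static double compression_ratio(int raw_bytes, int encoded_bytes)
{
    assert(encoded_bytes > 0);
    return (double)raw_bytes / encoded_bytes;
}

/* Encoded bitrate in bits per second for a fixed packet size per frame. */
static int bitrate_bps(int encoded_bytes, int frame_ms)
{
    assert(frame_ms > 0);
    return encoded_bytes * 8 * 1000 / frame_ms;
}
```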
> 
> I then immediately decode the 5 encoded packets to a 3200 byte buffer.
> Then I play the sound in the output stream.
> 
> The sound is quite awful and mostly unintelligible, but I can 
> definitely tell that what's being compressed/decompressed 'feels' 
> similar to what I'm saying into the mic, and on rare occasions it 
> actually is marginally intelligible.
> 
> I have checked all of the return values for errors; there are none.
> I have also tried non-VBR and different quality settings, including 
> 10, but with no real difference.
> 
> I'm assuming that I'm mistaken about one or more of these figures and 
> that's where the problem lies. Please send the men in white coats.
> 
> Kirk