[Speex-dev] Speex-with-header-byte and Google ASR

Fri Feb 24 18:56:30 PST 2012

Greetings list,

I am working on a project in which we want to use Speex with Google's Automatic Speech
Recognition (ASR) service: we send a Speex audio stream to Google ASR and get back the
text of the spoken audio.  However, Google ASR's Speex support requires the non-standard
Speex-with-header-byte format, and my group cannot find any worthwhile documentation on
how to encode that format properly.

As a starting point, we referred to the following blog post, which mostly focuses on
using FLAC with Google ASR:
    <http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/>
That article *does* mention the following project on GitHub, which writes a
Speex-with-header-byte file that, in our limited testing, Google ASR accepts and
transcribes into text:
    <https://github.com/QXIP/Speex-with-header-bytes>
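
Our working understanding of the format (an assumption on our part, not taken from any
official specification) is that each encoded Speex frame is preceded by a single byte
holding that frame's payload length in bytes, so a frame occupies length + 1 bytes in the
stream.  A minimal sketch of that framing, plus a check that a byte stream splits cleanly
into such frames, follows.  The helper names frame_with_header_byte and
count_header_byte_frames are ours and purely illustrative; no codec is involved here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Prefix one already-encoded frame with a single byte holding its length.
// Assumption: the header byte counts only the payload, not itself.
std::vector<std::uint8_t> frame_with_header_byte(const std::vector<std::uint8_t>& payload) {
    if (payload.size() > 255) {
        throw std::length_error("payload does not fit in a one-byte length header");
    }
    std::vector<std::uint8_t> framed;
    framed.reserve(payload.size() + 1);
    framed.push_back(static_cast<std::uint8_t>(payload.size()));   // header byte
    framed.insert(framed.end(), payload.begin(), payload.end());   // payload
    assert(framed.size() == payload.size() + 1);
    return framed;
}

// Walk a byte stream record by record.  Returns the number of frames if the
// stream tiles exactly into [length][payload] records, or -1 if a header
// byte promises more payload than remains in the stream.
int count_header_byte_frames(const std::vector<std::uint8_t>& stream) {
    std::size_t pos = 0;
    int frames = 0;
    while (pos < stream.size()) {
        const std::size_t len = stream[pos];
        if (pos + 1 + len > stream.size()) {
            return -1;
        }
        pos += 1 + len;
        ++frames;
    }
    return frames;
}
```

Running the check over the bytes just before they are sent should show whether the
framing itself is intact; a sender that drops even one byte per frame would fail it.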

However, we have our own code that attempts to do the same thing in a Cocoa/Objective-C
application, and unfortunately it does not yet seem to produce data that Google ASR is
willing to accept (we get "Bad Data" errors back when we send this data to them).  My
group has permitted me to share the following code with you:

CODE BELOW:
SpeexRecorder::SpeexRecorder()
{
   mFileCount = 0;
   mRecordPacket = 0;
   mRecordData = NULL;
   mAudioStreamer = NULL;
   int sampling_rate = 16000;
   memset(&bits_, 0, sizeof(bits_));
   speex_bits_init(&bits_);
   encoder_state_ = speex_encoder_init(&speex_wb_mode);
   speex_encoder_ctl(encoder_state_, SPEEX_GET_FRAME_SIZE, &samples_per_frame_);
   int quality = kSpeexEncodingQuality;
   speex_encoder_ctl(encoder_state_, SPEEX_SET_QUALITY, &quality);
   int vbr = 1;
   speex_encoder_ctl(encoder_state_, SPEEX_SET_VBR, &vbr);
   memset(encoded_frame_data_, 0, sizeof(encoded_frame_data_));
}
SpeexRecorder::~SpeexRecorder()
{
   speex_bits_destroy(&bits_);
   speex_encoder_destroy(encoder_state_);
}
void SpeexRecorder::WriteToFile(int16 * buf, int count)
{
	NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
	count -= (count % samples_per_frame_);
	for (int i = 0; i < count; i += samples_per_frame_)
	{
		speex_encode_int(encoder_state_, (spx_int16_t*)buf, &bits_);
		int frame_length = speex_bits_write(&bits_, encoded_frame_data_ + 1, kMaxSpeexFrameLength);
		encoded_frame_data_[0] = static_cast<char>(frame_length);
		speex_bits_reset(&bits_);
		NSUserDefaults *defs = [NSUserDefaults standardUserDefaults];
		NSData *dataToSend = [NSData dataWithBytes:encoded_frame_data_ length:frame_length];
		NSArray *array = [NSArray arrayWithObjects:dataToSend, [defs objectForKey:@"inLang"], nil];
		NSLog(@"WriteToFile -> dataToSend: [%d]", [dataToSend length]);
		[mAudioStreamer performSelectorOnMainThread:@selector(sendDataToServer:) withObject:array waitUntilDone:YES];
	}
	[pool drain];
}
void SpeexRecorder::OpenNextFile()
{
	mFileCount++;
	NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
	NSUserDefaults *defs = [NSUserDefaults standardUserDefaults];
	if (mAudioStreamer)
	{
		// send 0 bytes to stream to signify the end
		// stream will be closed, causing http chunking to end and google to respond
		NSData *dataToSend = [NSData dataWithBytes:0 length:0];
		NSLog(@"OpenNextFile -- #%d# -- [%d] bytes", mFileCount, [dataToSend length]);
		NSArray *array = [NSArray arrayWithObjects:dataToSend, [defs objectForKey:@"inLang"], nil];
		[mAudioStreamer performSelectorOnMainThread:@selector(sendDataToServer:) withObject:array waitUntilDone:YES];
	}
	else
	{
		mAudioStreamer = [[AudioStreamer alloc] init];
		NSLog(@"OpenNextFile -- #%d# -- [%d] bytes", mFileCount, 0);
		[mAudioStreamer performSelectorOnMainThread:@selector(_setupConnection:) withObject:[defs objectForKey:@"inLang"] waitUntilDone:YES];		
	}
	[pool drain];
}
CODE ABOVE:
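
For comparison, here is a sketch of the per-frame loop with the codec stubbed out.  It
assumes the framing is one length byte followed by the payload.  It drops a trailing
partial frame, hands the encoder the samples starting at offset i for each frame, and
appends payload length + 1 bytes per frame.  encode_stream and the FrameEncoder callback
are hypothetical stand-ins for illustration, not libspeex API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for the codec: receives a pointer to one frame of
// samples and returns that frame's encoded bytes (at most 255 of them).
using FrameEncoder = std::function<std::vector<std::uint8_t>(const std::int16_t*)>;

std::vector<std::uint8_t> encode_stream(const std::int16_t* samples,
                                        std::size_t count,
                                        std::size_t samples_per_frame,
                                        const FrameEncoder& encode_frame) {
    std::vector<std::uint8_t> out;
    if (samples_per_frame == 0) {
        return out;
    }
    count -= count % samples_per_frame;              // drop a trailing partial frame
    for (std::size_t i = 0; i < count; i += samples_per_frame) {
        // Each frame reads from its own offset in the sample buffer.
        const std::vector<std::uint8_t> payload = encode_frame(samples + i);
        assert(payload.size() <= 255);               // must fit in the header byte
        out.push_back(static_cast<std::uint8_t>(payload.size()));   // header byte
        out.insert(out.end(), payload.begin(), payload.end());      // payload
    }
    return out;
}
```

With a real encoder behind the callback, the two things to compare against a loop like
this are which samples each frame is given and how many bytes are emitted per frame.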

Is there anything obvious in our code that we may have missed?  I greatly appreciate any
help you can offer, and I apologize in advance if this is the wrong place to post such a
message.  I also understand that this (likely) non-standard way of encoding Speex may not
be something the members of this list can support, so I will read nothing into a lack of
response or an inability to help us with this problem.

Thanks in advance for your time and willingness to consider our situation!
--Quinn Ebert

PS: My apologies if two copies of this e-mail are received.  I tried to send this before receiving my subscription confirmation e-mail, which took about an hour to arrive. :-(