[Speex-dev] Suitability of speex for use with noisy, non-voice source material?

Thu May 1 10:31:58 PDT 2008

Hello Jean-Marc,

I have completed some very basic testing with unexpectedly excellent results. I am posting this to the reflector to encourage others to try similar experiments.

My experiment consisted of processing a 200-second sample taken from a ham radio shortwave receiver, containing a variety of signals: some strong, some weak (the weaker signals have more noise, and the noise is, roughly speaking, "white noise" rather than artifacts of other digital signal processing), and some in Morse code. The test was very non-scientific, but it was based on a decent cross-section of the signals I'd normally find on shortwave ham radio.

The receiver's audio was fed into the line input of a generic PC sound card and recorded on Linux (Ubuntu) with the "arecord" utility at 8k samples/second, 8 bits/sample, to a WAV file. Playback (of the decoded audio, converted back to WAV format) was on a Windows machine using Windows Media Player version 9. Testing Speex involved using the sample command-line Speex Windows binaries to encode, then decode. As a side note, the decoded file was always 2x the byte count of the original source file; I did not investigate why, but it was unimportant for this test.

I tried quality levels from 0 to 8 and passed no other command-line parameters to speexenc. speexdec was used with no command-line parameters.
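For anyone wanting to repeat the test, the record / encode / decode steps described above could look roughly like the sketch below. It only builds and prints the command lines and runs nothing. The file names are hypothetical, mono capture (-c 1) and the unsigned 8-bit format (-f U8) are assumptions consistent with the 8 bits/sample figure, and the flags shown are the standard arecord and speexenc options as commonly documented; check them against your installed versions. In the test itself the encode and decode steps were run with the Windows binaries, not on the Linux machine.

```python
import shlex


def build_commands(quality, stem="shortwave", seconds=200):
    """Return (record, encode, decode) command lines as argument lists.

    The file names derived from `stem` are hypothetical placeholders.
    """
    src = f"{stem}.wav"
    spx = f"{stem}-q{quality}.spx"
    out = f"{stem}-q{quality}-decoded.wav"
    # 8000 samples/s, unsigned 8-bit, one channel (assumed), fixed duration.
    record = ["arecord", "-t", "wav", "-f", "U8", "-r", "8000",
              "-c", "1", "-d", str(seconds), src]
    # Only the quality level is set; everything else is left at its default.
    encode = ["speexenc", "--quality", str(quality), src, spx]
    # speexdec is given just the input and output files.
    decode = ["speexdec", spx, out]
    return record, encode, decode


if __name__ == "__main__":
    record, _, _ = build_commands(0)
    print(shlex.join(record))
    # Quality levels for which results are reported in this message.
    for q in (0, 1, 2, 4, 6, 8):
        _, encode, decode = build_commands(q)
        print(shlex.join(encode))
        print(shlex.join(decode))
```

Keeping the quality level in the output file names makes it easy to compare the decoded files side by side afterwards.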

Source material - 64k bits/sec raw

Quality 0 - about 2.8k bits/sec encoded - unintelligible (could tell presence of voice, but no words readable) - FYI I did not try to optimize this mode by manipulating the source material further or adjusting other encoder parameters 

Quality 1 - about 4.3k bits/sec encoded - significant distortion but almost all test signals intelligible, Morse code almost fully readable (could live with this)

Quality 2 - about 6.3k bits/sec encoded - mild distortion and all test signals intelligible, Morse code fully readable

Quality 4 - about 8.3k bits/sec encoded - artifacts very mild, and audible only if I was listening for them

Quality 6 - about 11.3k bits/sec encoded - barely any artifacts - I can't say "none" but practically none

Quality 8 - about 16k bits/sec encoded - no artifacts I could hear

Overall, these results are far better than I had expected or could have hoped for, both in terms of the audio quality achieved at data rates suitable for the envisioned 14.4 kbps IP/PPP experimental dial-up link, and in terms of the fine-grained control I found in adjusting the quality parameter of speexenc. This fine-grained control gives me a lot of flexibility as I try to implement the end-to-end setup.
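As a rough check of the measured rates against the link, here is a minimal sketch that compares each quality level with the net budget. The 10 kbps figure is the "probably 10 kbps net" estimate from the original question quoted below, not a measured limit, and the names in the code are only illustrative. The rates are the approximate figures listed above and do not include any PPP/IP packet overhead that streaming would add.

```python
# Approximate encoded rates from the results above, in kbit/s,
# keyed by speexenc quality level.
MEASURED_KBPS = {0: 2.8, 1: 4.3, 2: 6.3, 4: 8.3, 6: 11.3, 8: 16.0}

# Source material: 8k samples/s x 8 bits/sample = 64 kbit/s raw.
SOURCE_KBPS = 8000 * 8 / 1000

# Assumed usable rate on the 14.4 kbps link after PPP/IP overhead.
NET_BUDGET_KBPS = 10.0


def summarize(rates=MEASURED_KBPS, budget=NET_BUDGET_KBPS):
    """Return (quality, kbps, compression ratio, fits budget) per level."""
    rows = []
    for quality, kbps in sorted(rates.items()):
        ratio = round(SOURCE_KBPS / kbps, 1)
        rows.append((quality, kbps, ratio, kbps <= budget))
    return rows


def best_quality(rates=MEASURED_KBPS, budget=NET_BUDGET_KBPS):
    """Highest tested quality level whose rate fits the budget, or None."""
    fitting = [q for q, kbps in rates.items() if kbps <= budget]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    for quality, kbps, ratio, ok in summarize():
        verdict = "fits" if ok else "too fast"
        print(f"quality {quality}: {kbps} kbit/s, {ratio}:1, {verdict}")
    print("highest tested quality within budget:", best_quality())
```

Under these assumptions, quality 4 is the highest tested level that fits, with a little headroom left for framing overhead.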

Next steps involve getting the code compiled on the Linux machine and working out streaming mechanisms for delivery over the dial-up link.

Thanks so much for this tool - I am more than encouraged!!

Dave

> Message: 1
> Date: Mon, 28 Apr 2008 07:27:46 +1000
> From: Jean-Marc Valin 
> Subject: Re: [Speex-dev] Suitability of speex for use with noisy,
> non-voice source material?
> To: david feldman 
> Cc: speex-dev at xiph.org
> Message-ID: 
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi Dave,
>
> Sounds like Speex would be appropriate for your application. The best
> way to check would be to actually try it with the stock encoder and
> decoder (speexenc/speexdec). The conditions you list are not ideal, but
> they'll affect any speech codec. Plus, in terms of free codecs, Speex is
> definitely the only one that can do the job.
>
> Jean-Marc
>
> david feldman wrote:
>> Question from new subscriber -
>>
>> I'm working on a project to remotely connect to a
>> short-wave receiver via a dial-up PPP/IP circuit. Turns out the
>> dial-up circuit is only stable (useful) to 14.4 kbps (faster modem
>> training produces so many link errors that the net circuit quality is
>> unusable - one end is in a remote, rural location), so looking for
>> codec that can fit within this circuit minus PPP/IP overhead
>> (probably 10 kbps net based on testing so far.) Latency is a
>> consideration so I'm looking at voice-type encoding vs. streaming
>> MP3. I was going to try use of G726 but it's not configured below 16
>> kbps so hence my resumed search for a codec.
>>
>> In my initial searching I found speex, but before I try to engineer
>> the solution, I'd like to get any advice on use of speex with the
>> expected source material, which is likely to be noisy (static and
>> stuff mixed in with the source audio). The source audio (monaural)
>> will be pre-filtered to fit with 300-3000 Hz passband (can be
>> slightly narrower if need be), and may not always be a single voice
>> (that is, may be >1 voices interfering with the audio passband, or
>> even non-voice such as tones and other stuff that would appear in the
>> passband of the receiver.) So based on this, would I want to avoid
>> speex or proceed to experimentation? By the way, this is just for a
>> personal project, no commercial intent.
>>
>> Very tks,
>>
>> Dave wb0gaz at hotmail.com
>>
