[vorbis] Vorbis for low bitrate speech (10-20kbps)

B.H. Anderson bhafool1 at hotmail.com
Tue Jan 7 07:27:37 PST 2003



Hi, (this is my first post here)

A previous thread, starting Date: Tue 19 Nov 2002 - 06:09:56 EST
"[vorbis] need speech and music in one"
http://www.xiph.org/archives/vorbis/200211/0142.html
expressed needs similar to mine, to encode a lengthy speech at low bitrate.

I ran some initial tests in September and finished them in December, and I 
was surprised to find Vorbis to be the best solution.

I downloaded a 38 MB file of an 80-minute speech in WMA format 64kbps stereo 
(though the source was clearly mono) and realised this was a huge file that 
would be a horribly long download via a modem and was far from optimal in 
terms of file size.

I sought to transcode it to something smaller but easy-on-the-ear over an 80 
minute listen, without adding Dalek type robotic sounds to the voice. 
(Anyone remember the cult TV show 'Doctor Who'?) It had to be completely 
intelligible too - no mushed consonants.

I should point out that it was almost certainly an audience microphone 
recording of a speech using an analogue tape recorder that had been 
digitized onto an 80-minute audio CD-R, the CD-R then being ripped into WMA 
format (as demonstrated by the "Various Artists - CD1" tag on the original 
WMA).

There's tape whine throughout (except where the tape must have been flipped, 
when the whine fades out and fades up again and the speech sounds like we 
skipped a paragraph or two) and lots of reverberation from the PA system, 
and the frequency content tops out at about 5.3 kHz, so it's not an ideal 
quality original, which may cause problems for some codecs.

I tested a short 25-second sample saved out of WinAMP 2.8 (using the old WMA 
plugin that still allows disk-output).

I tried outputting via WinAMP Disk Writer plugin to formats supported by the 
Windows audio codec manager, such as Windows Media v2 at various low 
bitrates in mono. All the results of compressed formats that worked at all 
came out with pretty severe artifacts.

I also tried GSM 6.10 (8 kHz sampling, 13 kbps) which is acceptable on mobile 
phones and not bad here, with only slight tinny harshness. One of the big 
things with GSM and other telephony codecs is minimising latency (coding 
delay), but that didn't interest me, so I'm not surprised I did better with 
a different codec.

I wasn't even going to consider RealAudio because the player crashes and 
causes Windows to hang far too frequently.

I then tried Ogg Vorbis v1.0, which I found to provide the best compromise 
by quite some margin.

Downmixed to mono and resampled to 8000 Hz, with quality -1.0, the average 
bitrate came to 10 kbps, with no tinny, robotic sounds except for a slight 
hollow edge on some applause transients at about 1'30". The whole speech 
encoded to about 10.5 kbps, and is available (along with a short test 
sample, demonstrating the intelligibility) on my website:

http://members.lycos.co.uk/bhafool1#munger

The quality wasn't annoying even over long listening periods. The file size 
came to 6,283,484 bytes for 79 minutes 40 seconds of audio plus tags, a 10.5 
kbps average (encoder: Xiph.Org libVorbis I 20020717).
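
The 10.5 kbps figure can be checked from the file size and duration alone. A 
minimal Python sketch (the function name average_kbps is made up for 
illustration, and 1 kbps is taken as 1000 bits per second):

```python
def average_kbps(size_bytes: int, minutes: int, seconds: int) -> float:
    """Average bitrate in kbps (1 kbps = 1000 bit/s) for a file of the
    given size and playing time. Tags and container overhead are included,
    since they are part of the file size."""
    duration_s = minutes * 60 + seconds
    return size_bytes * 8 / duration_s / 1000


# File size and duration quoted above for the 8000 Hz, q -1.0 encode.
print(round(average_kbps(6_283_484, 79, 40), 1))  # prints 10.5
```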

I've since done further tests attempting to maintain essentially all the 
quality contained in the original, even eliminating the subtle artifacts on 
the tricky applause, while still minimising the file size.

Cool Edit reveals that the source microphone and acoustics limit the 
bandwidth to about 5.3 kHz, so I resampled (with pre-filter) in Cool Edit to 
an 11,025 Hz sampling rate to preserve frequencies up to almost 5.5 kHz.
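
The choice of 11,025 Hz follows from the Nyquist limit: a sampled signal can 
only carry frequencies below half its sampling rate. A small Python sketch of 
that check, using illustrative helper names and ignoring the roll-off of any 
real pre-filter:

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Upper frequency limit (exclusive) for a given sampling rate."""
    return sample_rate_hz / 2


def can_carry(sample_rate_hz: float, highest_freq_hz: float) -> bool:
    """True if the sampling rate leaves room for the highest frequency."""
    return highest_freq_hz < nyquist_hz(sample_rate_hz)


SOURCE_BANDWIDTH_HZ = 5300  # approximate limit of the recording, per Cool Edit

for rate in (8000, 11025):
    print(rate, nyquist_hz(rate), can_carry(rate, SOURCE_BANDWIDTH_HZ))
```

At 8000 Hz everything above 4 kHz is discarded, whereas 11,025 Hz leaves room 
for the full 5.3 kHz of the source.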

I estimated that the artifacts were essentially absent at q -0.60 and 
encoded the whole file using WinVorbis.

The entire file was now 11,731,766 bytes for 79 minutes 40 seconds of audio 
plus tags = 19.6 kbps average (Xiph.Org libVorbis I 20020717), and it sounds 
essentially indistinguishable from the 64 kbps WMA source file (which is the 
closest thing to the original that I have, and which probably had ample 
encoding headroom to be perceptually transparent thanks to the 5.3 kHz 
frequency limit of the original source and the total lack of stereo 
separation).

I've yet to upload that version to my website, since the quality of the 10.5 
kbps version is still very good (I liken it to good AM radio) and it's half 
the size, but as some people may want to listen to the speech again and 
again, I might soon provide the alternative as the best available.

I'd suggest that various encoders (e.g. WinVorbis) may be improved by 
including options, presets or wizards for encoding speech at very low 
bitrates. At the moment, you have to be willing to use command-line options 
like --downmix and --resample to achieve such results.
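
For reference, those options might be combined roughly as follows. This is 
only a sketch: it assumes the oggenc encoder from vorbis-tools is installed 
and that the input is a stereo WAV file. The helper name and file names are 
placeholders, and it is not necessarily the exact procedure used for the 
files described above. Whether negative quality values are accepted depends 
on the encoder version.

```python
def oggenc_speech_args(wav_in, ogg_out, sample_rate_hz=8000, quality=-1.0):
    """Build an oggenc command line for low-bitrate mono speech.

    Hypothetical helper: it only assembles the argument list and does not
    run the encoder itself.
    """
    return [
        "oggenc",
        "--downmix",                        # stereo to mono (stereo input only)
        "--resample", str(sample_rate_hz),  # target sampling rate in Hz
        "-q", str(quality),                 # quality setting
        "-o", ogg_out,                      # output file
        wav_in,                             # input WAV file
    ]


args = oggenc_speech_args("speech.wav", "speech.ogg")
print(" ".join(args))

# To actually encode (requires oggenc on the PATH):
#   import subprocess
#   subprocess.run(args, check=True)
```

Passing sample_rate_hz=11025 and quality=-0.6 would correspond to the 
higher-quality settings mentioned earlier.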

Many thanks to all those who've contributed to such a versatile high-quality 
format.

Regards,

BHA

