[vorbis] Vorbis for low bitrate speech (10-20kbps)
B.H. Anderson
bhafool1 at hotmail.com
Tue Jan 7 07:27:37 PST 2003
Hi, (this is my first post here)
A previous thread, starting Date: Tue 19 Nov 2002 - 06:09:56 EST
"[vorbis] need speech and music in one"
http://www.xiph.org/archives/vorbis/200211/0142.html
expressed needs similar to mine, to encode a lengthy speech at low bitrate.
I ran some initial tests in September and finished them in December, and I
was surprised to find Vorbis to be the best solution.
I downloaded a 38 MB file of an 80-minute speech in WMA format 64kbps stereo
(though the source was clearly mono) and realised this was a huge file that
would be a horribly long download via a modem and was far from optimal in
terms of file size.
I sought to transcode it to something smaller but easy on the ear over an
80-minute listen, without adding Dalek-type robotic sounds to the voice.
(Anyone remember the cult TV show 'Doctor Who'?) It had to be completely
intelligible too - no mushed consonants.
I should point out that it was almost certainly an audience microphone
recording of a speech using an analogue tape recorder that had been
digitized onto an 80-minute audio CD-R, the CD-R then being ripped into WMA
format (as demonstrated by the "Various Artists - CD1" tag on the original
WMA).
There's tape whine throughout (except where the tape must have been flipped,
when the whine fades out and fades up again and the speech sounds like we
skipped a paragraph or two) and lots of reverberation from the PA system,
and the peak frequency content is about 5.3 kHz, so it's not an ideal
quality original, which may cause problems for some codecs.
I tested a short 25-second sample saved out of WinAMP 2.8 (using the old WMA
plugin that still allows disk-output).
I tried outputting via WinAMP Disk Writer plugin to formats supported by the
Windows audio codec manager, such as Windows Media v2 at various low
bitrates in mono. All the results of compressed formats that worked at all
came out with pretty severe artifacts.
I also tried GSM 6.10 (8kHz sampling, 13kbps) which is acceptable on mobile
phones and not bad here, with only slight tinny harshness. One of the big
things with GSM and other telephony codecs is minimising latency (coding
delay), but that didn't interest me, so I'm not surprised I did better with
a different codec.
I wasn't even going to consider RealAudio because the player crashes and
causes Windows to hang far too frequently.
I then tried Ogg Vorbis v1.0, which I found to provide the best compromise
by quite some margin.
Downmixed to mono and resampled to 8000 Hz, with quality -1.0, the average
bitrate came to 10kbps, with no tinny, robotic sounds except for a slight
hollow edge on some applause transients at about 1'30". The whole speech
encoded to about 10.5 kbps, and is available (along with a short test
sample, demonstrating the intelligibility) on my website:
http://members.lycos.co.uk/bhafool1#munger
The quality wasn't annoying even over long listening periods. The file size
came to 6,283,484 bytes for 79 minutes 40 seconds of audio plus tags = 10.5
kbps average (encoder: Xiph.Org libVorbis I 20020717).
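As a rough check of the arithmetic above, the average bitrate follows from the file size and the running time. The sketch below assumes 1 kbps = 1000 bits per second and counts the tags as part of the file, as the figure above does; the function name is made up for illustration.

```python
def average_bitrate_kbps(size_bytes: int, minutes: int, seconds: int) -> float:
    """Average bitrate in kbps (1 kbps = 1000 bits/s) from file size and duration."""
    duration_s = minutes * 60 + seconds
    return size_bytes * 8 / duration_s / 1000


# Figures quoted above for the 8000 Hz mono encode at quality -1.0.
low = average_bitrate_kbps(6_283_484, 79, 40)
print(f"{low:.1f} kbps")  # 10.5 kbps
```

The same function can be applied to any other file size and duration to compare encodes on an equal footing.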
I've since done further tests attempting to maintain essentially all the
quality contained in the original, even eliminating the subtle artifacts on
the tricky applause and minimising the filesize.
Cool Edit reveals that the source microphone and acoustics limit the
bandwidth to about 5.3 kHz, so I resampled (with a pre-filter) in Cool Edit
to an 11,025 Hz sampling rate to preserve frequencies up to almost 5.5 kHz.
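The choice of sampling rate can be sanity-checked against the Nyquist limit (half the sampling rate). This is only a sketch: the 5.3 kHz content limit is the figure quoted above, the helper names are invented for illustration, and a real resampler's pre-filter rolls off somewhat below the theoretical limit, which is why the usable range is "almost" 5.5 kHz.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at a given sampling rate."""
    return sample_rate_hz / 2


def preserves(sample_rate_hz: float, content_limit_hz: float) -> bool:
    """True if the Nyquist limit lies above the highest frequency in the source."""
    return nyquist_hz(sample_rate_hz) > content_limit_hz


CONTENT_LIMIT_HZ = 5300  # approximate bandwidth of the recording, per the post

for rate in (8000, 11025):
    print(rate, nyquist_hz(rate), preserves(rate, CONTENT_LIMIT_HZ))
```

At 8000 Hz the limit is 4000 Hz, so the top of the recording's bandwidth is lost; at 11,025 Hz the limit is 5512.5 Hz, which covers it.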
I estimated that the artifacts were essentially absent at q -0.60 and
encoded the whole file using WinVorbis.
The entire file was now 11,731,766 bytes for 79 minutes 40 seconds of audio
plus tags = 19.6 kbps average (Xiph.Org libVorbis I 20020717), and it sounds
essentially indistinguishable from the 64 kbps WMA source file (which is the
most original source I have and probably had ample encoding headroom to be
perceptually transparent thanks to the 5.3 kHz frequency limit of the
original source and the total lack of stereo separation).
I've yet to upload that version to my website, since the quality of the 10.5
kbps version is still very good (I liken it to good AM radio) and it's half
the size, but as some people may want to listen to the speech again and
again, I might soon provide the alternative as the best available.
I'd suggest that various encoders (e.g. WinVorbis) may be improved by
including options, presets or wizards for encoding speech at very low
bitrates. At the moment, you have to be willing to use command-line options
like --downmix and --resample to achieve such results.
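For readers who want to try the command-line route, here is a minimal sketch of what such a speech preset might assemble. It assumes the reference oggenc encoder; --downmix and --resample are the options named above, while -q (quality) and -o (output file) are the other oggenc options used here. The file names and the function name are hypothetical, and the command is only built, not run.

```python
import shlex


def oggenc_speech_command(input_wav: str, output_ogg: str,
                          sample_rate_hz: int, quality: float) -> list[str]:
    """Build (but do not execute) an oggenc argument list for mono speech."""
    return [
        "oggenc",
        "--downmix",                         # mix stereo down to mono
        "--resample", str(sample_rate_hz),   # e.g. 8000 or 11025
        "-q", str(quality),                  # e.g. -1 for the smallest files
        "-o", output_ogg,
        input_wav,
    ]


# Hypothetical file names; settings follow the 8000 Hz, quality -1 test above.
cmd = oggenc_speech_command("speech.wav", "speech.ogg", 8000, -1)
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run would perform the encode, provided oggenc is installed and on the PATH.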
Many thanks to all those who've contributed to such a versatile high-quality
format.
Regards,
BHA