[CELT-dev] Opus for audiobooks etc
Daniel Jensen
jensend at iname.com
Thu Nov 17 11:41:27 PST 2011
I know the focus for Opus is low delay, but I've been watching its
development with interest because of the potential for audiobook/podcast
use, where latency is practically irrelevant. I hear the upcoming USAC
codec will give good results for this niche (though listening test
results don't seem to be available to the public yet), but I also hear
it'll be extremely patent encumbered. If Opus can do anywhere near as
well, I think a lot of folks would be interested in using it for
audiobooks and avoiding the patent jungle.
The only comment I've seen about use of Opus for audiobooks was jmvalin
saying in response to someone on his blog that Opus's ability to do
fullband would be a key advantage here. This seems kind of
counterintuitive to me: can people even ABX human speech at a 32 or even
24 kHz sample rate against speech at 48 kHz, much less hear a large quality
difference? A number of audiobooks I've listened to have used 22kHz mp3s
without being clearly objectionable, and in my personal use I've had
decent results using the -voice LAME setting (downsamples to 32kHz and
encodes as 56kbps abr).
The recent hydrogenaudio tests showed Opus CELT modes trumping the best
of breed high-latency codecs at 64kbps despite having only 22.5 ms
latency, and the SILK modes do a great job at the opposite end of the
bitrate spectrum and can make use of larger frame sizes for those of us
who don't care about latency. In between the two, the hybrid mode appears
to do better than other codecs with similar latency, but Christian
Hoene's results showed it losing pretty convincingly to AMR-WB+ (which
was able to use 4x larger frame sizes) at 32kbps. (How much of this was
due to the test being stereo, I wonder? Some mono tests seem to have
given 32kbps Opus rather high marks.)
For audiobook use, I don't know that the SILK modes or anything else
with that low a bitrate will be good enough, and when you're storing
hundreds of hours of speech, 64kbps adds up fast. I'd guess the sweet
spot for audiobooks would be between 20 and 32 kbps, and this seems to
my unschooled understanding to be a region where Opus's low delay might
put it at a serious disadvantage.
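
To put rough numbers on "adds up fast", here is a back-of-the-envelope
sketch. The 100-hour library size and the helper name size_mb are
assumptions for illustration only, and container overhead is ignored.

```python
def size_mb(kbps, hours):
    """Approximate audio payload size in MB (10^6 bytes) at a constant bitrate."""
    bits = kbps * 1000 * hours * 3600  # kbit/s -> total bits over the duration
    return bits / 8 / 1_000_000        # bits -> bytes -> megabytes

HOURS = 100  # hypothetical library size, not a figure from this post

for kbps in (20, 24, 32, 56, 64):
    print(f"{kbps:>2} kbps: {size_mb(kbps, HOURS):6.0f} MB for {HOURS} h")
```

At 64kbps this works out to 28.8 MB per hour, so the assumed 100 hours
take about 2880 MB, against 1440 MB at 32kbps and 900 MB at 20kbps.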
Other than just being curious in general about what folks have to say
about audiobook use, I'm curious about one thing in particular-- how
feasible would it be to use larger frame sizes (e.g. matching SILK
mode's 60ms maximum) for Opus, especially for the hybrid mode, and what
would the potential for improved quality be?