[CELT-dev] Opus for audiobooks etc

Daniel Jensen jensend at iname.com
Thu Nov 17 11:41:27 PST 2011

I know the focus for Opus is low delay, but I've been watching its 
development with interest because of the potential for audiobook/podcast 
use, where latency is practically irrelevant. I hear the upcoming USAC 
codec will give good results for this niche (though listening test 
results don't seem to be available to the public yet), but I also hear 
it'll be extremely patent-encumbered. If Opus can do anywhere near as 
well, I think a lot of folks would be interested in using it for 
audiobooks and avoiding the patent jungle.

The only comment I've seen about using Opus for audiobooks was jmvalin 
replying to someone on his blog that Opus's ability to do fullband 
would be a key advantage here. This seems kind of counterintuitive to 
me: can people even ABX human speech at a 32 or even 24kHz sample rate 
against the same speech at 48kHz, much less hear a large quality 
difference? A number of audiobooks I've listened to have used 22kHz 
MP3s without being clearly objectionable, and in my personal use I've 
had decent results using the -voice LAME setting (downsamples to 32kHz 
and encodes as 56kbps ABR).
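For a rough sense of what those sample rates mean: by the Nyquist limit, 
a signal sampled at rate fs can only carry content up to fs/2. The 
sketch below assumes "22kHz" refers to the common 22.05 kHz MP3 rate, 
and its figures are upper bounds, since encoders usually low-pass 
somewhat below Nyquist.

```python
def nyquist_khz(sample_rate_hz: int) -> float:
    """Highest frequency (in kHz) that a given sample rate can represent."""
    return sample_rate_hz / 2 / 1000


# 22050 Hz is an assumption for the "22kHz" MP3s mentioned above.
RATES = [
    ("22kHz MP3", 22050),
    ("24kHz", 24000),
    ("LAME -voice (32kHz)", 32000),
    ("48kHz fullband", 48000),
]

for label, rate_hz in RATES:
    print(f"{label}: content up to {nyquist_khz(rate_hz):g} kHz")
```

So the question above amounts to whether anything in speech above 
roughly 12 or 16 kHz is audible; Opus fullband itself codes audio up to 
20 kHz rather than the full 24 kHz.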

The recent Hydrogenaudio tests showed Opus's CELT modes trumping the 
best-of-breed high-latency codecs at 64kbps despite having only 22.5 ms 
of latency, and the SILK modes do a great job at the opposite end of 
the bitrate spectrum and can make use of larger frame sizes for those 
of us who don't care about latency. In between the two, the hybrid mode 
appears to do better than other codecs with similar latency, but 
Christian Hoene's results showed it losing pretty convincingly to 
AMR-WB+ (which was able to use 4x larger frame sizes) at 32kbps. (How 
much of this was due to the test being stereo, I wonder? Some mono 
tests seem to have given 32kbps Opus rather high marks.)

For audiobook use, I don't know that the SILK modes or anything else 
at that low a bitrate will be good enough, and when you're storing 
hundreds of hours of speech, 64kbps adds up fast. I'd guess the sweet 
spot for audiobooks is between 20 and 32 kbps, and to my unschooled 
understanding this seems to be a region where Opus's low delay might 
put it at a serious disadvantage.
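To put numbers on "adds up fast", here is a minimal back-of-the-envelope 
sketch. It assumes a constant bitrate and ignores container and packet 
overhead; the 300-hour library size is a hypothetical figure chosen for 
illustration.

```python
def storage_bytes(bitrate_kbps: float, hours: float) -> float:
    """Size in bytes of `hours` of audio at a constant `bitrate_kbps`.

    Ignores container/packet overhead and any VBR fluctuation.
    """
    bits_per_second = bitrate_kbps * 1000
    return bits_per_second / 8 * hours * 3600


LIBRARY_HOURS = 300  # hypothetical collection size

for kbps in (20, 24, 32, 56, 64):
    per_hour_mb = storage_bytes(kbps, 1) / 1e6
    total_gb = storage_bytes(kbps, LIBRARY_HOURS) / 1e9
    print(f"{kbps} kbps: {per_hour_mb:.1f} MB/hour, "
          f"{total_gb:.2f} GB for {LIBRARY_HOURS} hours")
```

For example, 64kbps works out to 28.8 MB per hour (8.64 GB for 300 
hours), while 32kbps halves that to 14.4 MB per hour (4.32 GB).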

Beyond being curious in general about what folks have to say about 
audiobook use, I'm curious about one thing in particular: how feasible 
would it be to use larger frame sizes (e.g. matching SILK mode's 60ms 
maximum) for Opus, especially for the hybrid mode, and what would the 
potential for improved quality be?
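To make the frame-size question concrete, the sketch below converts 
frame durations to samples at a 48 kHz sample rate and estimates 
algorithmic delay as frame length plus a fixed look-ahead. The 2.5 ms 
look-ahead is an assumption, picked because 20 ms frames plus 2.5 ms 
match the 22.5 ms figure quoted above; other modes may add more delay. 
The list of durations is illustrative (60 ms being the SILK maximum 
mentioned above), not a statement of which sizes each mode supports.

```python
SAMPLE_RATE_HZ = 48000
LOOKAHEAD_MS = 2.5  # assumption: 20 ms frame + 2.5 ms = the 22.5 ms quoted above


def samples_per_frame(frame_ms: float, rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples in one frame of the given duration."""
    return round(frame_ms * rate_hz / 1000)


def algorithmic_delay_ms(frame_ms: float, lookahead_ms: float = LOOKAHEAD_MS) -> float:
    """Rough delay estimate: one frame of buffering plus the encoder look-ahead."""
    return frame_ms + lookahead_ms


for frame_ms in (2.5, 5, 10, 20, 40, 60):
    print(f"{frame_ms:>4} ms frame: {samples_per_frame(frame_ms):>4} samples, "
          f"~{algorithmic_delay_ms(frame_ms):g} ms delay")
```

Going from 20 ms to 60 ms frames triples the samples per frame (960 to 
2880); under the same look-ahead assumption the delay rises from 22.5 ms 
to 62.5 ms, which is irrelevant for audiobook playback.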
