[CELT-dev] Opus for audiobooks etc

Benjamin M. Schwartz bmschwar at fas.harvard.edu
Thu Nov 17 12:18:58 PST 2011


On 11/17/2011 02:41 PM, Daniel Jensen wrote:
> can people even ABX human speech at a 32 or even 
> 24kHz sample rate from speech at 48kHz, much less hear a large quality 
> difference?

Yes (but certainly not everyone can hear the difference).

Perhaps more importantly, with Opus you don't have to worry about audio
bandwidth (i.e. samplerate; 48k vs. 22050 vs ...).  Just throw in a
fullband input and set your bitrate.  If the best quality is achieved by
downsampling, the Opus encoder will do that internally.

> The recent hydrogenaudio tests showed Opus CELT modes trumping the best 
> of breed high-latency codecs at 64kbps despite having only 22.5 ms 
> latency, and the SILK modes do a great job at the opposite of the 
> bitrate spectrum and can make use of larger frame sizes for those of us 
> who don't care about latency.

Yes, although the larger SILK frames are basically just 2 or 3 20ms frames
stuck together in a way that reduces packing overhead.

> In between the two, the hybrid mode appears
> to do better than other codecs with similar latency- but Christian 
> Hoene's results showed it losing pretty convincingly to AMR-WB+ (which 
> was able to use 4x larger frame sizes) at 32kbps. (How much of this was 
> due to the test being stereo, I wonder? Some mono tests seem to have 
> given 32kbps Opus rather high marks.)

That test was deliberately using very weird stereo, like two different
speakers saying different things, one in each ear.  There have also been some
improvements in the stereo encoding since then.  I wouldn't worry too much
about those results.

> For audiobook use, I don't know that the SILK modes or anything else 
> with that low of a bitrate will be good enough, and when you're storing 
> hundreds of hours of speech 64kbps adds up fast. I'd guess the sweet 
> spot for audiobooks would be between 20 and 32 kbps, and this seems to 
> my unschooled understanding to be a region where Opus's low delay might 
> put it at a serious disadvantage.

Here's Opus at 20 kbps beating AMR-WB, and at 32 kbps getting close to
transparent (at 16 kHz samplerate, you may note):

http://www.octasic.com/en/tech/opus_audio_codec.php#Google
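
To put those two bitrates in storage terms, here is a rough sketch of the
arithmetic.  It counts codec payload only and ignores container overhead;
the function name and the 100-hour library size are just illustrative
assumptions, not anything from the tests above.

```python
def megabytes_per_hour(bitrate_kbps):
    """Payload size of one hour of audio at a constant bitrate, in MB (10^6 bytes)."""
    bits_per_hour = bitrate_kbps * 1000 * 3600
    return bits_per_hour / 8 / 1_000_000


if __name__ == "__main__":
    library_hours = 100  # hypothetical audiobook collection size
    for kbps in (20, 32, 64):
        per_hour = megabytes_per_hour(kbps)
        total_gb = per_hour * library_hours / 1000
        print(f"{kbps} kbps: {per_hour:.1f} MB/hour, "
              f"{total_gb:.2f} GB per {library_hours} hours")
```

So dropping from 64 kbps to the 20-32 kbps range cuts storage to somewhere
between a third and a half.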

> Other than just being curious in general about what folks have to say 
> about audiobook use, I'm curious about one thing in particular-- how 
> feasible would it be to use larger frame sizes (e.g. matching SILK 
> mode's 60ms maximum) for Opus, especially for the hybrid mode, and what 
> would the potential for improved quality be?

Opus will be fantastic for audiobooks.

Frame size is a bit tricky in Opus.  The short version is "don't worry
about it".  In Hybrid and CELT modes, the maximum frame size is 20ms.

A slightly longer version is that 20ms frames can be combined into
"packets" up to 120ms long.  This can save about 1 byte per frame, or
about 0.4 kbps, compared to the configuration we've been testing so far
(20ms frames in 20ms packets).

This is at most about 2% bitrate savings in your "sweet spot" (0.4 kbps out
of 20 to 32 kbps), so we haven't
been worrying about it.  The real reason for this feature is that some
transports (like RTP) have large per-packet costs, so then reducing the
number of packets can be valuable.
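
The numbers above can be checked with a few lines of arithmetic.  This is
only a sketch: the function names are made up for illustration, and the
40-byte per-packet header is an assumption (IPv4 20 + UDP 8 + RTP 12 bytes,
with no options or extensions).

```python
def packing_savings_kbps(frame_ms=20, bytes_saved_per_frame=1):
    """Bitrate saved by combining frames, given bytes saved per frame."""
    frames_per_second = 1000 / frame_ms
    return frames_per_second * bytes_saved_per_frame * 8 / 1000


def header_overhead_kbps(packet_ms, header_bytes=40):
    """Transport header cost for one packet every packet_ms milliseconds.

    header_bytes=40 assumes IPv4 + UDP + RTP with no options (an assumption).
    """
    packets_per_second = 1000 / packet_ms
    return packets_per_second * header_bytes * 8 / 1000


if __name__ == "__main__":
    savings = packing_savings_kbps()
    print(f"payload savings: {savings:.1f} kbps")
    for bitrate in (20, 32):
        print(f"  at {bitrate} kbps: {100 * savings / bitrate:.2f}% of the bitrate")
    for packet_ms in (20, 120):
        print(f"header overhead with {packet_ms} ms packets: "
              f"{header_overhead_kbps(packet_ms):.2f} kbps")
```

Under those assumptions the header cost falls from 16 kbps with 20ms packets
to under 3 kbps with 120ms packets, which is much more than the 0.4 kbps
saved in the payload itself.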

--Ben
