[opus] Opus application_mode==AUDIO, 20ms framing issue?

Kevin Connor kevinconnor at mac.com
Mon Jun 13 05:11:33 UTC 2016

Hi Jean-Marc, 

Sorry for the late reply, and thanks for your interest. Quality is good for 10ms/audio and poorer for 20ms/audio. Quality is equivalent at 10ms and 20ms for mode=voip. PESQ was the tool that alerted me to something of interest, but I hardly trust PESQ at all! It's good for flagging relative differences, of course, but not absolutes. Bitrate here was 28kbps, but I hear the same thing at 32kbps.

Please find attached a zip file with the audio files, converted to .wavs for simpler listening.   Here is a cat of the README.txt.   Thanks very much!

16bit, 16kHz input wav files (ar1, ar2, ar3), content from ~50Hz to near 8kHz.
All .pcm files are 16kHz, 16bit, signed ints, little (intel) endian.

./opus_demo -e voip 16000 1 28000  -framesize 20 ~/ar1.wav ar1_20_voip.bit 
./opus_demo -d 16000 1 ar1_20_voip.bit ar1_20_voip.pcm
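
The encode/decode pair above covers one cell of the test matrix (voip, 20ms). A minimal sketch of how the full set of command lines could be generated for every mode and frame size follows. The helper name `opus_demo_commands` is hypothetical, the output naming simply follows the `ar1_20_voip` pattern used above, and the decode line assumes opus_demo expects a channel count after the sampling rate, as its usage text indicates.

```python
from itertools import product

# Settings taken from the README: 16 kHz mono input at 28 kbps.
MODES = ("voip", "audio")
FRAME_SIZES_MS = (5, 10, 20, 40)
SAMPLE_RATE = 16000
CHANNELS = 1
BITRATE = 28000


def opus_demo_commands(stem, src_dir="~", opus_demo="./opus_demo"):
    """Return (encode_argv, decode_argv) pairs for every mode/frame-size combination.

    Hypothetical helper: it only builds the argument lists and runs nothing.
    """
    pairs = []
    for mode, frame_ms in product(MODES, FRAME_SIZES_MS):
        tag = f"{stem}_{frame_ms}_{mode}"
        encode = [opus_demo, "-e", mode, str(SAMPLE_RATE), str(CHANNELS),
                  str(BITRATE), "-framesize", str(frame_ms),
                  f"{src_dir}/{stem}.wav", f"{tag}.bit"]
        decode = [opus_demo, "-d", str(SAMPLE_RATE), str(CHANNELS),
                  f"{tag}.bit", f"{tag}.pcm"]
        pairs.append((encode, decode))
    return pairs


if __name__ == "__main__":
    # Print the commands so they can be pasted into a shell (which expands "~").
    for stem in ("ar1", "ar2", "ar3"):
        for encode, decode in opus_demo_commands(stem):
            print(" ".join(encode))
            print(" ".join(decode))
```

Printing the commands rather than executing them keeps the sketch independent of where opus_demo and the input files actually live.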

opus_demo reports version:    libopus 1.1-alpha

Using recent pesq code compiled from src, +16000 option.
( same phenomenon seen with +16000 +wb option)  

                   5ms      10ms     20ms      40ms

ar1_NN_voip       4.314    4.493    4.488     4.488
ar2_NN_voip       4.346    4.442    4.436     4.474
ar3_NN_voip       3.993    4.375    4.414     4.390

ar1_NN_audio      4.292    4.485 -> 4.313     4.313
ar2_NN_audio      4.364    4.460 -> 4.350     4.350
ar3_NN_audio      3.924    4.327 -> 4.218     4.218

Note that this size/type of pesq test is insufficient to draw ANY conclusions.
However, it is useful for drawing attention to relative differences, that
might be interesting for HUMAN LISTENING.

So the question here was, is this pesq drop from 10ms to 20ms framesize, seen in the 
case of mode=AUDIO (but not VOIP)  something REAL?  It warranted listening.
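
To make the size of that 10ms to 20ms drop explicit, here is a small sketch that computes it per file and mode from the PESQ scores in the table above. The dictionary layout and the function name `drop_10_to_20` are illustrative assumptions; the numbers are copied from the table, and the same caveat applies: this flags differences worth listening to and proves nothing by itself.

```python
FRAME_SIZES_MS = (5, 10, 20, 40)

# PESQ scores copied from the table above, ordered as FRAME_SIZES_MS.
SCORES = {
    ("ar1", "voip"): (4.314, 4.493, 4.488, 4.488),
    ("ar2", "voip"): (4.346, 4.442, 4.436, 4.474),
    ("ar3", "voip"): (3.993, 4.375, 4.414, 4.390),
    ("ar1", "audio"): (4.292, 4.485, 4.313, 4.313),
    ("ar2", "audio"): (4.364, 4.460, 4.350, 4.350),
    ("ar3", "audio"): (3.924, 4.327, 4.218, 4.218),
}


def drop_10_to_20(scores):
    """Return the 10ms score minus the 20ms score for each (file, mode) key."""
    i10 = FRAME_SIZES_MS.index(10)
    i20 = FRAME_SIZES_MS.index(20)
    return {key: round(row[i10] - row[i20], 3) for key, row in scores.items()}


if __name__ == "__main__":
    for (name, mode), drop in sorted(drop_10_to_20(SCORES).items()):
        print(f"{name} {mode:5s} 10ms-20ms drop: {drop:+.3f}")
```

With these inputs the audio-mode drops come out at roughly 0.11 to 0.17, while the voip-mode differences stay within a few hundredths in either direction.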

( same results, interleaved mode=VOIP,AUDIO numbers ) 

                   5ms      10ms     20ms      40ms

ar1_NN_voip       4.314    4.493    4.488*     4.488
ar1_NN_audio      4.292    4.485    4.313*     4.313

ar2_NN_voip       4.346    4.442    4.436*     4.474
ar2_NN_audio      4.364    4.460    4.350*     4.350

ar3_NN_voip       3.993    4.375    4.414*     4.390
ar3_NN_audio      3.924    4.327    4.218*     4.218

Same data, interleaved to highlight the fact that the drop is seen for the same sentences,
from mode=VOIP to mode=AUDIO, at 20ms framesize. (40ms is the same processing as 20ms, I believe.)

So the question that is implied:
- is there a phenomenon for mode=AUDIO that results in lower scores for 20ms in particular, but not 10ms?

Listening to the processed files (sighted), I have the following subjective opinion:

- Given: sampling rate = 16000,  bitrate = 28000.  (also replicated at 32 kbps)
- the 10ms versions (voip,audio) and the 20ms (voip) version sound "focused" and have high fidelity to the ref.
- the 20ms mode=AUDIO versions sound "hollow", "smeared", "unfocused", especially during unvoiced segments.
- example "china hit" file ar3.pcm, t=0.6s.  Very clear diff between 10ms and 20ms framesize in mode=audio.
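
For A/B listening around a specific instant such as t=0.6s, a short window can be cut out of each decoded .pcm file and wrapped in a WAV header. The sketch below assumes the format stated in the README (16kHz, 16-bit, signed, little-endian, mono); the function names and the 0.25s half-width are hypothetical choices, not part of the original test setup.

```python
import wave

SAMPLE_RATE = 16000      # Hz, per the README
BYTES_PER_SAMPLE = 2     # 16-bit signed little-endian, mono


def pcm_window(pcm_bytes, center_s, half_width_s=0.25):
    """Return the raw PCM bytes covering center_s +/- half_width_s seconds."""
    start = max(0, int(round((center_s - half_width_s) * SAMPLE_RATE)))
    stop = int(round((center_s + half_width_s) * SAMPLE_RATE))
    return pcm_bytes[start * BYTES_PER_SAMPLE:stop * BYTES_PER_SAMPLE]


def write_wav(target, pcm_bytes):
    """Write mono 16-bit PCM to target (a path or a binary file object) as WAV."""
    with wave.open(target, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(BYTES_PER_SAMPLE)
        out.setframerate(SAMPLE_RATE)
        out.writeframes(pcm_bytes)


# Example usage (file names follow the README's naming pattern and are assumed):
#   with open("ar3_20_audio.pcm", "rb") as f:
#       write_wav("ar3_20_audio_t0.6.wav", pcm_window(f.read(), 0.6))
```

Cutting the same window from the 10ms and 20ms versions gives short clips that are easy to loop in any player.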

This isn't about pesq scores -- pesq was just the "difference noticed" flag that got me to listen to some files.
I notice this same kind of de-focused sound in the same samples processed using recent opus lib in linux.
I'm not surprised at a delta between mode=voip and mode=audio for a constant framesize.  That's entirely expected.
What I'm curious about is the delta between 10ms and 20ms , for mode=audio.  
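
For reference, the arithmetic behind the two frame sizes at these settings can be sketched as below. This only makes the comparison concrete (samples per frame, frames per second, average bytes per frame at the target bitrate); it is not an explanation of the quality difference, and the function name `frame_budget` is hypothetical. The byte figure is an average, since the encoder may vary the size of individual frames.

```python
def frame_budget(bitrate_bps, frame_ms, sample_rate_hz=16000):
    """Return (samples per frame, average bytes per frame, frames per second)."""
    samples = sample_rate_hz * frame_ms // 1000
    avg_bytes = bitrate_bps * frame_ms / 8000   # bits/s * ms / (8 bits * 1000 ms)
    frames_per_second = 1000 / frame_ms
    return samples, avg_bytes, frames_per_second


if __name__ == "__main__":
    for bitrate in (28000, 32000):
        for frame_ms in (10, 20):
            samples, avg_bytes, fps = frame_budget(bitrate, frame_ms)
            print(f"{bitrate} bps, {frame_ms} ms: {samples} samples/frame, "
                  f"{avg_bytes:.0f} bytes/frame on average, {fps:.0f} frames/s")
```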

> On Jun 3, 2016, at 10:12 AM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Hi Kevin,
> Are you saying that the quality is good at 20 ms and bad at 10 ms, or
> the reverse? Also, is this speech or music? What tool, what options? In
> general, it helps a lot if you post the sample (input and output).
> Cheers,
> 	Jean-Marc
> On 06/03/2016 12:48 PM, Kevin Connor wrote:
>> Hi Opus list,
>> I'm noticing a discontinuity in the quality between use of 10ms and
>> 20ms framesize for mode=AUDIO  at a bitrate of about 28000.
>> Quality drops audibly for voice signals when encoded at 20ms
>> framesize, versus quality at 10ms.   This effect is mode=AUDIO only.
>> Using mode==VOIP shows no sig. difference between 10 and 20ms framing
>> at this bitrate.      Pesq totally overreacts, as it is wont to do :)
>> However, I do hear a slight drop. Admittedly, 28kbps is a low bitrate
>> to be running in mode=AUDIO.     Is this effect known?  Is there a
>> difference in processing with audio mode between 10ms and (other
>> framesizes)?   I reckon it will go away if I throw some more bitrate
>> at it,  but wanted to understand it a bit better.
>> Thanks very much, KevinC 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 20msAudioModeQuestion.zip
Type: application/zip
Size: 1978677 bytes
Desc: not available
URL: <http://lists.xiph.org/pipermail/opus/attachments/20160613/9d820a47/attachment.zip>