<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Jean-Marc, <br class=""><div class=""><div class=""><br class=""></div><div class="">Sorry for the late reply, and thanks for your interest. Quality is good for 10ms/audio and poorer for 20ms/audio. For mode=voip, quality is equivalent at 10ms and 20ms. PESQ was the tool that alerted me to something of interest, but I trust PESQ to almost no degree! It's good for flagging relative differences, of course, but not absolutes. The bitrate here was 28kbps, but I hear the same thing at 32kbps.</div><div class=""><br class=""></div><div class="">Below is a link to a zip file with the audio files, converted to .wav for simpler listening. </div><div class=""><br class=""></div><div class=""> <a href="https://www.dropbox.com/s/bzu4i3dmg5f91tv/20msAudioModeQuestion.zip?dl=0" class="">https://www.dropbox.com/s/bzu4i3dmg5f91tv/20msAudioModeQuestion.zip?dl=0</a></div><div class=""><br class=""></div><div class=""></div><div class="">If there is one single thing to listen to, it would be </div><div class=""><br class=""></div><div class=""> ar3_20_audio.wav: loop the section "china hit" starting at t=0.6s and listen for artifacts in the unvoiced speech. The reference is ar3.wav.</div><div class=""><br class=""></div><div class="">and by comparison</div><div class=""> </div><div class=""> ar2_10_audio.wav (same segment, sounds more like the reference ar3.wav)</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><div class="">Here is a cat of the README.txt. 
Thanks very much!</div><div class=""><br class=""></div></div><div class=""><br class=""></div><div class=""><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">16bit, 16kHz input wav files (ar1, ar2, ar3), content from ~50Hz to near 8kHz.</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">All .pcm files are 16kHz, 16bit, signed ints, little (intel) endian.</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">./opus_demo -e voip 16000 1 28000 -framesize 20 ~/ar1.wav ar1_20_voip.bit </div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">./opus_demo -d 16000 ar1_20_voip.bit ar1_20_voip.pcm</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">opus_demo reports version: libopus 1.1-alpha</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;">Using recent pesq code compiled from src, +16000 option.</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">( same phenomenon seen with +16000 +wb option) </div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;"> 5ms 10ms 20ms 40ms</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">ar1_NN_voip 4.314 4.493 4.488 4.488</div><div class="" style="margin: 0px; 
font-size: 11px; font-family: Menlo;">ar2_NN_voip 4.346 4.442 4.436 4.474</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">ar3_NN_voip 3.993 4.375 4.414 4.390</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">ar1_NN_audio 4.292 4.485 -> 4.313 4.313</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">ar2_NN_audio 4.364 4.460 -> 4.350 4.350</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">ar3_NN_audio 3.924 4.327 -> 4.218 4.218</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">Note that this size/type of pesq test is insufficient to draw ANY conclusions.</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">However, it is useful for drawing attention to relative differences, that</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">might be interesting for HUMAN LISTENING.</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">So the question here was, is this pesq drop from 10ms to 20ms framesize, seen in the </div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">case of mode=AUDIO (but not VOIP) something REAL? 
It warranted listening.</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">( same results, interleaved mode=VOIP,AUDIO numbers ) </div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;"> 5ms 10ms 20ms 40ms</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">ar1_NN_voip 4.314 4.493 4.488* 4.488</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">ar1_NN_audio 4.292 4.485 4.313* 4.313</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">ar2_NN_voip 4.346 4.442 4.436* 4.474</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">ar2_NN_audio 4.364 4.460 4.350* 4.350</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">ar3_NN_voip 3.993 4.375 4.414* 4.390</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">ar3_NN_audio 3.924 4.327 4.218* 4.218</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: 
Menlo;">Same data, interleaved to highlight the fact that the drop is seen for the same sentences, </div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">from mode=VOIP to mode=AUDIO, for 20ms framesize. (40ms is the same processing as 20ms, I believe.)</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">So the question that is implied:</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">- is there a phenomenon for mode=AUDIO that results in lower scores for 20ms in particular, but not 10ms?</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">Listening to the processed files (sighted), I have the following subjective opinion:</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">- Given: sampling rate = 16000, bitrate = 28000. (also replicated at 32 kbps)</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">- the 10ms versions (voip,audio) and the 20ms (voip) version sound "focused" and have high fidelity to the ref.</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">- the 20ms mode=AUDIO versions sound "hollow", "smeared", "unfocused", especially during unvoiced segments.</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">- example "china hit" file ar3.pcm, t=0.6s. 
Very clear difference between 10ms and 20ms framesize in mode=audio.</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">This isn't about pesq scores -- pesq was just the "difference noticed" flag that got me to listen to some files.</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">I notice this same kind of de-focused sound in the same samples processed using a recent opus lib on Linux.</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">I'm not surprised at a delta between mode=voip and mode=audio for a constant framesize. That's entirely expected.</div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo;">What I'm curious about is the delta between 10ms and 20ms, for mode=audio. </div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div><div class="" style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br class=""></div></div></div></body></html>