[opus] Opus for ASR
Milan.Young at nuance.com
Fri Sep 14 13:09:47 PDT 2012
All of the Opus quality studies that I've seen have focused on human-perceived quality. I'm interested to know of any experience with machine-"perceived" quality, particularly as it relates to speech recognition or biometrics.
I'm also interested in folks' thoughts on optimizing Opus for ASR. For example: removing certain classes of comfort noise, filtering non-speech bands, tuning the VAD, etc. One could imagine eventually rolling these updates back into the standard under an "ASR" mode.
A big part of optimizing for ASR will be infrastructure that reports feedback on candidate improvements and facilitates regression testing. To that end, Nuance is willing to publish a service that lets developers upload codec binaries to our computational grid and reports back a score. If such a service is of interest to you, please let me know of any design constraints you have in mind. In particular, I'd like to know your preferences on the accuracy vs. latency trade-off in the service. Those of you familiar with speech recognition will know that testing involves tens to hundreds of thousands of utterances, hence my concern.
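To make "score" a bit more concrete, here is a minimal sketch of one way it could be computed, assuming the score is a corpus-level word error rate (WER) between reference transcripts and recognizer output on the decoded audio. The choice of WER and the function names `edit_distance` and `corpus_wer` are illustrative assumptions on my part, not a description of what the service would actually report.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def corpus_wer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Total word errors divided by total reference words over all utterances."""
    if len(references) != len(hypotheses):
        raise ValueError("need exactly one hypothesis per reference")
    errors = 0
    ref_words = 0
    for ref_text, hyp_text in zip(references, hypotheses):
        ref = ref_text.split()
        errors += edit_distance(ref, hyp_text.split())
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return errors / ref_words
```

A regression test for a candidate codec change could then compare `corpus_wer` on the same utterance set before and after the change. Because the cost grows with the number of utterances scored, the size of that set is where the accuracy vs. latency trade-off shows up.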