[opus] Opus for ASR

Fri Sep 14 16:15:46 PDT 2012

On Fri, Sep 14, 2012 at 1:09 PM, Young, Milan <Milan.Young at nuance.com> wrote:
> I’m interested to know of any experience with machine “perceived” quality,
> particularly related to speech recognition or biometrics.

The closest things are the PESQ (speech) and PEAQ (general audio)
tests, which compute objective estimates of human-perceived quality.

> I’m also interested in folks' thoughts on optimizing Opus for ASR.  For
> example, removing certain classes of comfort noise, filtering non-speech
> bands, tuned VAD, etc.

Those all sound like great ideas to me.  (I would add VBR strategy to
the list.)  The converse is also true, of course: you might well want
to retrain your ASR for Opus!  Remember that Opus spans two orders of
magnitude in bitrate, mono vs. stereo, and at least two totally
different encoding algorithms (SILK and CELT, plus a hybrid of the
two).  When you don't control the encoder,
you'll have to deal with the whole variety.  When you do, you'll have
to decide which modes are worth using, and which are not.  You might
even want to maintain bitrate- and mode-specific ASR models!
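
As a rough sketch of what coping with "the whole variety" might look
like on the receiving side: the first byte of every Opus packet (the
TOC byte, RFC 6716 section 3.1) encodes the coding mode, audio
bandwidth, and channel count, so an ASR front end could route each
packet to a matching model.  The model names and the pick_model()
helper below are hypothetical, made up purely for illustration; only
the TOC layout comes from the RFC.

```python
from typing import NamedTuple


class TocInfo(NamedTuple):
    mode: str        # "silk", "hybrid", or "celt"
    bandwidth: str   # "NB", "MB", "WB", "SWB", or "FB"
    stereo: bool


def parse_toc(packet: bytes) -> TocInfo:
    """Decode mode, bandwidth, and channel count from an Opus TOC byte."""
    if not packet:
        raise ValueError("empty Opus packet")
    toc = packet[0]
    config = toc >> 3          # top 5 bits: configuration number 0..31
    stereo = bool(toc & 0x04)  # next bit: 0 = mono, 1 = stereo
    if config < 12:            # configs 0..11: SILK-only, 4 per bandwidth
        mode, bandwidth = "silk", ("NB", "MB", "WB")[config // 4]
    elif config < 16:          # configs 12..15: hybrid, 2 per bandwidth
        mode, bandwidth = "hybrid", ("SWB", "FB")[(config - 12) // 2]
    else:                      # configs 16..31: CELT-only, 4 per bandwidth
        mode, bandwidth = "celt", ("NB", "WB", "SWB", "FB")[(config - 16) // 4]
    return TocInfo(mode, bandwidth, stereo)


# Hypothetical registry: one ASR model per Opus coding mode.
ASR_MODELS = {
    "silk": "asr-model-silk",
    "hybrid": "asr-model-hybrid",
    "celt": "asr-model-celt",
}


def pick_model(packet: bytes) -> str:
    """Return the (hypothetical) ASR model name for this packet's mode."""
    return ASR_MODELS[parse_toc(packet).mode]
```

Keying on mode alone is the simplest choice; a real system might also
key on bandwidth or on a measured bitrate bucket, since the TOC byte
does not carry the bitrate itself.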

> One could imagine eventually rolling these updates
> back into the standard under an “ASR” mode.

This seems very unlikely to me.  Opus is a decoder-specified standard,
so the encoder can be modified arbitrarily without requiring
re-standardization.  It's hard to imagine anything worth doing that
would cause you to go outside the current standard.

--Ben