[opus] Opus VAD in 1.3 (and Music/Speech detection)

M. LALMI ALL-RTP lalmi at all-rtp.com
Thu Sep 5 12:55:07 UTC 2019


I am studying different VAD (and speech/music detection) methods, and I find the GRU-based one implemented in Opus 1.3 very interesting.
Is there any documentation on how to compute the input feature vector (25 elements), and a description of how the GRU was trained (RFC, presentation, etc.)? I am not able to follow everything in the source code in analysis.c.
What happens if audio frames contain both speech and music (as with the hold music used by call centers)? Will it detect speech or music?

Thanks in advance for your help,

Best regards,

