[opus] Opus VAD in 1.3 (and Music/Speech detection)

Thu Sep 5 12:55:07 UTC 2019

Hello,

I am studying different VAD (and speech/music detection) methods, and I find the GRU-based one implemented in Opus 1.3 very interesting.
Is there any documentation on how to calculate the input feature vector (25 elements), and a description of how the GRU was trained (RFC, presentation, etc.)? I am not able to understand all of the source code in analysis.c.
What happens if audio frames contain both speech and music (as with on-hold music in call centers)? Will it detect speech or music?

Thanks in advance for your help,

Best regards,

Mohamed