[opus] Opus for ASR - update and questions

Fri Nov 30 15:57:23 PST 2012

Young, Milan wrote:
> have contained redundancy).  If there is a better term (eg “segments”),
> please let me know.  If not I’ll continue using “packet”.

The official term used in RFC 6716 is "Opus frame", and there can be 
several such frames in a packet. This is distinct from a "SILK frame"... 
there can be several of these in an Opus frame. I.e., 40 and 60 ms Opus 
frames are composed of multiple 20 ms SILK frames. Any redundant frames 
exist only at the SILK layer, so while they are independent SILK frames, 
they are packaged together in a single Opus frame.
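
To make the packet / Opus frame / SILK frame nesting concrete, here is a
rough Python sketch that reads the TOC byte described in Section 3.1 of
RFC 6716 and reports how many Opus frames a packet carries and how many
20 ms SILK frames sit inside each one. The function name and the
dictionary keys are made up for illustration (they are not part of
libopus), and the sketch does not look at the frame payloads at all.

```python
# Illustrative only: describe_packet() and its keys are hypothetical names.
# The bit layout follows the TOC byte in RFC 6716, Section 3.1.

def describe_packet(packet: bytes) -> dict:
    if not packet:
        raise ValueError("empty packet")
    toc = packet[0]
    config = toc >> 3      # top 5 bits: mode, bandwidth, and frame size
    code = toc & 0x03      # bottom 2 bits: frame-count code

    if config < 12:        # configs 0..11: SILK-only (NB, MB, WB)
        mode = "SILK-only"
        frame_ms = (10, 20, 40, 60)[config % 4]
    elif config < 16:      # configs 12..15: Hybrid (SWB, FB)
        mode = "Hybrid"
        frame_ms = (10, 20)[config % 2]
    else:                  # configs 16..31: CELT-only
        mode = "CELT-only"
        frame_ms = (2.5, 5, 10, 20)[config % 4]

    if code == 0:
        opus_frames = 1    # one frame in the packet
    elif code in (1, 2):
        opus_frames = 2    # two frames (equal or different sizes)
    else:
        # Code 3: the frame count is the low 6 bits of the next byte.
        if len(packet) < 2:
            raise ValueError("code 3 packet is missing its frame-count byte")
        opus_frames = packet[1] & 0x3F
        if opus_frames == 0:
            raise ValueError("code 3 packet with zero frames is invalid")

    # 40 and 60 ms Opus frames are built from 20 ms SILK frames.
    if mode == "CELT-only":
        silk_frames = 0
    elif frame_ms > 20:
        silk_frames = frame_ms // 20
    else:
        silk_frames = 1

    return {
        "mode": mode,
        "frame_ms": frame_ms,
        "opus_frames": opus_frames,
        "silk_frames_per_opus_frame": silk_frames,
    }
```

For example, describe_packet(bytes([0x18])) describes a packet holding
one 60 ms SILK-only Opus frame, which in turn contains three SILK frames.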

> ·Is the decision to copy information from N-1 into N binary?  In other
> words, is it ever the case that a partial copy occurs, or is it all or
> nothing decision?

LBRR (low bit-rate redundancy) is only used on active SILK frames or the
SILK portion of Hybrid frames, and the copy is generally encoded at a
lower bitrate than the original frame. So no, it's not really "binary".

> ·Does packet N ever contain information from more than one previous
> packets (eg N-2), or is it always a single?

A 40 or 60 ms Opus frame may contain 40 or 60 ms of redundancy, which 
may cover several SILK frames. This could potentially cover multiple 
prior packets if they were, e.g., only 20 ms long (but this would be 
unusual, and the current encoder does not include redundancy right after 
a frame size change, because it greatly simplifies the implementation). 
There is no way to, e.g., always send redundancy for the previous two 
packets instead of just one previous packet.

> ·What is the limit of the redundancy provided by this feature?  Is it
> simply that every packet contains the previous packet’s data, or does it
> go further than this by sending duplicate copies of each individual
> packet (which themselves contain the previous packet data).

It is more like, "Some packets contain some of the previous packet's 
data." There are general mechanisms in RTP for sending duplicates of 
whole packets (e.g., RFC 2198). These can be used for any payload type, 
if the RTP stack supports it. The advantage of the redundancy provided 
by Opus natively is that it does _not_ require duplication of entire 
packets, so it imposes bitrate overheads that are less than 100%.

> ·Continuing on the above, how does one trigger that maximum mode.
> Perhaps one should set expect-loss to 100%, but that has some odd
> implications. If all packets will be dropped, the situation seems
> hopeless :-).

Think of 100% as "100% minus epsilon".

> ·When setting these parameters, any advice for Ethernet frame sizes?
> I’m not much of a networking guy, but it seems that the risk of failure
> would climb dramatically if a packet spanned multiple frames.

Individual Opus packets will almost always fit inside an MTU. Even 
assuming a minimum MTU equal to the minimum IPv4 datagram size (576 
bytes), with 20 ms frames you can fit over 214 kbps of audio on top of 
the RTP headers. We've actually discussed mandating that packets fit in 
the MTU in the payload draft specification (in fact I thought the text 
was already there, but I couldn't find it when I just went to look for it).
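
As a quick check on the 214 kbps figure, here is a small Python sketch
of the arithmetic. It assumes 40 bytes of headers per packet (20 for
IPv4 without options, 8 for UDP, 12 for RTP without extensions or CSRC
entries); those sizes and the function name are my assumptions for the
example, not something the payload draft specifies.

```python
def max_audio_bitrate_bps(mtu_bytes=576, frame_ms=20, header_bytes=20 + 8 + 12):
    """Highest audio bitrate that still fits one packet per frame in the MTU.

    header_bytes defaults to IPv4 (20) + UDP (8) + RTP (12), which is an
    assumption made for this sketch.
    """
    payload_bytes = mtu_bytes - header_bytes
    if payload_bytes <= 0:
        raise ValueError("headers do not fit in the MTU")
    # bits per packet, times packets per second (1000 / frame_ms)
    return payload_bytes * 8 * 1000 / frame_ms


print(max_audio_bitrate_bps())               # 576-byte datagram, 20 ms frames
print(max_audio_bitrate_bps(mtu_bytes=1500))  # typical Ethernet MTU
```

With the defaults this leaves 536 bytes per 20 ms packet, i.e. 214.4
kbps, so any normal speech bitrate leaves a lot of headroom.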