[opus] Gapless concatenation of Opus frames

Wed Nov 8 08:43:57 UTC 2017

Hi!

Short version of my question: How can I produce Opus frames that can
be safely concatenated, and how do I embed them in a WebM file?

Long version:

I'm currently implementing a web-based audio player which streams
audio as Opus/WebM using the HTML5 Media Source Extensions. Currently,
the server decodes a set of input files to a fixed RAW audio format
(stereo, 48 kHz) and encodes the resulting continuous RAW stream as
Opus/WebM. Having a single, uninterrupted RAW stream allows for
perfect gapless playback on the client (which only sees a single live
WebM stream), i.e. there are no interruptions whatsoever when
transitioning between continuous tracks from the same music album.

An early tech-demo of the technique can be found here [1], the source
file http_audio_server/encoder.cpp implements the relevant
opus-encoding and webm-encapsulation (but see also [2] for a condensed
version).

Now, for performance reasons I'd like to split my RAW audio into
independent blocks (say, as an example, 50 frames or 1s each), encode
these as raw Opus frames and cache them on disc ahead of time. For
each block I'd like to reset the encoder to ensure independence
between the first frame of each block and the last frames in the
previous block, e.g., using

opus_encoder_ctl(enc_ctx, OPUS_RESET_STATE)
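
To make the block layout concrete, here is a minimal sketch of the
arithmetic involved. It assumes 20 ms Opus frames (so that 50 frames
span 1 s) and interleaved 16-bit PCM as the RAW format; both are
assumptions for illustration, as are all names in the snippet, none of
which come from the code in [1] or [2]:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed parameters (illustrative only): 48 kHz stereo input and
 * 20 ms Opus frames, so that 50 frames span exactly one second. */
enum {
    SAMPLE_RATE_HZ   = 48000,
    CHANNELS         = 2,
    FRAME_MS         = 20,
    FRAMES_PER_BLOCK = 50
};

/* Samples per channel in one Opus frame. */
static size_t samples_per_frame(void)
{
    /* The frame duration must map to a whole number of samples. */
    assert((SAMPLE_RATE_HZ * FRAME_MS) % 1000 == 0);
    return (size_t)SAMPLE_RATE_HZ * FRAME_MS / 1000;
}

/* Samples per channel in one independently encoded block. */
static size_t samples_per_block(void)
{
    return samples_per_frame() * FRAMES_PER_BLOCK;
}

/* Size in bytes of one block of interleaved 16-bit PCM
 * (the 16-bit sample format is an assumption). */
static size_t block_bytes_s16(void)
{
    return samples_per_block() * CHANNELS * sizeof(int16_t);
}

/* Offset (in samples per channel) of the first sample of a block
 * within the continuous RAW stream. */
static uint64_t block_start_sample(uint64_t block_index)
{
    return block_index * (uint64_t)samples_per_block();
}
```

With these numbers a frame holds 960 samples per channel and a block
holds 48000, so every block boundary falls on a whole-second offset
in the RAW stream.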

When the client requests a certain sequence of blocks (which may
originate from various input files and, let's pretend, arrive in any
order), my goal is to encapsulate the pre-encoded frames as WebM on
demand and send them to the client.

However, in early experiments [2], resetting the encoder state at the
beginning of each block and then concatenating the frames in the WebM
container leads to clearly audible gaps in the decoded WebM stream
whenever the opus encoder has been reset.

Interestingly, such artifacts are far less pronounced (if they exist
at all) if I don't explicitly reset the encoder. However, in my real
application the encoder will at least be reset implicitly (e.g. by
starting the encoding process in multiple threads for two files which
may be played consecutively).

See [2] for an MWE which expresses what I've tried to describe above.

So to rephrase my question: if it is possible at all, how can I
independently pre-encode blocks of Opus audio frames, such that I can
concatenate them during WebM muxing without audible glitches?

In advance, thank you for your help. Please let me know if anything I
wrote is unclear, or if you need more information to answer my question.

Andreas

[1] https://github.com/astoeckel/http_audio_server/
[2] https://github.com/astoeckel/opus_gapless_webm/