[opus] Gapless concatenation of Opus frames

Andreas Stöckel astoecke at uwaterloo.ca
Thu Nov 23 05:38:36 UTC 2017


As promised, I've implemented the cross-fading approach on top of
LPC-generated lead-in/lead-out frames. Cross-fading information is
stored in the OpusTags of each segment and processed by the JavaScript
client. This seems to work quite well even with a very short overlap (1 ms).
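
For readers who want to try the idea, here is a minimal sketch of the mixing step in C. It assumes a plain linear fade over mono float samples; the fade shape actually used by the client is not stated here, and the names crossfade_linear and ms_to_samples are made up for this illustration.

```c
#include <stddef.h>

/* Convert a duration in milliseconds to a sample count (rounded to nearest). */
size_t ms_to_samples(double ms, unsigned rate_hz)
{
    return (size_t)(ms * rate_hz / 1000.0 + 0.5);
}

/* Linearly cross-fade the tail of one decoded segment into the head of the
 * next. a_tail and b_head each hold `overlap` mono samples that cover the
 * same span of time; the mixed result is written to out.
 * Assumption: linear weights, which is only one possible fade shape. */
void crossfade_linear(const float *a_tail, const float *b_head,
                      float *out, size_t overlap)
{
    for (size_t i = 0; i < overlap; i++) {
        /* w ramps from near 0 to near 1; using sample centres keeps the
         * two ramps symmetric. */
        float w = ((float)i + 0.5f) / (float)overlap;
        out[i] = (1.0f - w) * a_tail[i] + w * b_head[i];
    }
}
```

At 48 kHz, ms_to_samples(250, 48000) gives the 12000-sample overlap of the demo, and ms_to_samples(1, 48000) gives 48 samples for the 1 ms case.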

An online demo which uses 1.5s segments and 250ms overlap can be found at


The source code is available at


Again, thank you for the fruitful discussion,

On 2017-11-16 06:15 AM, Jean-Marc Valin wrote:
> Actually, cross-fading will work even better than what libopusenc does.
> The reason I did not do it is because the Ogg Opus spec provides a
> preskip, but no crossfade option. This means you will not be able to get
> standard players to play your files (which may be OK).
> BTW, there may be a way to implement what libopusenc does in parallel.
> All you'd need to do is start each parallel chunk with the keyframe and
> then at the end, also include a copy of the keyframe at the end of the
> previous chunk. You'd need to be careful about adjusting the delays and
> all, but it should be doable. IOW, it should be possible to reset the
> encoder before the keyframe, provided you pre-fill it correctly.
> Cheers,
> 	Jean-Marc
> On 11/16/2017 01:00 AM, Andreas Stöckel wrote:
>> Hi all,
>> I finally understand how libopusenc is capable of producing chainable
>> Opus files (in contrast to my program), and I managed to successfully
>> implement the method [1].
>> Essentially, the last frame of a file is marked as a "keyframe" by
>> disabling prediction for this frame in libopus. This encoded keyframe
>> is then copied verbatim to the next file, with the pre-skip
>> set to the frame length. The encoder is not reset, and reading the
>> keyframe will bring the decoder to the correct state in which it is
>> able to correctly decode the following frames. See [2] for a plot of
>> the resulting decoded frames/segments.
>> Unfortunately, this method doesn't really work for the application I
>> have in mind. What I want is to create the individual audio segments
>> independently. That is, I'd like to encode the segments in an arbitrary
>> order (or in parallel) and still be able to concatenate them later on.
>> In other words: the previous segment may not have been encoded yet when
>> I'm encoding a given segment.
>> I guess that means that I'd like the encoder to be in a deterministic
>> state at the segment boundaries, and that's probably not possible
>> without actually encoding the entire signal up to the segment boundary
>> (which is not an option).
>> I thought about Ulrich's suggestion regarding injection of
>> deterministic header frames (and possibly forcing the encoder to a
>> certain state by copying the OpusEncoder structure), but I couldn't
>> come up with a scheme where that would eliminate gaps.
>> My last resort is to experiment with 13.5ms overlap between the
>> segments and crossfading the first/last frame instead of just
>> concatenating. Will report whether this works. My guess is that this
>> won't cause clicks, but instead cause possibly audible distortions.
>> Sorry for my general confusion, I've never used an audio codec at such
>> a low level before.
>> Cheers,
>> Andreas
>> [1]
>> https://github.com/astoeckel/opus_gapless/tree/2e664a81f1ce852183995971a3d26b61b676aa09
>> [2] https://somweyr.de/opus/opus_chaining_working.pdf
>> On 2017-11-13 04:24 PM, Jean-Marc Valin wrote:
>>> Hi Andreas,
>>> Considering you're switching to Ogg, I think you should give libopusenc
>>> a try. It does a really good job at getting rid of *all* discontinuities
>>> -- to the point where you can chop a song into files less than one
>>> millisecond each and it still sounds good. It's also pretty simple to
>>> use. You just feed it audio and tell it where the file boundaries are.
>>> Cheers,
>>> 	Jean-Marc
>>> On 11/13/2017 04:16 PM, Andreas Stöckel wrote:
>>>> Hi Jean-Marc,
>>>> thank you for your answer!
>>>> Yes, you understood my question correctly. I was just about to compose
>>>> a reply to my original question, where I described how I solved my
>>>> problem. As you've already suggested, I've switched to Ogg/Opus, which
>>>> is better supported, but does not work with the Media Source Extensions.
>>>> I'll have a look whether disabling prediction will help with the
>>>> transitioning phase, but I think the way I'm implementing it right now
>>>> it probably won't.
>>>> So here is what I was going to write originally:
>>>> When I wrote the question, I wasn't really aware of the pre-skip
>>>> (CodecDelay in WebM) and DiscardPadding [1]. However, these properties
>>>> can only be set on a per-stream basis, and not on independent
>>>> sequences of WebM packets. Because I was unaware of the
>>>> pre-skip, I also didn't append an additional frame to the audio, so
>>>> the 6.5 ms lost to the pre-skip couldn't be recovered when
>>>> decoding. As an additional complication with WebM, there is also no
>>>> way to indicate in a WebM stream that the decoder should reset. So if
>>>> anything, we can only concatenate entire files/streams, and not on a
>>>> per-packet basis.
>>>> However, playing back individual WebM streams with CodecDelay and
>>>> DiscardPadding set (and an additional lead-out frame) did not work,
>>>> since CodecDelay/DiscardPadding were only partially honored by
>>>> Chromium/Firefox and even ffmpeg. There is a method for gapless
>>>> concatenation of entire files using MSE, described here [2], but this
>>>> didn't work in Firefox and still produced audible artifacts in Chrome.
>>>> Well, the way I'm solving the problem now is the following:
>>>> First, I've switched to Ogg/Opus. Second, I'm appending a reversed
>>>> version of the first/last 20ms to the beginning/end of the audio chunk
>>>> I'm encoding. This reduces ringing artifacts from the transient at the
>>>> beginning/end of the chunk. I then set pre-skip and the granule of the
>>>> last packet in the generated Ogg stream in such a way that the
>>>> relevant audio information is "cut out". In contrast to WebM, browsers
>>>> (and ffmpeg) actually correctly interpret this meta-information in an
>>>> Ogg container. However, browsers do not support Ogg in conjunction
>>>> with the Media Source Extensions. Thus, I've ditched MSE and am now
>>>> decoding the individual chunks with the WebAudio API and scheduling
>>>> gapless playback of the chunks (which is not optimal, since WebAudio
>>>> is rather finicky).
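
The padding and trimming described in the paragraph above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the linked repository: the helper names pad_reversed and compute_trim are hypothetical, samples are mono floats, and the encoder lookahead is passed in by the caller (with libopus it can be queried via OPUS_GET_LOOKAHEAD). The trim values follow the Ogg Opus rule that the number of playable samples equals the final granule position minus the pre-skip.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy n mono samples from in to out, preceded by the first `pad` samples in
 * reverse order and followed by the last `pad` samples in reverse order.
 * out must have room for n + 2 * pad samples.
 * Returns the number of samples written, or 0 if pad > n. */
size_t pad_reversed(const float *in, size_t n, size_t pad, float *out)
{
    if (pad > n)
        return 0;
    for (size_t i = 0; i < pad; i++)
        out[i] = in[pad - 1 - i];            /* reversed lead-in */
    for (size_t i = 0; i < n; i++)
        out[pad + i] = in[i];                /* the chunk itself */
    for (size_t i = 0; i < pad; i++)
        out[pad + n + i] = in[n - 1 - i];    /* reversed lead-out */
    return n + 2 * pad;
}

/* Hypothetical container for the two values written to the Ogg stream. */
typedef struct {
    uint16_t pre_skip;       /* OpusHead pre-skip, in 48 kHz samples */
    uint64_t final_granule;  /* granule position at the end of the stream */
} trim_t;

/* Skip the encoder lookahead plus the lead-in, then play exactly n samples.
 * Assumes all counts are in 48 kHz samples and lookahead + pad fits in the
 * 16-bit pre-skip field. */
trim_t compute_trim(size_t n, size_t pad, unsigned lookahead)
{
    trim_t t;
    t.pre_skip = (uint16_t)(lookahead + pad);
    t.final_granule = (uint64_t)t.pre_skip + n;
    return t;
}
```

For a 1 s chunk at 48 kHz with 20 ms of padding (960 samples) and a lookahead of 312 samples, this yields a pre-skip of 1272 and a final granule position of 49272, so the lead-out is dropped by the decoder as well.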
>>>> The working implementation can be found here [3]. Since Ogg is so much
>>>> simpler than WebM I also wrote my own minimal C++ Ogg/Opus muxer,
>>>> which shaves off another dependency of my application.
>>>> Thank you for your help,
>>>> Andreas
>>>> [1] https://wiki.xiph.org/MatroskaOpus
>>>> [2]
>>>> https://developers.google.com/web/fundamentals/media/mse/seamless-playback
>>>> [3] https://github.com/astoeckel/opus_gapless
>>>> On 2017-11-13 03:42 PM, Jean-Marc Valin wrote:
>>>>> Hi Andreas,
>>>>> So if I understand your question correctly, what you want is really
>>>>> short "files" that are independent, yet create a glitchless stream
>>>>> when concatenated, right? For Ogg, this can be implemented with
>>>>> libopusenc and chaining. It works pretty well (even for really tiny
>>>>> files). For WebM, I'm not sure how to handle the details at the
>>>>> container level, but for how to handle the transition details (reset and
>>>>> all), I suggest you have a look at the libopusenc code. In general, the
>>>>> idea is to disable the prediction at the point of the transition between
>>>>> two files and to include the transition frames in both files.
>>>>> Cheers,
>>>>> 	Jean-Marc
>>>>> On 11/08/2017 03:43 AM, Andreas Stöckel wrote:
>>>>>> Hi!
>>>>>> Short version of my question: How to produce Opus frames which can be
>>>>>> safely concatenated and how to embed them into a WebM file?
>>>>>> Long version:
>>>>>> I'm currently implementing a web-based audio player which streams
>>>>>> audio as opus/WebM using the HTML5 media source extensions. Currently,
>>>>>> the server decodes a set of input files to a fixed RAW audio format
>>>>>> (stereo, 48000 Hz) and encodes the resulting continuous RAW stream as
>>>>>> Opus/WebM. Having a single, uninterrupted RAW stream allows for
>>>>>> perfect gapless playback on the client (which only sees a single live
>>>>>> WebM stream), i.e. there are no interruptions whatsoever when
>>>>>> transitioning between continuous tracks from the same music album.
>>>>>> An early tech-demo of the technique can be found here [1], the source
>>>>>> file http_audio_server/encoder.cpp implements the relevant
>>>>>> opus-encoding and webm-encapsulation (but see also [2] for a condensed
>>>>>> version).
>>>>>> Now, for performance reasons I'd like to split my RAW audio into
>>>>>> independent blocks (say, as an example, 50 frames or 1s each), encode
>>>>>> these as raw Opus frames and cache them on disc ahead of time. For
>>>>>> each block I'd like to reset the encoder to ensure independence
>>>>>> between the first frame of each block and the last frames in the
>>>>>> previous block, e.g., using
>>>>>> opus_encoder_ctl(enc_ctx, OPUS_RESET_STATE)
>>>>>> When the client requests a certain sequence of blocks (which may
>>>>>> originate from various input files in (let's pretend) any order) my
>>>>>> goal is to (on-demand) encapsulate the pre-encoded frames as WebM and
>>>>>> send them to the client.
>>>>>> However, in early experiments [2], resetting the encoder state at the
>>>>>> beginning of each block and then concatenating the frames in the WebM
>>>>>> container leads to clearly audible gaps in the decoded WebM stream
>>>>>> whenever the opus encoder has been reset.
>>>>>> Interestingly, such artifacts are far less pronounced (if they exist
>>>>>> at all), if I don't explicitly reset the encoder. However, in my real
>>>>>> application the encoder will at least be reset implicitly (e.g. by
>>>>>> starting the encoding process in multiple threads for two files which
>>>>>> may be played consecutively).
>>>>>> See [2] for a MWE which expresses what I've tried to describe above.
>>>>>> So to rephrase my question: if it is possible at all, how can I
>>>>>> independently pre-encode blocks of Opus audio frames, such that I can
>>>>>> concatenate them during WebM muxing without audible glitches?
>>>>>> In advance, thank you for your help. Please let me know if anything I
>>>>>> wrote is unclear, or you need more information to answer my question.
>>>>>> Andreas
>>>>>> [1] https://github.com/astoeckel/http_audio_server/
>>>>>> [2] https://github.com/astoeckel/opus_gapless_webm/
>>>>>> _______________________________________________
>>>>>> opus mailing list
>>>>>> opus at xiph.org
>>>>>> http://lists.xiph.org/mailman/listinfo/opus