[opus] Gapless concatenation of Opus frames

Wed Nov 15 08:00:21 UTC 2017

Hi Jean-Marc (and everyone else who replied),

> Considering you're switching to Ogg, I think you should give libopusenc
> a try. It does a really good job at getting rid of *all* discontinuities
> -- to the point where you can chop a song into files less than one
> millisecond each and it still sounds good. It's also pretty simple to
> use. You just feed it audio and tell it where the file boundaries are.

thank you for pointing me at libopusenc. I had a look at the source
code and liked the idea of using Linear Predictive Coding for the
generation of the lead-in/lead-out frame. This avoids some
high-frequency content that my mirroring technique produced. I
C++ified the corresponding 1994 LPC code and implanted it into my
program [1]. Works like a charm.

Since my program seemed to work fairly well I was doing some extended
tests and found one particular case where it still produces audible
artifacts.

Unfortunately, libopusenc with ope_encoder_continue_new_file (see [2]
for my code) produces similar (though not the same) audible artifacts.
The affected audio file has very low frequency content (produced by a
Taiko).

In my program the low frequency content seems to be phase shifted,
producing a discontinuity at the transition between Ogg files [3].

Libopusenc seems to introduce ringing artifacts [4], resulting in a
similar, though less pronounced, clicking noise. (Maybe the ringing
stems from no "lead-in" frame being used -- in my program I do a
reverse LPC at the beginning of the first audio chunk to create an
artificial frame that leads up to the first frame [7]).
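
The "reverse" part can be sketched independently of the predictor: take
the head of the chunk, reverse it in time, extrapolate forward, and
reverse the result so that it leads up to the first sample. This is only
an illustration and not the code at [7]; make_lead_in and
linear_extrapolate are hypothetical names, and the straight-line
extrapolator is a stand-in for a real LPC predictor.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Any function that continues a signal by `count` samples.
using Extrapolator = std::function<std::vector<double>(
    const std::vector<double>&, std::size_t)>;

// Placeholder extrapolator: continues the line through the last two
// samples. A real implementation would plug in an LPC predictor here.
std::vector<double> linear_extrapolate(const std::vector<double>& x,
                                       std::size_t count) {
    if (x.empty()) return std::vector<double>(count, 0.0);
    double last = x.back();
    double prev = x.size() >= 2 ? x[x.size() - 2] : last;
    std::vector<double> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const double next = 2.0 * last - prev;
        out.push_back(next);
        prev = last;
        last = next;
    }
    return out;
}

// Build `count` lead-in samples that precede `chunk`, using the first
// `context` samples of the chunk as input to the extrapolator.
std::vector<double> make_lead_in(const std::vector<double>& chunk,
                                 std::size_t context, std::size_t count,
                                 const Extrapolator& extrapolate) {
    const std::size_t n = std::min(context, chunk.size());
    // Time-reverse the head so that "forward" means "into the past".
    std::vector<double> head(chunk.begin(),
                             chunk.begin() + static_cast<std::ptrdiff_t>(n));
    std::reverse(head.begin(), head.end());
    std::vector<double> lead = extrapolate(head, count);
    // Flip back so the lead-in ends right before chunk[0].
    std::reverse(lead.begin(), lead.end());
    return lead;
}
```

With the chunk 10, 11, 12, 13 and a context of three samples, a lead-in
of three samples comes out as 7, 8, 9, i.e. it runs straight into the
first sample of the chunk.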

You can reproduce the libopusenc problem by compiling my adapted
opusenc_example.c [2] and feeding in a segment of the affected RAW
audio as indicated at the beginning of my source code. The RAW can be
downloaded here [5] (48000 Hz, stereo, 16-bit signed, little endian;
the complete song can be downloaded here [6]).
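
In case the RAW layout is unclear, here is a small sketch of how the
bytes map to samples (interleaved left/right, two bytes per sample,
least significant byte first). StereoPcm and decode_s16le_stereo are
names I made up for this example; reading the file into the byte vector
is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct StereoPcm {
    std::vector<float> left;
    std::vector<float> right;
};

// Decode interleaved 16-bit signed little-endian stereo PCM into two
// float channels in the range [-1, 1). Trailing bytes that do not form
// a complete stereo frame are ignored.
StereoPcm decode_s16le_stereo(const std::vector<std::uint8_t>& bytes) {
    const std::size_t frames = bytes.size() / 4;  // 2 channels * 2 bytes
    StereoPcm pcm;
    pcm.left.reserve(frames);
    pcm.right.reserve(frames);

    auto sample_at = [&bytes](std::size_t offset) {
        // Assemble the 16-bit value, then map 32768..65535 to negatives.
        int v = bytes[offset] | (bytes[offset + 1] << 8);
        if (v >= 32768) v -= 65536;
        return static_cast<float>(v) / 32768.0f;
    };

    for (std::size_t i = 0; i < frames; ++i) {
        pcm.left.push_back(sample_at(4 * i));
        pcm.right.push_back(sample_at(4 * i + 2));
    }
    return pcm;
}
```

At 48000 Hz, one second of audio is 48000 such frames, or 192000 bytes.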

Any idea how either of the two issues (in libopusenc or in my
program) might be solved?

Again, thank you for your help!

Cheers,
Andreas

[1] https://github.com/astoeckel/opus_gapless/blob/master/lpc.cpp

[2] https://gist.github.com/astoeckel/6731bc846a2f70dd7f5e155e75683fae

[3] https://somweyr.de/opus/click_opus_gapless.png

[4] https://somweyr.de/opus/click_libopusenc.png

[5]
https://somweyr.de/opus/test_libopusenc_ope_encoder_continue_new_file.raw.bz2

[6] https://www.youtube.com/watch?v=z64HCi2rQkE

[7]
https://github.com/astoeckel/opus_gapless/blob/master/opus_gapless.cpp#L82

> 
> Cheers,
> 
> 	Jean-Marc
> 
> On 11/13/2017 04:16 PM, Andreas Stöckel wrote:
>> Hi Jean-Marc,
>>
>> thank you for your answer!
>>
>> Yes, you understood my question correctly. I was just about to compose
>> a reply to my original question, where I described how I solved my
>> problem. As you've already suggested, I've switched to Ogg/Opus, which
>> is better supported, but does not work with the Media Source Extensions.
>>
>> I'll have a look whether disabling prediction will help with the
>> transitioning phase, but I think the way I'm implementing it right now
>> it probably won't.
>>
>> So here is what I was going to write originally:
>>
>> When I wrote the question, I wasn't really aware of the pre-skip
>> (CodecDelay in WebM) and DiscardPadding [1]. However, these properties
>> can only be set on a per-stream basis, and not on independent
>> sequences of WebM packets. As a consequence of my ignorance regarding
>> pre-skip, I also didn't append an additional frame to the audio, so
>> the 6.5ms lost due to the pre-skip couldn't be recovered when
>> decoding. As an additional complication with WebM, there is also no
>> way to indicate in a WebM stream that the decoder should reset. So if
>> anything, we can only concatenate entire files/streams, and not on a
>> per-packet basis.
>>
>> However, playing back individual WebM streams with CodecDelay and
>> DiscardPadding set (and an additional lead-out frame) did not work,
>> since CodecDelay/DiscardPadding were only insufficiently interpreted
>> by Chromium/Firefox and even ffmpeg. There is a method for gapless
>> concatenation of entire files using MSE, described here [2], but this
>> didn't work for Firefox and still produced audible artifacts on Chrome.
>>
>>
>> Well, the way I'm solving the problem now is the following:
>>
>> First, I've switched to Ogg/Opus. Second, I'm appending a reversed
>> version of the first/last 20ms to the beginning/end of the audio chunk
>> I'm encoding. This reduces ringing artifacts from the transient at the
>> beginning/end of the chunk. I then set pre-skip and the granule of the
>> last packet in the generated Ogg stream in such a way that the
>> relevant audio information is "cut out". In contrast to WebM, browsers
>> (and ffmpeg) actually correctly interpret this meta-information in an
>> Ogg container. However, browsers do not support Ogg in conjunction
>> with the Media Source Extensions. Thus, I've ditched MSE and I am now
>> decoding the individual chunks with the WebAudio API and schedule
>> gapless playback of the chunks (which is not optimal, since WebAudio
>> is rather finicky).
>>
>> The working implementation can be found here [3]. Since Ogg is so much
>> simpler than WebM I also wrote my own minimal C++ Ogg/Opus muxer,
>> which shaves off another dependency of my application.
>>
>>
>> Thank you for your help,
>> Andreas
>>
>>
>>
>> [1] https://wiki.xiph.org/MatroskaOpus
>>
>> [2]
>> https://developers.google.com/web/fundamentals/media/mse/seamless-playback
>>
>> [3] https://github.com/astoeckel/opus_gapless
>>
>> On 2017-11-13 03:42 PM, Jean-Marc Valin wrote:
>>> Hi Andreas,
>>>
>>> So if I understand your question correctly, what you want is really
>>> short "files" that are independent, but yet create a glitchless stream
>>> when concatenated, right? For Ogg, this can be implemented with
>>> libopusenc and chaining. It works pretty well (even for really tiny
>>> files). For WebM, I'm not sure how to handle the details at the
>>> container level, but for how to handle the transition details (reset and
>>> all), I suggest you have a look at the libopusenc code. In general, the
>>> idea is to disable the prediction at the point of the transition between
>>> two files and to include the transition frames in both files.
>>>
>>> Cheers,
>>>
>>> 	Jean-Marc
>>>
>>> On 11/08/2017 03:43 AM, Andreas Stöckel wrote:
>>>> Hi!
>>>>
>>>> Short version of my question: How to produce Opus frames which can be
>>>> safely concatenated and how to embed them into a WebM file?
>>>>
>>>> Long version:
>>>>
>>>> I'm currently implementing a web-based audio player which streams
>>>> audio as opus/WebM using the HTML5 media source extensions. Currently,
>>>> the server decodes a set of input files to a fixed RAW audio format
>>>> (stereo, 48000 Hz) and encodes the resulting continuous RAW stream as
>>>> Opus/WebM. Having a single, uninterrupted RAW stream allows for
>>>> perfect gapless playback on the client (which only sees a single live
>>>> WebM stream), i.e. there are no interruptions whatsoever when
>>>> transitioning between continuous tracks from the same music album.
>>>>
>>>> An early tech-demo of the technique can be found here [1], the source
>>>> file http_audio_server/encoder.cpp implements the relevant
>>>> opus-encoding and webm-encapsulation (but see also [2] for a condensed
>>>> version).
>>>>
>>>>
>>>> Now, for performance reasons I'd like to split my RAW audio into
>>>> independent blocks (say, as an example, 50 frames or 1s each), encode
>>>> these as raw Opus frames and cache them on disc ahead of time. For
>>>> each block I'd like to reset the encoder to ensure independence
>>>> between the first frame of each block and the last frames in the
>>>> previous block, e.g., using
>>>>
>>>> opus_encoder_ctl(enc_ctx, OPUS_RESET_STATE)
>>>>
>>>> When the client requests a certain sequence of blocks (which may
>>>> originate from various input files in (let's pretend) any order) my
>>>> goal is to (on-demand) encapsulate the pre-encoded frames as WebM and
>>>> send them to the client.
>>>>
>>>> However, in early experiments [2], resetting the encoder state at the
>>>> beginning of each block and then concatenating the frames in the WebM
>>>> container leads to clearly audible gaps in the decoded WebM stream
>>>> whenever the opus encoder has been reset.
>>>>
>>>> Interestingly, such artifacts are far less pronounced (if they exist
>>>> at all), if I don't explicitly reset the encoder. However, in my real
>>>> application the encoder will at least be reset implicitly (e.g. by
>>>> starting the encoding process in multiple threads for two files which
>>>> may be played consecutively).
>>>>
>>>> See [2] for a MWE which expresses what I've tried to describe above.
>>>>
>>>> So to rephrase my question: if it is possible at all, how can I
>>>> independently pre-encode blocks of Opus audio frames, such that I can
>>>> concatenate them during WebM muxing without audible glitches?
>>>>
>>>>
>>>> In advance, thank you for your help. Please let me know if anything I
>>>> wrote is unclear, or you need more information to answer my question.
>>>>
>>>>
>>>> Andreas
>>>>
>>>>
>>>> [1] https://github.com/astoeckel/http_audio_server/
>>>> [2] https://github.com/astoeckel/opus_gapless_webm/