[opus] Potential transient pre-echo reduction filter

Sat Feb 16 11:56:42 UTC 2019

Hey everyone.

I've been designing my own audio codec with extremely strict
decode-performance constraints (including a fixed block size), which led
me to attempt a number of unorthodox things to squeeze out as much
quality as possible.

One surprising thing I discovered just earlier today was an extremely
cheap method of reducing pre-echo during transients without using short
blocks (while still using a 50%-overlap MDCT). Since I figured this
might be pretty important even in codecs that /do/ use short blocks (due
to the better frequency resolution of long blocks), I decided to send a
message here.

The basic idea is this: transients generally add a sinusoidal shape to
the frequency-domain coefficients, which is what makes them so hard to
code at low bitrates, and why some codecs even implement frequency-domain
linear prediction. Since that would cost my codec a fair bit of
performance, I instead took a lesson from wavelets and applied a simple
sum/difference to every pair of coefficients (sort of emulating a Haar
wavelet), i.e. (excluding normalization factors):

/* N = number of coefficients in the block */
for (i = 0; i < N/2; i++) {
    float a = X[i*2], b = X[i*2+1];
    X[i*2]   = a + b;
    X[i*2+1] = a - b;
}
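
A minimal C sketch of the pair transform above. It assumes a float
coefficient array and adds a 1/sqrt(2) scale, which is one possible
choice for the normalization the pseudocode leaves out. The names
pair_sumdiff and INV_SQRT2 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed normalization: 1/sqrt(2) makes each 2x2 butterfly
 * orthonormal, so the step preserves energy and is its own inverse. */
#define INV_SQRT2 0.70710678f

/* Replace every coefficient pair (X[2i], X[2i+1]) with its scaled
 * sum and difference: one Haar analysis step across frequency.
 * n is the number of coefficients and must be even. */
static void pair_sumdiff(float *X, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++) {
        float a = X[2 * i];
        float b = X[2 * i + 1];
        X[2 * i]     = (a + b) * INV_SQRT2; /* sum ("low") output */
        X[2 * i + 1] = (a - b) * INV_SQRT2; /* difference ("high") output */
    }
}
```

With this scaling the butterfly matrix (1/sqrt(2))[[1, 1], [1, -1]] is
symmetric and orthogonal, so a decoder can undo the step by calling
pair_sumdiff again on the dequantized coefficients. An unscaled a +/- b
would instead need a factor of 1/2 on the way back.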

I tested with a very long block size to really emphasize the transient
error diffusion, and the pre-echo seems to be cut in half with no other
effort at all. Adding another layer of this transform (i.e.
X[i*4] +/- X[i*4+2]) doesn't seem to improve things much, though.
Perhaps a more frequency-selective wavelet (e.g. LeGall 5/3 or CDF 9/7)
would improve things a bit more, but in all honesty I don't really have
the motivation to test any further than this.
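
The extra layer mentioned above can be written as the same butterfly
applied with a stride: stride 1 pairs X[i*2] with X[i*2+1], and stride 2
pairs X[i*4] with X[i*4+2], so the second layer only touches the sum
outputs of the first. A minimal C sketch, assuming the same 1/sqrt(2)
scaling; haar_layer, haar_forward and haar_inverse are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>

#define INV_SQRT2 0.70710678f /* assumed orthonormal scaling */

/* One Haar butterfly layer: pairs X[j] with X[j + stride] for
 * j = 0, 2*stride, 4*stride, ...  stride = 1 is X[2i] +/- X[2i+1];
 * stride = 2 is X[4i] +/- X[4i+2], which combines only the sum
 * outputs left by the stride-1 layer. */
static void haar_layer(float *X, size_t n, size_t stride)
{
    assert(stride > 0 && n % (2 * stride) == 0);
    for (size_t j = 0; j < n; j += 2 * stride) {
        float a = X[j];
        float b = X[j + stride];
        X[j]          = (a + b) * INV_SQRT2;
        X[j + stride] = (a - b) * INV_SQRT2;
    }
}

/* Encoder side: apply `levels` layers, finest (stride 1) first. */
static void haar_forward(float *X, size_t n, int levels)
{
    size_t stride = 1;
    for (int l = 0; l < levels; l++, stride *= 2)
        haar_layer(X, n, stride);
}

/* Decoder side: each layer is self-inverse, so undo them in
 * reverse order (coarsest stride first). */
static void haar_inverse(float *X, size_t n, int levels)
{
    for (int l = levels - 1; l >= 0; l--)
        haar_layer(X, n, (size_t)1 << l);
}
```

haar_forward(X, n, 1) reproduces the single-layer transform; each
additional level requires n to be a multiple of twice the largest
stride.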

I've included some examples of this modification in action (please
excuse my particular taste in music; I just find it very useful for
this kind of testing). They're coded at 32 kbps stereo @ 44.1 kHz,
though quality is heavily degraded because I used a larger block size
than the one I've optimized the codec for. Quality degrades further with
the 2x filter, because the even worse separation between the non-zero
and zero bands makes the run-length coding perform very badly.

I'm not sure how useful this will be given the CELT layer's band
folding, but I hope it's of use to the community at large anyway.

-- Ruben

Attachments:
  32kbps_2xFilter.flac (audio/flac, 1190399 bytes)
  <http://lists.xiph.org/pipermail/opus/attachments/20190216/418f0775/attachment-0003.flac>
  32kbps_NoFilter.flac (audio/flac, 1181118 bytes)
  <http://lists.xiph.org/pipermail/opus/attachments/20190216/418f0775/attachment-0004.flac>
  32kbps_1xFilter.flac (audio/flac, 1177537 bytes)
  <http://lists.xiph.org/pipermail/opus/attachments/20190216/418f0775/attachment-0005.flac>