[opus] Potential transient pre-echo reduction filter
Aikku
aik at aol.com.au
Sat Feb 16 11:56:42 UTC 2019
Hey everyone.
I've been designing my own audio codec with extremely strict
decode-performance constraints (including a fixed block size), which led
me to try a number of unorthodox things to squeeze out as much
quality as possible.
One surprising thing I discovered earlier today was an extremely
cheap method of reducing pre-echo during transients, without using short
blocks (and still using a 50% overlap MDCT). Since I figured this might
be pretty important even for codecs /with/ short blocks (long blocks
have better frequency resolution), I decided to send a message here.
The basic idea is: Transients generally add a sinusoidal shape to the
frequency-domain coefficients, which is what makes them so hard to code
at low bitrates, and why some codecs even implement frequency-domain
linear prediction. But since that would impact performance a fair bit
for my codec, I instead decided to take a lesson from wavelets and use
a simple sum/difference on every pair of coefficients (sort of emulating
a Haar wavelet), i.e. (excluding normalization factors):
a = X[i*2], b = X[i*2+1]
X[i*2] = a + b
X[i*2+1] = a - b
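For anyone who wants to try the pairwise step outside a codec, here is a minimal Python sketch of it. The function names are hypothetical, and the 1/sqrt(2) scaling is an assumption (the normalization is left out above); with that scaling the step is orthonormal and happens to be its own inverse, so the decoder can run the same loop.

```python
import math

def pair_butterfly(coeffs):
    """Sum/difference on each pair (X[2i], X[2i+1]) of MDCT coefficients.

    The 1/sqrt(2) factor is an assumed normalization; it keeps the
    total energy of the coefficients unchanged.
    """
    if len(coeffs) % 2:
        raise ValueError("expected an even number of coefficients")
    scale = 1.0 / math.sqrt(2.0)
    out = list(coeffs)
    for i in range(0, len(out), 2):
        a, b = out[i], out[i + 1]
        out[i] = (a + b) * scale      # "sum" (low-pass) output
        out[i + 1] = (a - b) * scale  # "difference" (high-pass) output
    return out

def pair_butterfly_inverse(coeffs):
    """With the assumed scaling, applying the step twice gives the input back."""
    return pair_butterfly(coeffs)
```

The encoder would apply `pair_butterfly` to the MDCT output before quantization, and the decoder would apply `pair_butterfly_inverse` to the dequantized coefficients before the inverse MDCT.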
I tested with a very long block size to really emphasize the transient
error diffusion, and the pre-echo seems to be cut down by half with no
other effort at all. Adding another layer of this transform (i.e.
X[i*4] +/- X[i*4+2]) doesn't seem to improve things much, though. Perhaps
a more frequency-selective wavelet (e.g. LeGall 5/3 or CDF 9/7) would
improve things a bit more, but in all honesty I don't really have the
motivation to test any further than this.
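One possible reading of the two-layer variant, sketched in Python: the first layer is the pairwise sum/difference, and the second layer repeats it on the "sum" outputs at X[i*4] and X[i*4+2]. The helper names and the 1/sqrt(2) scaling are assumptions for illustration, not necessarily what was used for the attached files.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)  # assumed normalization

def _butterfly(x, i, j):
    """In-place normalized sum/difference of x[i] and x[j]."""
    a, b = x[i], x[j]
    x[i] = (a + b) * SCALE
    x[j] = (a - b) * SCALE

def two_layer_forward(coeffs):
    if len(coeffs) % 4:
        raise ValueError("expected a multiple of 4 coefficients")
    x = list(coeffs)
    for i in range(0, len(x), 2):   # layer 1: X[i*2] +/- X[i*2+1]
        _butterfly(x, i, i + 1)
    for i in range(0, len(x), 4):   # layer 2: X[i*4] +/- X[i*4+2]
        _butterfly(x, i, i + 2)
    return x

def two_layer_inverse(coeffs):
    x = list(coeffs)
    for i in range(0, len(x), 4):   # undo layer 2 first
        _butterfly(x, i, i + 2)
    for i in range(0, len(x), 2):   # then undo layer 1
        _butterfly(x, i, i + 1)
    return x
```

Each butterfly is its own inverse under this scaling, so the inverse only has to run the layers in the opposite order.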
I've attached some examples of this modification in action (please
excuse the particular taste in music; I just find it very useful for
this kind of testing). They're coded at 32 kbps stereo @ 44.1 kHz, though
quality is heavily degraded by using a larger block size than the one
I've optimized for (and degrades further with the 2x filter, because
even worse separation between the non-zero and zero bands makes the
run-length coding perform very badly).
I'm not sure how useful this will be, given the CELT layer's band
folding, but I hope it's useful for the community at large anyway.
-- Ruben
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 32kbps_2xFilter.flac
Type: audio/flac
Size: 1190399 bytes
Desc: not available
URL: <http://lists.xiph.org/pipermail/opus/attachments/20190216/418f0775/attachment-0003.flac>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 32kbps_NoFilter.flac
Type: audio/flac
Size: 1181118 bytes
Desc: not available
URL: <http://lists.xiph.org/pipermail/opus/attachments/20190216/418f0775/attachment-0004.flac>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 32kbps_1xFilter.flac
Type: audio/flac
Size: 1177537 bytes
Desc: not available
URL: <http://lists.xiph.org/pipermail/opus/attachments/20190216/418f0775/attachment-0005.flac>