[vorbis-dev] Look what I found under the Xmas tree!
Segher Boessenkool
segher at wanadoo.nl
Sun Dec 24 08:14:42 PST 2000
> Segher,
>
> In future, it'd be much better if you split patches up into separate parts - optimisations, and other changes. Optimisations can (usually) be applied as-is. Your other changes probably won't be, since they make pretty major functional changes sometimes.
Hi Michael,
You're right. But most of it is interdependent, and the rest is
easily chopped out. But next time, I'll send a bunch of patches.
More work for me, but less work for you. And it makes me review
the changes once more.
> Also, could you please _explain_ functional changes? I haven't looked at this one properly yet, but the previous one made huge changes to the psychoacoustics and short block triggering with ZERO explanation. As a result, it's been ignored (your other patch was applied completely, I think, since it didn't make functional changes, just speed ones).
I did explain it. The rationale was the following: as the
psychoacoustic stuff is mostly black art, and not very exact per se, if
a big speed optimization exists that changes the psychoacoustics by only
a few percent, it is probably worth it. That should
be verified by listening experiments, of course; as far as I can see
(hear), nothing was lost.
I think other people should try the patches, and if they agree, then
the patches can be applied to the main tree. Psychoacoustic changes
are always a bit "experimental", but then so is all of it.
As for what the patch actually _does_, technically speaking, just
look at the source. It's quite easy.
As for this patch's changes, I think I explained them. Only the change
to the floor handling I didn't explain; I referred to a previous mail
where I said I would write a TeX file about it. I'll try to explain
it (inexactly, not explaining the sqrt()): after this change, the
(Euclidean or Manhattan) distance between different residues (i.e.,
a residue and its quantized representation) is something that _means_
something psychoacoustically; it makes sure that if you say "well, the
maximum quantization error is allowed to be X" the actual values of
the residue are of no consequence; the only important thing is that it
is allowed to be at most X. Let's call it "equal error distance" or
something. It is hard to explain in words, and the equations are not
very fit for plain text.
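To make the two distances concrete, here is a rough C sketch. It is only
an illustration: the function names are made up, nothing here is taken
from the patch, and the sqrt() part of the floor change is left out
entirely. It just measures how far a quantized residue is from the
original under each metric (the Euclidean one is returned squared).

```c
#include <stddef.h>

/* Manhattan (L1) distance between a residue and its quantized version:
 * the sum of the absolute per-line errors. Hypothetical helper. */
static float manhattan_dist(const float *res, const float *quant, size_t n)
{
    size_t i;
    float sum = 0.0f;

    for (i = 0; i < n; i++) {
        float d = res[i] - quant[i];
        sum += (d < 0.0f) ? -d : d;
    }
    return sum;
}

/* Squared Euclidean (L2) distance: the sum of the squared per-line
 * errors. Left squared so the sketch needs no math library. */
static float euclidean_dist_sq(const float *res, const float *quant, size_t n)
{
    size_t i;
    float sum = 0.0f;

    for (i = 0; i < n; i++) {
        float d = res[i] - quant[i];
        sum += d * d;
    }
    return sum;
}
```

The point of the floor change is that a bound X on such a distance is
supposed to mean the same thing wherever the residue values happen to lie.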
> >What's inside:
> >
> >Request for help! Look in os.h if you're using a compiler or
> >processor I don't use (I use gcc on K5, K7, G3).
>
> What sort of performance increases do you see from this (when you actually use it)? Why 32-byte alignment - the reason for having aligned allocations is obvious, but is 32 bytes actually beneficial (does this give you cache line alignment or something)?
It helps some routines by as much as ~50%. Difficult to measure,
because you only see it if the data is not already in the L1 cache.
Optimizing routine B puts more pressure on routine A, and suddenly
you see routine A will benefit from it. It only benefits tight
data-driven loops (when you fill the alloca'd array in one go).
32-byte alignment helps a lot on K7. The K7 _really_ likes cache-line aligned
data (first load from the start of a line is much cheaper than from
somewhere inside a line). Shown by profiling. I don't know about
Intel chips, and PPC is mostly forgiving about misalignments
(cache-line misalignments, that is). 16 bytes hurts on K7. Maybe
other processors will need 64 or higher; that's why I asked. It doesn't
hurt processors which don't need it.
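For anyone who wants to try other alignments, the usual trick can be
sketched like this in C. This is not the code from os.h; the names
CACHE_ALIGN, align_up and padded_size are made up for the example. The
idea is to ask for a few bytes more than needed and round the pointer up
to the next aligned address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment; 32 is the value that helps on K7 according to the
 * text above. Other processors may want 64 or more. */
#define CACHE_ALIGN 32

/* Round p up to the next multiple of align.
 * align must be a nonzero power of two. */
static void *align_up(void *p, size_t align)
{
    uintptr_t u = (uintptr_t)p;

    assert(align != 0 && (align & (align - 1)) == 0);
    return (void *)((u + (align - 1)) & ~(uintptr_t)(align - 1));
}

/* Number of bytes to request so that an aligned block of n bytes
 * always fits, whatever address the allocator hands back. */
static size_t padded_size(size_t n, size_t align)
{
    return n + align - 1;
}
```

With alloca this would be used roughly as
align_up(alloca(padded_size(n, CACHE_ALIGN)), CACHE_ALIGN). The cost is at
most CACHE_ALIGN - 1 wasted bytes per allocation.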
>
> >
> >New MDCT! Now we have two; competition is a good thing. Let's
> >see which one is fastest :-)
>
> How much faster is your new one for the case it handles?
Depends. From 20% to 200%, depending on architecture. Both my
implementation and Monty's can be (much) further optimized, so
it's anyone's bet which one will be the best in the end.
Monty's is more symmetric, mine is better on the cache. I need
to go look at the recent changes in Monty's MDCT.
> >Changed the ATH_Bark_dB array. This removes some very annoying
> >artifacts when encoding low-frequency, tonal sounds.
>
> This seems like a dubious change. I suspect it's just hiding an actual problem elsewhere. This is what I mean - explain WHY you've done things like this, in reasonable detail, rather than just saying "I think it makes it better sometimes".
I already explained this some weeks ago, and nobody came up with
anything better. BTW, I don't think it's a dubious change. As a
1024-line MDCT is too small to correctly analyze the lower frequencies,
they should be treated more carefully. Also, these tend to be "spikes",
so if the LSP has low resolution in the low-frequency region, the smaller
of these values will be quantized badly (as they were diminished more
than the actual (non-lsp'ed) floor would have wanted); this leads to
phase distortion and harmonics dropout. This is very detectable.
Another solution would be to use the floor/res stuff only for the noise,
and use a separate encoding for the tonal stuff.
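For a sense of scale, here is the arithmetic behind "too small" as a
small C sketch. The 44.1 kHz sample rate is an assumption for the example
(no rate is stated above), and the function names are made up. At that
rate each of the 1024 lines covers about 21.5 Hz, so only four whole
lines fit below 100 Hz.

```c
/* Width in Hz of one MDCT line: the band from 0 to sample_rate/2
 * divided evenly over `lines` lines. Hypothetical helper. */
static double mdct_line_width_hz(double sample_rate_hz, int lines)
{
    return (sample_rate_hz / 2.0) / (double)lines;
}

/* Number of whole lines that fit below freq_hz. */
static int lines_below(double freq_hz, double sample_rate_hz, int lines)
{
    return (int)(freq_hz / mdct_line_width_hz(sample_rate_hz, lines));
}
```

So a handful of lines has to represent the whole region where low tonal
sounds live, which is why that region deserves more careful treatment.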
>
> Michael
Thanks for the comments!
I'll resend the patch, split up a bit, with comments added in the source,
to give some explanations of what is going on. Please tell me if I have
to do that to d.o.n and d.n.m as well.
Cheers,
Segher