[vorbis-dev] vorbis_analysis() question

Jeff Squyres jsquyres at lsc.nd.edu
Fri Dec 8 14:58:12 PST 2000

A variation on questions that I've asked before...

In working on the parallel version of oggenc (both threaded and MPI), a
profiling run shows that the function vorbis_analysis() takes up the
majority of the run time.  This seems to be an obvious choice for
parallelization -- send each vorbis_block to a different processor, and
let them call vorbis_analysis() in parallel with each other.

However, as has been briefly alluded to on this list before, there is
global state that is shared between all vorbis_block instances.
Specifically, vorbis_block contains a pointer to the stream's
vorbis_dsp_state, which, in turn, has a pointer to the stream's vorbis_info.

The code under vorbis_analysis gets complicated quickly (and I'm *not* a
math/DSP kind of guy), but I can see that, at the very least, the global
vorbis_dsp_state seems to be getting modified in each call to

My question is: is there a way to "simulate" the global state?  Given that
I want to split vorbis_blocks across multiple processors, I'll likely also
have to give them their own local vorbis_dsp_state instead of one big
shared one (it *looks* like the underlying vorbis_info is *not* modified
during the encoding process; so keeping the one global vorbis_info should
be ok -- please correct me if I'm wrong).

Is this possible?  Presumably, having the vorbis_dsp_state maintained from
block to block is there for some specific voodoo in the encoding
algorithm.  Is there a way to "fake" this with multiple copies of the

For example, could I have each processor do 50 blocks, but actually give
each processor *51* blocks (each set of 51 sharing a single dsp_state)?
Ignoring processor 0, because the first 50 blocks will be processed
exactly as they are in serial, it would look like this:

processor 1: blocks 50-100, all sharing dsp_state1
processor 2: blocks 100-150, all sharing dsp_state2
processor 3: blocks 150-200, all sharing dsp_state3

It gets a little more complicated than that, but you get the general idea.

Notice that there's an overlap of one block between adjacent processors.
The idea would be to process the first (overlapping) block to build up
state in that set's corresponding dsp_state, and then throw away the
output from that first block.  Then process the remaining 50 blocks, and
keep their output as normal.  That is, the sole purpose of the overlapping
block is to build up the dsp_state.
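
To make the bookkeeping concrete, here is a minimal sketch of how the block
ranges could be computed.  It only does the index arithmetic; it does not call
into libvorbis, and it says nothing about whether one warm-up block is actually
enough to rebuild the dsp_state.  The names block_range and worker_range are
hypothetical, and the sketch assumes zero-based, half-open ranges, so worker 1
is fed blocks 49..99 and keeps the output of 50..99.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: the blocks handed to one worker. */
typedef struct {
    size_t warmup_first; /* first block fed in; output discarded until keep_first */
    size_t keep_first;   /* first block whose output is kept */
    size_t end;          /* one past the last block processed */
} block_range;

/*
 * Split total_blocks into chunks of `chunk` blocks.  Every worker except
 * worker 0 is additionally fed up to `overlap` preceding blocks, whose only
 * purpose is to build up that worker's private dsp_state.
 */
block_range worker_range(size_t worker, size_t chunk, size_t overlap,
                         size_t total_blocks)
{
    block_range r;

    assert(chunk > 0);

    /* Clamp the kept range to the end of the stream. */
    r.keep_first = worker * chunk;
    if (r.keep_first > total_blocks)
        r.keep_first = total_blocks;
    r.end = r.keep_first + chunk;
    if (r.end > total_blocks)
        r.end = total_blocks;

    if (r.keep_first == r.end)
        r.warmup_first = r.keep_first;           /* nothing to do, no warm-up */
    else if (r.keep_first >= overlap)
        r.warmup_first = r.keep_first - overlap; /* usual case */
    else
        r.warmup_first = 0;                      /* worker 0 starts at the top */

    return r;
}
```

With chunk = 50 and overlap = 1, each worker would encode blocks
warmup_first..end-1 against its own dsp_state and emit only keep_first..end-1.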

Would this work?  I don't know the internals of the encoding algorithm, so
don't hesitate to tell me that I'm way off base.  Or is there a different
and/or better and/or easier way?


As a side question, why is the _P_mapping stuff used in libvorbis
instead of virtual functions on a C++ class?  It seems that you've
effected the same thing, except in C.  Just curious...
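
To illustrate what I mean by "the same thing, except in C", here is a minimal
sketch of the general pattern: a table of function pointers per backend, plus
a registry indexed by type number, doing the job a C++ vtable would do.  This
is *not* the libvorbis code; every name in it (mapping_funcs,
mapping_registry, dispatch_forward, and the toy forward functions) is made up
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface: one table of function pointers per backend. */
typedef struct {
    const char *name;
    int (*forward)(int input); /* plays the role of a virtual method */
} mapping_funcs;

/* Two toy "implementations" of the interface. */
static int mapping0_forward(int x) { return x * 2; }
static int mapping1_forward(int x) { return x + 100; }

static const mapping_funcs mapping0 = { "mapping0", mapping0_forward };
static const mapping_funcs mapping1 = { "mapping1", mapping1_forward };

/* Registry indexed by type number, similar in spirit to a vtable lookup. */
static const mapping_funcs *const mapping_registry[] = { &mapping0, &mapping1 };

/* Dispatch through the table; returns -1 for an unknown type number. */
int dispatch_forward(size_t type, int input)
{
    size_t count = sizeof mapping_registry / sizeof mapping_registry[0];

    if (type >= count)
        return -1;
    assert(mapping_registry[type]->forward != NULL);
    return mapping_registry[type]->forward(input);
}
```

The caller only knows a type number and the shared struct layout, which is
roughly what a base-class pointer gives you in C++.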

{+} Jeff Squyres
{+} squyres at cse.nd.edu
{+} Perpetual Obsessive Notre Dame Student Craving Utter Madness
{+} "I came to ND for 4 years and ended up staying for a decade"
