[vorbis-dev] fast math optimization patch

Sat Feb 3 21:19:59 PST 2001

   This patch adds two files, fast_math.[ch].  These files include a couple of
faster versions of the libm routines that don't provide all the semantics and
accuracy of the IEEE versions (but are still pretty darn accurate).  There are
typically situations in which full precision is not necessary, and this patch
tries to take advantage of that.  I don't know whether this patch will produce
acceptable results -- everything sounds fine to me, but I haven't done
exhaustive testing (and I'm not the world's most critical listener).

   Here is the blow-by-blow description of the patch:

- Add fast_math.[ch]
     - fast_rsqrt_est -- fast reciprocal square root estimate
       (i.e., 1.0/sqrt(x))
     - fast_sqrt_est -- fast square root estimate
     - fast_log_est -- fast natural log estimate
     - Adds an IEEE structure for floats
     - Adds VORBIS_EXACT_FLOAT_RESULTS.  If this is defined, the
       estimations are turned off.

- Modify os.h
     - Removed _V_IFDEFJAIL_H_.  The entire file is already in an _OS_H
       ifdef, so this shouldn't be necessary.
     - Added normalized processor ifdef support (VORBIS_X86 and
       VORBIS_PPC so far).
       These control whether processor-specific optimizations are allowed.
       Defining C_ONLY turns off all processor-specific code.
     - Added VORBIS_BIG_ENDIAN and VORBIS_LITTLE_ENDIAN since
       the other various macros for this are not portable
     - Switched a few tests to use VORBIS_BIG/LITTLE_ENDIAN
- Modify scales.h
     - Import fast_math.h and use fast_log_est instead of log() for
       todB() and todB_nn()

   The sqrt approximation functions are only applicable on PPC right now,
and this patch does not include the code that actually uses them (I wanted
to provide testing numbers that would better reflect the likely change in
x86 performance).
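
   For readers without a PPC box, the relationship between a reciprocal
square root estimate and a square root estimate can be sketched in portable
C.  This is not the code from the patch: it swaps in the well-known integer
bit trick (magic constant 0x5f3759df) plus one Newton-Raphson step, assumes
a 32-bit IEEE 754 float and a positive, normal, finite input, and uses
hypothetical sketch_* names.

```c
#include <stdint.h>
#include <string.h>

/* Rough 1/sqrt(x) for positive, normal, finite x. */
static float sketch_rsqrt_est(float x)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);       /* read the float's bit pattern */
    bits = 0x5f3759dfu - (bits >> 1);     /* crude initial guess */
    memcpy(&y, &bits, sizeof y);

    /* One Newton-Raphson step on f(y) = 1/y^2 - x tightens the guess. */
    y = y * (1.5f - 0.5f * x * y * y);
    return y;
}

/* sqrt(x) == x * (1/sqrt(x)), so the square root reuses the same estimate. */
static float sketch_sqrt_est(float x)
{
    return x * sketch_rsqrt_est(x);
}
```

   More Newton-Raphson steps trade speed for accuracy, which is the same
kind of trade-off the estimates in this patch make.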

   The log approximation breaks the float into its IEEE structure.  This
allows us to represent the value as 2^n*fraction and compute the log as:

    log(2^n*fraction)
        = log(2^n) + log(fraction)
        = n*log(2) + log(fraction)

   The log(fraction) estimate is done via a few terms of a series expansion
for log that converges better with smaller values.  The log estimate is
something like 8.5 times faster on my PPC.  YMMV on x86 (I'd like to
hear about the performance change this introduces).
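
   As an illustration of the series idea, here is a minimal sketch of a
truncated-series log estimate.  The particular series is an assumption and
may differ from the one in the patch: it uses
log(f) = 2*(s + s^3/3 + s^5/5 + ...) with s = (f-1)/(f+1), which converges
quickly because s stays below 1/3 for f in [1, 2).  The name sketch_log_est
is hypothetical, and the input must be positive and finite.

```c
#include <math.h>

/* Natural log estimate from four series terms. */
static float sketch_log_est(float x)
{
    int e;
    float m = frexpf(x, &e);            /* x == m * 2^e with m in [0.5, 1) */
    float f = 2.0f * m;                 /* fraction in [1, 2) */
    int n = e - 1;                      /* so that x == f * 2^n */
    float s = (f - 1.0f) / (f + 1.0f);  /* 0 <= s < 1/3 */
    float s2 = s * s;

    /* 2*(s + s^3/3 + s^5/5 + s^7/7), evaluated in Horner form */
    float series = 2.0f * s *
        (1.0f + s2 * (1.0f / 3.0f + s2 * (1.0f / 5.0f + s2 * (1.0f / 7.0f))));

    return (float)n * 0.69314718f + series;
}
```

   Dropping the s^7 term would save a multiply and an add at the cost of a
larger error near f = 2, which is the kind of speed/accuracy knob a
truncated series offers.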

   I haven't tested this on x86, but it should work fine (I tested some 
similar code).  Please let me know if this does anything weird.

   On my PPC box, my test case encoded about 8-9% faster with this change
(the PPC-specific sqrt change was not enabled for this test).

   Still needed are:

   - Someone with a better ear than mine needs to test that these don't
     produce artifacts.
   - The fast log estimate could have a term or two removed to increase
     speed more if artifacts still aren't found.

   Here are the new files and the patch for existing files.

<Attachment missing><Attachment missing><Attachment missing>

   Please let me know if there are any problems with this patch that I 
can correct.

   Thanks!

-tim
