[Theora-dev] Questions about efficiency.

Wed Jun 22 01:31:06 PDT 2005

Good afternoon!

Thanks for the previous answers. Now I have a question about efficiency. I use the following formula to convert YUV12 (planar YUV 4:2:0) to RGB:

        /* nY, nU, nV are 0..255 samples; U and V are centred on 128. */
        float r = nY + 1.371f * ( nV - 128 );
        float g = nY - 0.698f * ( nV - 128 ) - 0.336f * ( nU - 128 );
        float b = nY + 1.732f * ( nU - 128 );

        frame[index + 0] = ClampFloatToByte( r );
        frame[index + 1] = ClampFloatToByte( g );
        frame[index + 2] = ClampFloatToByte( b );
        frame[index + 3] = 255;  /* alpha */
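
A minimal sketch of how that per-pixel formula is typically applied to a whole planar 4:2:0 frame (what the post calls YUV12). It assumes separate Y, U and V planes, chroma planes subsampled by 2 in both directions, and tightly packed RGBA output; the function and helper names are hypothetical, not part of libtheora:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round and saturate a float to 0..255. */
static uint8_t clamp_float_to_byte(float x)
{
    if (x <= 0.0f)   return 0;
    if (x >= 255.0f) return 255;
    return (uint8_t)(x + 0.5f);
}

/* Hypothetical converter: planar YUV 4:2:0 -> packed RGBA.
 * Each chroma sample covers a 2x2 block of luma samples.
 * Strides are in bytes and may be negative (bottom-up planes). */
static void yuv420_to_rgba_float(const uint8_t *y_plane, int y_stride,
                                 const uint8_t *u_plane, int u_stride,
                                 const uint8_t *v_plane, int v_stride,
                                 uint8_t *rgba, int width, int height)
{
    for (int row = 0; row < height; ++row) {
        const uint8_t *yp = y_plane + (ptrdiff_t)row * y_stride;
        /* One chroma row serves two luma rows. */
        const uint8_t *up = u_plane + (ptrdiff_t)(row / 2) * u_stride;
        const uint8_t *vp = v_plane + (ptrdiff_t)(row / 2) * v_stride;
        uint8_t *out = rgba + (ptrdiff_t)row * width * 4;  /* packed output */

        for (int col = 0; col < width; ++col) {
            float nY = yp[col];
            float u  = up[col / 2] - 128.0f;  /* one chroma sample per 2 columns */
            float v  = vp[col / 2] - 128.0f;

            out[4 * col + 0] = clamp_float_to_byte(nY + 1.371f * v);
            out[4 * col + 1] = clamp_float_to_byte(nY - 0.698f * v - 0.336f * u);
            out[4 * col + 2] = clamp_float_to_byte(nY + 1.732f * u);
            out[4 * col + 3] = 255;
        }
    }
}
```

Because each U/V pair is shared by a 2x2 block of luma samples, the chroma terms could also be computed once per pair of pixels instead of once per pixel, which saves work regardless of whether the arithmetic is float or integer.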

However, this conversion takes on the order of 10-15 milliseconds on my computer, which is far too long.
After I converted all the calculations to integer arithmetic, I ended up with this code:

        /* Fixed point: coefficients scaled by 2^7 = 128, undone by >> 7.
           Chroma is re-centred on 0, as in the float version. Right-shifting
           a negative int is implementation-defined in C, but mainstream
           compilers use an arithmetic shift. */
        int u = nU - 128;
        int v = nV - 128;
        int r = nY + ( ( 175 * v ) >> 7 );
        int g = nY - ( ( 89 * v + 43 * u ) >> 7 );
        int b = nY + ( ( 222 * u ) >> 7 );

        frame[index + 0] = ClampShortToByte( r );
        frame[index + 1] = ClampShortToByte( g );
        frame[index + 2] = ClampShortToByte( b );
        frame[index + 3] = 255;
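
For reference, the integer coefficients are the float ones multiplied by 2^7 = 128 and rounded, so the final >> 7 divides the scaling back out and each coefficient is off by at most 0.5/128, about 0.004:

```latex
\begin{aligned}
1.371 \times 128 &= 175.49 \approx 175, &\qquad 0.698 \times 128 &= 89.34 \approx 89,\\
0.336 \times 128 &= 43.01 \approx 43,   &\qquad 1.732 \times 128 &= 221.70 \approx 222,\\
R &\approx Y + \left\lfloor \tfrac{175\,(V-128)}{2^{7}} \right\rfloor .
\end{aligned}
```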

This version runs in about 5-7 milliseconds, which is much better but still not fast enough.
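
Before moving to SIMD, one common plain-C refinement is to replace the multiplies and the clamp branches with lookup tables. A sketch, assuming the same 7-bit fixed-point coefficients as the integer code and an arithmetic right shift for negative values; the table, macro and function names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical tables: per-chroma contributions in 7-bit fixed point
 * (coefficient * 128), plus a clamp table so saturation needs no branches. */
static int     tab_rv[256], tab_gv[256], tab_gu[256], tab_bu[256];
static uint8_t clamp_tab[768];          /* covers values -256..511 */
#define CLAMP(x) clamp_tab[(x) + 256]

static void init_yuv_tables(void)
{
    for (int i = 0; i < 256; ++i) {
        int c = i - 128;                /* re-centre chroma on 0 */
        tab_rv[i] = (175 * c) >> 7;     /* ~1.371 * c */
        tab_bu[i] = (222 * c) >> 7;     /* ~1.732 * c */
        tab_gv[i] = 89 * c;             /* G terms are summed before the shift */
        tab_gu[i] = 43 * c;
    }
    for (int i = 0; i < 768; ++i) {
        int x = i - 256;
        clamp_tab[i] = (uint8_t)(x < 0 ? 0 : (x > 255 ? 255 : x));
    }
}

/* Convert one pixel (nY, nU, nV in 0..255) to RGBA. Intermediate results
 * stay within -222..475, inside the clamp table's range. */
static void yuv_to_rgba_table(int nY, int nU, int nV, uint8_t *px)
{
    px[0] = CLAMP(nY + tab_rv[nV]);
    px[1] = CLAMP(nY - ((tab_gv[nV] + tab_gu[nU]) >> 7));
    px[2] = CLAMP(nY + tab_bu[nU]);
    px[3] = 255;
}
```

`init_yuv_tables()` is called once at startup. Tables trade multiplies for memory loads, so whether this beats the multiply version depends on the CPU and cache behaviour; it is worth measuring both.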
The next optimization I can see is MMX, since the data is integer and has to be clamped. However, it seems to me that I am
reinventing the wheel and going down a road that many people have already travelled before me.
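
On the MMX idea: the clamp comes for free from the saturating pack instruction (packuswb). The sketch below uses SSE2 intrinsics instead of MMX, swapped in because they are easier to use from C and do not need an emms before floating-point code. It assumes x86/x86-64, U and V already upsampled to one sample per pixel, and the same 7-bit fixed-point coefficients; the function name is hypothetical:

```c
#include <emmintrin.h>   /* SSE2 intrinsics: x86 / x86-64 only */
#include <stdint.h>

/* Hypothetical helper: convert 8 pixels to RGBA (32 output bytes).
 * y, u, v each point at 8 bytes; U and V are assumed to be upsampled
 * to one sample per pixel already. */
static void yuv_to_rgba_8px_sse2(const uint8_t *y, const uint8_t *u,
                                 const uint8_t *v, uint8_t *rgba)
{
    const __m128i zero = _mm_setzero_si128();
    const __m128i c128 = _mm_set1_epi16(128);

    /* Widen 8 bytes to 8 signed 16-bit lanes; re-centre chroma on 0. */
    __m128i Y = _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i *)y), zero);
    __m128i U = _mm_sub_epi16(
        _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i *)u), zero), c128);
    __m128i V = _mm_sub_epi16(
        _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i *)v), zero), c128);

    /* Same 7-bit fixed point as the scalar code. All products fit in
     * 16 bits (largest magnitude is 222 * 128 = 28416). _mm_srai_epi16
     * is an arithmetic shift, matching the scalar >> on common compilers. */
    __m128i R = _mm_add_epi16(Y, _mm_srai_epi16(
        _mm_mullo_epi16(V, _mm_set1_epi16(175)), 7));
    __m128i G = _mm_sub_epi16(Y, _mm_srai_epi16(_mm_add_epi16(
        _mm_mullo_epi16(V, _mm_set1_epi16(89)),
        _mm_mullo_epi16(U, _mm_set1_epi16(43))), 7));
    __m128i B = _mm_add_epi16(Y, _mm_srai_epi16(
        _mm_mullo_epi16(U, _mm_set1_epi16(222)), 7));

    /* packuswb saturates signed 16-bit lanes to 0..255: this is the clamp. */
    __m128i R8 = _mm_packus_epi16(R, zero);   /* low 8 bytes are valid */
    __m128i G8 = _mm_packus_epi16(G, zero);
    __m128i B8 = _mm_packus_epi16(B, zero);
    __m128i A8 = _mm_set1_epi8(-1);           /* alpha = 255 */

    /* Interleave into R,G,B,A byte order. */
    __m128i RG = _mm_unpacklo_epi8(R8, G8);   /* R0 G0 R1 G1 ... R7 G7 */
    __m128i BA = _mm_unpacklo_epi8(B8, A8);   /* B0 A0 B1 A1 ... B7 A7 */
    _mm_storeu_si128((__m128i *)rgba, _mm_unpacklo_epi16(RG, BA));        /* pixels 0-3 */
    _mm_storeu_si128((__m128i *)(rgba + 16), _mm_unpackhi_epi16(RG, BA)); /* pixels 4-7 */
}
```

For 4:2:0 input, one way to do the horizontal upsampling in registers is to unpack the loaded chroma bytes with themselves (`_mm_unpacklo_epi8(x, x)`), which duplicates each sample; a real converter would also need a scalar tail for widths that are not a multiple of 8.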
Could you please tell me whether there are publicly available examples of the most efficient possible
YUV12->RGB conversion, and where I can read about it?
Thanks!

P.S. My video card doesn't support pixel shaders, so I cannot use it to complete this task.