[theora-dev] Beginner Hurdles

Sun Aug 15 13:30:56 PDT 2010

Hey everyone, I just got Theora running on my Mac and ran across
several hurdles that I was hoping someone could help me with.
I do a lot of tech support at work, and get the same questions over  
and over, so I tried skimming the archives but couldn't find the  
answers.  Maybe these could go in a FAQ of some sort?  These are  
fairly unavoidable issues that should probably be better documented in  
the example at http://svn.xiph.org/trunk/theora/examples/player_example.c 
  since I think most people are looking for something a little higher  
level.  Here they are:

1. Many of us just want to render each frame to RGB. I decided to skip  
SDL for now and render the RGB in software so that I understand how it  
works.  I will probably make a shader to render the YUV directly at  
some point, but speed is not the issue for me right now.

I'm rendering a 4:2:0 movie by hand right now, stretching the UV to  
double size like the Y.  If you follow the example on the line that  
says "} else if (px_fmt==TH_PF_444){" to convert YUV to RGB, you will  
end up with reds that are salmon colored and blues that are too navy.
I've posted an example here:

http://www.postimage.org/image.php?v=TsBWFfr

The top 3 frames are variations where I'm trying to align the UV
channels, but the important thing is that VLC does a very good job  
matching colors, and even though MPlayer falls down on the rendering,  
they both do much better than my code.  I'm guessing that the example  
code suffers from some problem with gamma on the Mac, so I'm wondering  
if someone has the proper coefficients to render:

r = (1904000*(*py)+2609823*(*pv)-363703744)/1635200;
g = (3827562*(*py)-1287801*(*pu)-2672387*(*pv)+447306710)/3287200;
b = (952000*(*py)+1649289*(*pu)-225932192)/817600;

Or if they have the conversion that will stretch this RGB to the right  
color space for the Mac.  I'm mainly concerned with the red and blue  
channels.  It's ok if the linearity is incorrect, because I'm mostly
concerned with the max and min values.

If nobody has a solution, then I will just sample my colors and come  
up with the factor to scale them to VLC, but this makes me a little  
uncomfortable.

This could all be solved by including a th_decode_rgb_out() function.   
I realize this would be a convenience function, but if there's ever  
been a clearer need for convenience, I certainly can't think of one!
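
For reference, here is a minimal sketch of that conversion in integer arithmetic. It assumes the frame uses the usual video-range encoding (Y in 16..235, Cb/Cr centered on 128) with Rec. 601 style coefficients; the rounded 8-bit fixed-point constants are approximately the same ratios as the three lines quoted above. The names ycbcr_to_rgb and clamp_u8 are hypothetical helpers, not part of libtheora. The explicit clamp matters: results outside 0..255 that are stored unclamped wrap around and shift colors.

```c
/* Clamp an intermediate result into the displayable 0..255 range. */
static unsigned char clamp_u8(int v)
{
    return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Convert one video-range Y'CbCr sample to 8-bit RGB.
   Assumes Y in 16..235 and Cb/Cr centered on 128; this is an assumption
   about the stream, not something the function can detect. */
void ycbcr_to_rgb(int y, int cb, int cr, unsigned char rgb[3])
{
    int c = 298 * (y - 16);   /* about 1.164 * 256: stretch luma to full range */
    int d = cb - 128;         /* blue-difference chroma, signed */
    int e = cr - 128;         /* red-difference chroma, signed */

    /* +128 rounds to nearest before dividing out the 256 scale factor. */
    rgb[0] = clamp_u8((c + 409 * e + 128) / 256);
    rgb[1] = clamp_u8((c - 100 * d - 208 * e + 128) / 256);
    rgb[2] = clamp_u8((c + 516 * d + 128) / 256);
}
```

With these constants, video black (16,128,128) maps to RGB 0,0,0 and video white (235,128,128) maps to 255,255,255, so the max and min values land where they should under the stated assumptions.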

2. There is no example code to linearly interpolate the UV up to the  
Y.  I found on page 29 of the Theora Specification at http://theora.org/doc/Theora.pdf 
  that UV are centered on the 2x2 blocks of the Y.  I can come up with  
something that matches my conception of what's happening, but I worry  
that encoders follow a standard for pixel-centered versus
corner-centered sampling that I'm not matching, which would give me 1/2 pixel
alignment errors.  I think that player_example.c should have a code  
snippet that shows how to properly align the channels.
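
As a starting point, here is a sketch of bilinear upsampling for 4:2:0 under the siting described above, where each chroma sample sits at the center of its 2x2 block of luma pixels. Under that assumption a luma pixel is 1/4 of a chroma step away from its own chroma sample, so the weights are 3/4 and 1/4 in each direction. The function name chroma_at and its parameters are hypothetical, edges are handled by replicating the border sample, and whether this siting matches what a given encoder assumed is exactly the open question.

```c
/* Keep a chroma index inside the plane by replicating the border. */
static int clamp_index(int i, int n)
{
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

/* Interpolated chroma value for the luma pixel at (x, y).
   plane/stride describe a half-resolution chroma plane of cw x ch samples.
   Assumes chroma is sited at the center of each 2x2 luma block. */
unsigned char chroma_at(const unsigned char *plane, int stride,
                        int cw, int ch, int x, int y)
{
    int cx = x >> 1, cy = y >> 1;   /* chroma sample covering this pixel */

    /* Odd luma pixels lean toward the next chroma sample, even ones
       toward the previous one. */
    int nx = clamp_index((x & 1) ? cx + 1 : cx - 1, cw);
    int ny = clamp_index((y & 1) ? cy + 1 : cy - 1, ch);

    int a = plane[cy * stride + cx];   /* weight 3/4 * 3/4 = 9/16 */
    int b = plane[cy * stride + nx];   /* weight 1/4 * 3/4 = 3/16 */
    int c = plane[ny * stride + cx];   /* weight 3/4 * 1/4 = 3/16 */
    int d = plane[ny * stride + nx];   /* weight 1/4 * 1/4 = 1/16 */

    return (unsigned char)((9 * a + 3 * b + 3 * c + d + 8) / 16);
}
```

Calling this once per luma pixel for each of the U and V planes is slow but easy to check by eye, which suits a software renderer whose goal is understanding rather than speed.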

3. player_example.c doesn't describe how to rewind the various state  
variables back to the beginning of the movie.  Seeking is probably  
overkill, but it's important to be able to rewind so that games can  
use movies that loop.

4. Timing is not well documented.  It took me a while to realize that
th_granule_time() actually returns the granule time PLUS the duration
of the current frame, i.e. the time at which the frame goes stale.  This
is documented at http://www.theora.org/doc/libtheora-1.0/group__basefuncs.html#g707e1e281de788af0df39ef00f3fb432
but it also needs a comment on the line that says
"videobuf_time=th_granule_time(td,videobuf_granulepos);".  Also, there
is no documentation that I can find that describes how Vorbis does  
timing, so I came up with the following code, where I calculate my own  
timing based on the current byte position in the audio stream:

// return the buffer time from the last call, which was the start of the buffer
*audioStartTime = audiobuf_time;
audiobuf_time = ((double) audiobuf_granulepos)/vi.rate;
// a granule is a left/right stereo pair, so the size in bytes is divided
// by 2 (16-bit samples) and by 2 again (stereo channels)
audiobuf_granulepos += (*numBytes/2)/2;
*audioStaleTime = audiobuf_time;

As far as I can tell, this results in proper syncing but I could be  
off a frame and not know it...
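
One way to make the bookkeeping easier to check is to count sample frames explicitly and derive both ends of each buffer's interval from that count. The sketch below is only a guess at the intended semantics: it assumes 16-bit interleaved PCM, treats the start time as the beginning of the current buffer and the stale time as its end, and takes the video frame's end time to be what th_granule_time() returns, as described above. AudioClock, audio_clock_advance and video_frame_start are hypothetical names; the fps arguments correspond to the fps_numerator and fps_denominator fields of th_info.

```c
#include <stddef.h>

/* Running audio position, counted in sample frames (one sample per channel). */
typedef struct {
    long long frames_played;  /* sample frames handed out so far */
    long      rate;           /* sample frames per second, e.g. vi.rate */
    int       channels;       /* 2 for stereo */
} AudioClock;

/* Account for one buffer of 16-bit PCM and report the interval it covers. */
void audio_clock_advance(AudioClock *c, size_t num_bytes,
                         double *start_time, double *stale_time)
{
    /* bytes -> samples (2 bytes each) -> sample frames (one per channel set) */
    long long frames = (long long)(num_bytes / (2u * (size_t)c->channels));

    *start_time = (double)c->frames_played / (double)c->rate;
    c->frames_played += frames;
    *stale_time = (double)c->frames_played / (double)c->rate;
}

/* Start time of a video frame, given the end time of that frame. */
double video_frame_start(double end_time,
                         unsigned fps_numerator, unsigned fps_denominator)
{
    return end_time - (double)fps_denominator / (double)fps_numerator;
}
```

Because both times come from the same running count, each buffer's start time equals the previous buffer's stale time, which makes an off-by-one-buffer error easy to spot when logging the values.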

5. player_example.c is much too complicated.  Now that my code is  
finished, this is my draw loop.  Under the hood, I'm using an RGBA  
texture in OpenGL and a buffered sound channel class I made on top of
OpenAL.  I believe the code largely speaks for itself, but I've included
comments on the confusing parts:

////////////////////////////////////////

Video			video;		// my wrapper class for Ogg Theora
ImageBuffer		videoBuffer;	// my wrapper class for OpenGL texture
BufferSoundChannel	bufChan;	// my wrapper class for OpenAL

if( video.Load( "~/Desktop/ogg-theora-tests/320x240.ogg" ) != noErr )
	ErrorDialog( "Couldn't load file" );

double		startTime = video.HasAudio() ? 0 : bufChan.GetTime();

double		audioStartTime = 0, audioStaleTime = 0,
			videoStartTime = 0, videoStaleTime = 0;
int			audioStatus = kVideoAudioBuffering,
			videoStatus = kVideoVideoBuffering;
Boolean		audioPlaying = false;

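short	*buffer = NULL;	// declared outside the loop so the values survive
uint	size = 0;		// from the iteration that fills them to the one that plays them

while( !KeyDown( escape_key ) )
{
	double	theTime;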

	//if( video.Done() ) video.Reset();	// I don't have rewind working yet

	video.Idle();	// buffer data even if no audio or video frames to display

	theTime = bufChan.GetTime() - startTime;

	// see if the channel is ready to accept more audio, and if so, get the
	// next temporary buffer in which to write the samples
	if( audioStatus < 0 && bufChan.IsReady( (void**) &buffer, &size ) )
		// returns the frame number of the audio, or a negative value to
		// indicate wrong state or an error
		audioStatus = video.GetAudio( buffer, &size, &audioStartTime,
			&audioStaleTime );

	if( audioStatus >= 0 )
	{
		theTime = bufChan.GetTime() - startTime;

		if( theTime >= audioStartTime )
		{
			// start the sync at the exact moment we play the first buffer
			if( !startTime ) startTime = bufChan.GetTime();

			audioPlaying = true;

			bufChan.Play( buffer, size );	// queue the next audio buffer

			// indicate that we are ready to read the next audio frame
			audioStatus = kVideoAudioBuffering;
		}
	}

	theTime = bufChan.GetTime() - startTime;

	// wait for the first audio buffer to play, because video can play faster
	// to catch up, but audio is locked into the sample rate
	if( videoStatus < 0 && (audioPlaying || !video.HasAudio()) )
		// returns the frame number of the video, or a negative value to
		// indicate wrong state or an error
		videoStatus = video.GetVideo( videoBuffer, &viewRect,
			&videoStartTime, &videoStaleTime );

	if( videoStatus >= 0 )
	{
		theTime = bufChan.GetTime() - startTime;

		if( theTime >= videoStartTime )
		{
			// can just flush the texture buffer to the screen, because the
			// video frame was written to videoBuffer when the frame was ready
			FlushToScreen();

			// indicate that we are ready to read the next video frame
			videoStatus = kVideoVideoBuffering;
		}
	}
}

////////////////////////////////////////

Conceptually, the important parts are that I can grab the next audio
buffer or video frame whenever I need it, and then I just wait until my buffered
sound channel's timestamp reaches the audio and video buffers' start  
times.  Then I either queue the audio buffer samples or flush the  
video buffer to the screen.  I'm not calibrating my audio clock, but  
that's ok for the short duration videos we plan to use for now.
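
Stripped of the wrapper classes, the waiting step reduces to one small check per stream. The sketch below is a simplified illustration rather than the code from my engine: Slot and slot_take_if_due are hypothetical names, and the clock value is whatever the sound channel reports, minus the sync start time.

```c
/* One decoded audio buffer or video frame waiting to be presented. */
typedef struct {
    double start_time;  /* clock time at which the data should be presented */
    int    pending;     /* nonzero while decoded data is waiting */
} Slot;

/* Returns 1 and clears the slot if its data is due at clock time `now`;
   returns 0 if nothing is waiting or it is still too early. */
int slot_take_if_due(Slot *s, double now)
{
    if (!s->pending || now < s->start_time)
        return 0;
    s->pending = 0;   /* the caller queues the audio or flushes the frame */
    return 1;
}
```

The draw loop keeps one such slot for audio and one for video, refills a slot whenever it is empty, and presents its contents on the first pass where the check returns 1.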

IMHO, this is the example that people are looking for, and I don't see  
Theora gaining popularity in games as quickly if there is no
high-level usage example like this.  I don't mean to sound harsh, it's just
that I have a 4 year engineering degree and 20 years of programming  
experience and still found player_example.c to be quite cryptic.   
Conceptually it's very well written, and I appreciate things like  
select on the audio stream, but unfortunately with Mac and Windows  
operating systems, a lot of programmers today don't understand the  
simplicity of streams or even how to access them through the OS.

Sometimes we take our experience for granted and expect others to  
either know what we are talking about or to read the entire manual and  
understand the fundamentals before using a library.  But in the real  
world, it's almost impossible for that to happen when there are  
divisions of labor and constant deadlines.

If I can get the RGB gamma stuff working properly, I might consider  
releasing this code as an alternate player example that uses OpenGL  
and OpenAL.  It might take me a while, because my OpenGL and OpenAL  
stuff is tied pretty solidly into our engine, but maybe I could rig a  
barebones example with GLUT.  But I would like to see it happen :)

Well I guess that's it, sorry for the long email, but I think these  
are issues that everyone is going to hit, so they need to be remedied.

Thanks for writing Ogg Theora.  I think it's a wonderful library that just
needs some better documentation and examples.  I tend to work on the  
back end of things and realize that getting someone to write concise  
example code is like herding cats.

Zack Morris
zmorris at zsculpt.com