[theora-dev] Beginner Hurdles
Zack Morris
zmorris at gmail.com
Sun Aug 15 13:30:56 PDT 2010
Hey everyone, I just got Theora running on my Mac and ran across
several hurdles that I was wondering if someone could help me with.
I do a lot of tech support at work and get the same questions over
and over, so I tried skimming the archives but couldn't find the
answers. Maybe these could go in a FAQ of some sort? These are
fairly unavoidable issues that should probably be better documented in
the example at http://svn.xiph.org/trunk/theora/examples/player_example.c,
since I think most people are looking for something a little higher
level. Here they are:
1. Many of us just want to render each frame to RGB. I decided to skip
SDL for now and render the RGB in software so that I understand how it
works. I will probably make a shader to render the YUV directly at
some point, but speed is not the issue for me right now.
I'm rendering a 4:2:0 movie by hand right now, stretching the UV
planes to double size like the Y. If you follow the example on the
line that says "} else if (px_fmt==TH_PF_444){" to convert YUV to RGB,
you will end up with reds that look salmon and blues that are too
navy. I've posted an example here:
http://www.postimage.org/image.php?v=TsBWFfr
The top 3 frames are variations where I'm trying to align the UV
channels, but the important thing is that VLC does a very good job
matching colors, and even though MPlayer falls down on the rendering,
they both do much better than my code. I'm guessing that the example
code suffers from some problem with gamma on the Mac, so I'm wondering
if someone has the proper coefficients to render:
r = (1904000*(*py)+2609823*(*pv)-363703744)/1635200;
g = (3827562*(*py)-1287801*(*pu)-2672387*(*pv)+447306710)/3287200;
b = (952000*(*py)+1649289*(*pu)-225932192)/817600;
Or if they have the conversion that will stretch this RGB to the
right color space for the Mac. I'm mainly concerned with the red and
blue channels. It's OK if the linearity is incorrect, because I'm
mostly concerned with the max and min values.
If nobody has a solution, then I will just sample my colors and come
up with the factor to scale them to VLC, but this makes me a little
uncomfortable.
This could all be solved by including a th_decode_rgb_out() function.
I realize this would be a convenience function, but if there's ever
been a clearer need for convenience, I certainly can't think of one!
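For reference, the integer math above works out, as far as I can
tell, to the standard BT.601 video-range transform (Y in [16,235],
chroma in [16,240]). Here's a floating-point version of the same
transform (my own sketch, not code from the example) that might be
easier to experiment with, assuming the U and V planes have already
been upsampled to the same size as Y:

////////////////////////////////////////
/* Clamp a float to the 8-bit range. */
static unsigned char clamp255( float v )
{
    return (unsigned char) (v < 0 ? 0 : v > 255 ? 255 : v);
}

/* BT.601 video-range YUV -> RGB (Y in [16,235], U/V in [16,240]).
   u and v are assumed to be upsampled to the same size as y. */
static void yuv_to_rgb( unsigned char y, unsigned char u, unsigned char v,
                        unsigned char *r, unsigned char *g, unsigned char *b )
{
    float yf = 1.164f * (y - 16);

    *r = clamp255( yf + 1.596f * (v - 128) );
    *g = clamp255( yf - 0.391f * (u - 128) - 0.813f * (v - 128) );
    *b = clamp255( yf + 2.018f * (u - 128) );
}
////////////////////////////////////////

If the colors still look off with these coefficients, I'd suspect the
chroma alignment or the Mac's display gamma rather than the matrix
itself.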
2. There is no example code to linearly interpolate the UV planes up
to the size of the Y plane. I found on page 29 of the Theora
Specification at http://theora.org/doc/Theora.pdf that UV samples are
centered on the 2x2 blocks of the Y. I can come up with something
that matches my conception of what's happening, but I worry that
encoders may assume a standard for pixel-centered vs. corner-centered
siting, and a mismatch would show up as 1/2-pixel alignment errors. I
think that player_example.c should have a code snippet that shows how
to properly align the channels.
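Something like the following is what I have in mind: a bilinear
upsample of one chroma plane, assuming the spec's siting (chroma
samples at the centers of the 2x2 luma blocks) and tightly packed
planes for clarity; real libtheora buffers have a stride:

////////////////////////////////////////
#include <math.h>

/* Upsample one cw x ch chroma plane to w x h (w = 2*cw, h = 2*ch),
   assuming chroma samples sit at the centers of the 2x2 luma blocks. */
static void chroma_upsample( const unsigned char *src, int cw, int ch,
                             unsigned char *dst, int w, int h )
{
    for( int y = 0; y < h; y++ )
    {
        float cy = (y - 0.5f) / 2;   /* this luma row in chroma coords */
        int   y0 = (int) floorf( cy );
        float fy = cy - y0;
        int   y1 = y0 + 1;
        if( y0 < 0 )      y0 = 0;    /* replicate at the edges */
        if( y1 > ch - 1 ) y1 = ch - 1;

        for( int x = 0; x < w; x++ )
        {
            float cx = (x - 0.5f) / 2;
            int   x0 = (int) floorf( cx );
            float fx = cx - x0;
            int   x1 = x0 + 1;
            if( x0 < 0 )      x0 = 0;
            if( x1 > cw - 1 ) x1 = cw - 1;

            float top = src[y0*cw + x0] * (1 - fx) + src[y0*cw + x1] * fx;
            float bot = src[y1*cw + x0] * (1 - fx) + src[y1*cw + x1] * fx;
            dst[y*w + x] = (unsigned char) (top * (1 - fy) + bot * fy + 0.5f);
        }
    }
}
////////////////////////////////////////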
3. player_example.c doesn't describe how to rewind the various state
variables back to the beginning of the movie. Seeking is probably
overkill, but it's important to be able to rewind so that games can
use movies that loop.
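For what it's worth, here's roughly what I imagine a rewind looks
like (untested, and there may be more decoder state to clear),
assuming you remembered the file offset of the first data page after
parsing the headers (data_offset below is my own placeholder), and
using the state variable names from player_example.c:

////////////////////////////////////////
fseek( infile, data_offset, SEEK_SET ); /* back to the first data page   */
ogg_sync_reset( &oy );                  /* drop any buffered page data   */
ogg_stream_reset( &to );                /* reset the theora stream       */
ogg_stream_reset( &vo );                /* reset the vorbis stream       */
vorbis_synthesis_restart( &vd );        /* reset the vorbis decode clock */
videobuf_granulepos = -1;               /* reset the example player's    */
videobuf_time = 0;                      /* granulepos bookkeeping        */
audiobuf_granulepos = 0;
////////////////////////////////////////

Since the first frame in the stream is a keyframe, the theora decoder
should be able to pick up from there without being torn down and
rebuilt, but I'd love confirmation from someone who knows the
internals.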
4. Timing is not well documented. It took me a while to realize that
th_granule_time() actually returns the granule time PLUS the duration
of the current frame, i.e. the stale time. That is documented at
http://www.theora.org/doc/libtheora-1.0/group__basefuncs.html#g707e1e281de788af0df39ef00f3fb432,
but it also needs a comment on the line that says
"videobuf_time=th_granule_time(td,videobuf_granulepos);". Also, there
is no documentation that I can find that describes how Vorbis does
timing, so I came up with the following code, where I calculate my own
timing based on the current byte position in the audio stream:
*audioStartTime = audiobuf_time; // return the buffer time from the last
                                 // call, which was the start of the buffer
audiobuf_time = ((double) audiobuf_granulepos)/vi.rate;
audiobuf_granulepos += (*numBytes/2)/2; // a granule is a left/right stereo
                                        // pair: size in bytes, divided by
                                        // 16 bits (2 bytes), divided by
                                        // stereo (2 channels)
*audioStaleTime = audiobuf_time;
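(To make that concrete: for 16-bit stereo at 44100 Hz, a 4096-byte
buffer advances the granule position by 4096/2/2 = 1024 samples, or
about 23 ms.)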
As far as I can tell, this results in proper syncing, but I could be
off by a frame and not know it...
5. player_example.c is much too complicated. Now that my code is
finished, this is my draw loop. Under the hood, I'm using an RGBA
texture in OpenGL and a buffered sound channel class I made on top of
OpenAL. I believe the code largely speaks for itself, but I included
comments on the confusing parts:
////////////////////////////////////////
Video video;                 // my wrapper class for Ogg Theora
ImageBuffer videoBuffer;     // my wrapper class for an OpenGL texture
BufferSoundChannel bufChan;  // my wrapper class for OpenAL

if( video.Load( "~/Desktop/ogg-theora-tests/320x240.ogg" ) != noErr )
    ErrorDialog( "Couldn't load file" );

double startTime = video.HasAudio() ? 0 : bufChan.GetTime();
double audioStartTime = 0, audioStaleTime = 0,
       videoStartTime = 0, videoStaleTime = 0;
int audioStatus = kVideoAudioBuffering;
int videoStatus = kVideoVideoBuffering;
Boolean audioPlaying = false;

while( !KeyDown( escape_key ) )
{
    short *buffer;
    uint size;
    double theTime;

    //if( video.Done() ) video.Reset(); // I don't have rewind working yet

    video.Idle(); // buffer data even if no audio or video frames to display

    theTime = bufChan.GetTime() - startTime;

    // See if the channel is ready to accept more audio, and if so, get the
    // next temporary buffer in which to write the samples. GetAudio returns
    // the frame number of the audio, or a negative value to indicate wrong
    // state or an error.
    if( audioStatus < 0 && bufChan.IsReady( (void**) &buffer, &size ) )
        audioStatus = video.GetAudio( buffer, &size, &audioStartTime,
                                      &audioStaleTime );

    if( audioStatus >= 0 )
    {
        theTime = bufChan.GetTime() - startTime;

        if( theTime >= audioStartTime )
        {
            // Start the sync at the exact moment we play the first buffer.
            if( !startTime ) startTime = bufChan.GetTime();

            audioPlaying = true;
            bufChan.Play( buffer, size );       // queue the next audio buffer
            audioStatus = kVideoAudioBuffering; // ready for the next audio frame
        }
    }

    theTime = bufChan.GetTime() - startTime;

    // Wait for the first audio buffer to play, because video can play faster
    // to catch up, but audio is locked into the sample rate. GetVideo returns
    // the frame number of the video, or a negative value to indicate wrong
    // state or an error.
    if( videoStatus < 0 && (audioPlaying || !video.HasAudio()) )
        videoStatus = video.GetVideo( videoBuffer, &viewRect,
                                      &videoStartTime, &videoStaleTime );

    if( videoStatus >= 0 )
    {
        theTime = bufChan.GetTime() - startTime;

        if( theTime >= videoStartTime )
        {
            // Can just flush the texture buffer to the screen, because the
            // video frame was written to videoBuffer when it was ready.
            FlushToScreen();

            videoStatus = kVideoVideoBuffering; // ready for the next video frame
        }
    }
}
////////////////////////////////////////
Conceptually, the important parts are that I can grab the next audio
buffer or video frame whenever I need them, and then I just wait until
my buffered sound channel's timestamp reaches the audio and video
buffers' start times. Then I either queue the audio buffer samples or
flush the video buffer to the screen. I'm not calibrating my audio
clock, but that's OK for the short-duration videos we plan to use for
now.
IMHO, this is the example that people are looking for, and I don't
see Theora gaining popularity in games as quickly if there is no
high-level usage example like this. I don't mean to sound harsh; it's
just that I have a 4-year engineering degree and 20 years of
programming experience and still found player_example.c to be quite
cryptic.
Conceptually it's very well written, and I appreciate things like
select on the audio stream, but unfortunately with Mac and Windows
operating systems, a lot of programmers today don't understand the
simplicity of streams or even how to access them through the OS.
Sometimes we take our experience for granted and expect others to
either know what we are talking about or to read the entire manual and
understand the fundamentals before using a library. But in the real
world, it's almost impossible for that to happen when there are
divisions of labor and constant deadlines.
If I can get the RGB gamma stuff working properly, I might consider
releasing this code as an alternate player example that uses OpenGL
and OpenAL. It might take me a while, because my OpenGL and OpenAL
stuff is tied pretty solidly into our engine, but maybe I could rig a
barebones example with GLUT. But I would like to see it happen :)
Well, I guess that's it. Sorry for the long email, but I think these
are issues that everyone is going to hit, so they need to be remedied.
Thanks for writing Ogg Theora; I think it's a wonderful library, it
just needs some better documentation and examples. I tend to work on
the back end of things and realize that getting someone to write
concise example code is like herding cats.
Zack Morris
zmorris at zsculpt.com