[xiph-rtp] Theora RTP payload format

Mon Apr 18 14:52:16 PDT 2005

On Mon, Apr 18, 2005 at 01:50:13PM -0700, Ralph Giles wrote:
> On Mon, Apr 18, 2005 at 03:13:19PM -0400, Steve Kann wrote:
> 
> > In one particular use case, (off-line encoding to .ogg files), all this 
> > isn't much of a headache. But for use-cases like this, and perhaps for 
> > many others, this is quite a headache. For example, If I had all this 
> > working with h.263 (or h.264), and I wanted to switch to theora, it 
> > would be quite a job, because compared to the design of most video 
> > codecs, theora is a square peg when you might have a round hole..
> 
> Yes, this is all about the configuration header which is different from 
> the way most other codecs are designed.

Just to be clear, the flexibility of the vorbis setup headers has 
served us very well. The irony of that statement is that linux 
distributions are the only significant os vendors shipping our codecs as 
a matter of course. The fact that a beta3 decoder release can play 
files from aoTuVb4 with better quality at half the bitrate is a 
significant achievement.

So yes, the flexibility means more work at the front end, and yes the 
CRC32-as-ident proposal would have traded the explicit chainid mapping 
table for an implicit one. We've generally found dealing with the setup 
overhead isn't as complex as you're expecting. The idea is that doing a 
little more work up front is easier than having to mass-upgrade your 
installed base in two years.
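
To make the explicit-versus-implicit distinction a bit more concrete, 
here is a rough sketch of what an implicit, CRC32-keyed table could look 
like. This is an illustration only, not the text of either proposal: the 
names setup_ident and SetupCache are made up, and hashing the whole 
setup header with zlib's CRC32 is an assumption about how such an ident 
would be derived.

```python
import zlib


def setup_ident(setup_header: bytes) -> int:
    """Derive an identifier from the setup header bytes (assumed scheme)."""
    # Mask to an unsigned 32-bit value so the ident fits a fixed-width field.
    return zlib.crc32(setup_header) & 0xFFFFFFFF


class SetupCache:
    """Hypothetical receiver-side table of known setup headers.

    With an explicit chain id, the sender assigns the key and has to
    communicate the mapping. Here the key is computed from the header
    itself, so the mapping is implicit.
    """

    def __init__(self) -> None:
        self._by_ident = {}

    def add(self, setup_header: bytes) -> int:
        """Store a setup header and return the ident it is filed under."""
        ident = setup_ident(setup_header)
        self._by_ident[ident] = setup_header
        return ident

    def lookup(self, ident: int):
        """Return the setup header for an ident, or None if not seen yet."""
        return self._by_ident.get(ident)
```

Either way the receiver still needs a table of setups it has seen. The 
only thing that changes is who chooses the key.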

It's nice when it's easy to get things 'just working' quickly, but it's 
also nice to do things right. You were already talking about negotiating 
a common frame size and rate, and the rtp server mixing the streams 
together, which I understand affects the SSRC and CSRC RTP header 
fields, only switching on keyframes and so on, all of which requires at 
least a little bit of codec knowledge. And theora, at least, is designed 
so things like header and keyframe packet detection can be done easily 
without a full decode. (Just by looking at the first byte for those 
cases.)
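
As a rough sketch of that first-byte check: it assumes the bit layout 
from the Theora bitstream spec, where the high bit (0x80) marks a header 
packet (0x80, 0x81 and 0x82 being the identification, comment and setup 
headers) and, in data packets, the next bit (0x40) is clear for a 
keyframe and set for an inter frame. The function and constant names are 
made up for illustration, and it's in Python only for brevity.

```python
HEADER_FLAG = 0x80  # high bit set: header packet
INTER_FLAG = 0x40   # in a data packet: set for inter frame, clear for keyframe

HEADER_NAMES = {
    0x80: "identification",
    0x81: "comment",
    0x82: "setup",
}


def classify_packet(packet: bytes) -> str:
    """Classify a Theora packet from its first byte, without decoding it."""
    if not packet:
        # Zero-length packets carry no first byte to inspect.
        return "empty"
    first = packet[0]
    if first & HEADER_FLAG:
        return HEADER_NAMES.get(first, "unknown header")
    return "inter frame" if first & INTER_FLAG else "keyframe"
```

An RTP mixer could use something like this to hold off switching 
sources until classify_packet() reports a keyframe.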

Our concern with defining profiles, like the 'VP3' bit I suggested, has 
always been that they encourage non-interoperable implementations that 
only support that profile. "Profiles are useless" has been a common 
lesson of many specification designs. They make committee decisions 
easier, but in the end you either implement the de facto standard or 
you don't. Those are the main reasons I remain unconvinced.

Note also that while the chain id lets you multiplex streams from 
encoders using a different setup, you don't have to do it that way.
Your application might be better served by mandating that everyone
use the same profile and then not worry about chaining at all. That's 
more like the situation you have with fixed setup codecs.

I hope that explains the design reasoning a bit better, and why we've 
been resistant to things like static codebook sets. We do very much 
appreciate your opinion and contribution to the design discussion and 
are very willing to help you figure out what needs to be done to make 
your implementation work well.

 -r