[xiph-rtp] Theora RTP payload format

Mon Apr 18 09:47:50 PDT 2005

Ralph Giles wrote:

>On Mon, Apr 18, 2005 at 11:30:26AM -0400, Steve Kann wrote:
>
>>I've also read the archives of this list, about some of the proposed
>>changes. I'd like to describe here what I'm planning on doing, and see
>>how this might fit into your design.
>
>Yay feedback! :-)
>
>>*Note: I also suspect, though I haven't researched it, that if all the
>>clients are using the same version of the Theora encoder and the same
>>settings, their setup headers would likely be the same. If this is
>>the case, then their CRC32s would be the same, and they could start
>>decoding at any keyframe.
>
>This is correct. Future encoders may well adapt to their input, so your 
>conference engine should just check if the headers are the same. Note 
>that you can also do things like reencode the stream up to the next 
>keyframe if you want to switch someone outside a keyframe boundary, but 
>that reduces quality and uses a lot more resources than packet 
>switching.
>
Yes. I was thinking that a good compromise might be to cache the last 
keyframe from each participant, and then, if we want to switch 
in-between keyframes, I can send the previous keyframe from a 
participant out, then send nothing until the next keyframe comes, and 
then send everything.

In that case, viewers would see the switch instantly, but motion 
wouldn't begin until the next keyframe appeared (which would be in a 
second or two, at most).
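
The cached-keyframe switching described above could be sketched roughly as follows. This is only an illustration, not code from any Theora or RTP library: the Packet and KeyframeSwitcher names are hypothetical, and it assumes one packet per frame and that the server already knows whether a frame is a keyframe.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Packet:
    sender: str        # hypothetical participant identifier
    is_keyframe: bool  # assumed to be known from the frame header
    payload: bytes


@dataclass
class KeyframeSwitcher:
    """Forwards one participant's video, switching via cached keyframes."""
    active: Optional[str] = None
    waiting_for_keyframe: bool = False
    last_keyframe: Dict[str, Packet] = field(default_factory=dict)

    def switch_to(self, sender: str) -> List[Packet]:
        """Select a new sender; return packets to send immediately."""
        self.active = sender
        self.waiting_for_keyframe = True
        cached = self.last_keyframe.get(sender)
        # Send the cached keyframe right away so viewers see the switch.
        return [cached] if cached is not None else []

    def on_packet(self, pkt: Packet) -> List[Packet]:
        """Handle an incoming packet; return packets to forward."""
        if pkt.is_keyframe:
            self.last_keyframe[pkt.sender] = pkt
        if pkt.sender != self.active:
            return []
        if self.waiting_for_keyframe:
            if not pkt.is_keyframe:
                return []  # drop inter frames until a fresh keyframe arrives
            self.waiting_for_keyframe = False
        return [pkt]
```

Calling switch_to() in between keyframes returns the cached keyframe (a still image), and on_packet() then forwards nothing from that sender until its next keyframe shows up, which is the "instant switch, delayed motion" behaviour.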

>>The latest idea I've read, though, makes this process much more
>>inconvenient, because _each_ client would have its own 16-bit "chain
>>ID", and these chain IDs would be duplicated in the streams sent by
>>each client, and therefore the server would need to deeply understand
>>and parse each of the streams in order to put them together, etc.
>
>Aaron addressed this, but to clarify: the idea with the chain id is that
>it's a simple mark on each packet telling the client what decoder setup
>to use to decode it. Making the mapping between the chain id and the
>decoder is an out-of-band process. If you're not using SDP, it's your
>protocol's responsibility to set the mapping you want.
>
But, even using SDP, this is pretty inconvenient, because you
essentially need to have all your clients re-fetch the SDP each time
someone joins the session (or, if encoders change, any time an encoder
wants to use a new codebook). For SIP, for example, that will be pretty
disruptive, needing to do a re-INVITE and all.
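
To make the bookkeeping concrete, here is a rough sketch of how a conference server might hand out session-wide chain IDs, so that clients only need a new mapping when a genuinely new set of setup headers appears. Nothing here comes from the draft: ChainIdTable and headers_crc32 are hypothetical names, and how a new mapping is announced to clients (SDP, re-INVITE, or something else) is left out entirely.

```python
import zlib
from typing import Dict, Tuple


class ChainIdTable:
    """Maps setup headers to session-wide 16-bit chain IDs (hypothetical)."""

    def __init__(self) -> None:
        self._by_headers: Dict[bytes, int] = {}
        self._next_id = 0

    def lookup(self, setup_headers: bytes) -> Tuple[int, bool]:
        """Return (chain_id, is_new); is_new means clients must be told."""
        # Compare the full header bytes rather than a checksum, so two
        # different setups can never share an ID by accident.
        if setup_headers in self._by_headers:
            return self._by_headers[setup_headers], False
        if self._next_id > 0xFFFF:
            raise RuntimeError("out of 16-bit chain IDs")
        chain_id = self._next_id
        self._next_id += 1
        self._by_headers[setup_headers] = chain_id
        return chain_id, True


def headers_crc32(setup_headers: bytes) -> int:
    """CRC32 of the header bytes, as an unsigned 32-bit value."""
    return zlib.crc32(setup_headers) & 0xFFFFFFFF
```

If every participant runs the same encoder with the same settings, lookup() returns the same chain ID for all of them and is_new is only true once, so no re-signalling is needed when further participants join.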

>>It would be most convenient if there were a "fixed setup" mode for
>>Theora, where you could ask the Theora encoder to use a fixed set of
>>setup headers, and have it act like other codecs in this respect. I
>>understand the flexibility that the setup headers give you in encoder
>>design, but it would be nice if there were a way to configure it
>>otherwise.
>
>Well, if we did this officially, it would limit future encoder 
>improvements. Better, we think, to leave such profiles up to 
>particular applications.
>
It wouldn't necessarily need to do that. You could define a set of 
standard codebooks (maybe even just one), and then offer a greatly 
compressed way of signifying that you're going to use this standard 
codebook. Decoders would be required to accept either these "standard" 
codebooks, or "dynamic" codebooks like they do now.

When transmitting the codebooks, you could have a small sequence at the 
beginning saying "standard codebook N", or "dynamic codebook", so you 
could transmit the "standard codebook N" stuff with just a few bytes, 
instead of the 2 kilobytes or so it seems like they take now.

Then, the encoder has the choice to use a fixed codebook or a dynamic 
codebook, and the only limitation on future improvements would be that 
you can't introduce additional "standard codebooks" without introducing 
compatibility problems.
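
As a rough illustration of the "standard codebook N" versus "dynamic codebook" signalling, a setup blob could start with a one-byte tag. The tag values, the one-byte codebook index, and the four-byte length field below are all made up for this sketch; none of it is part of the Theora format or the RTP draft.

```python
import struct
from typing import Tuple, Union

TAG_STANDARD = 0  # hypothetical tag values, not part of any spec
TAG_DYNAMIC = 1


def encode_setup(codebook: Union[int, bytes]) -> bytes:
    """Encode either a standard codebook index or a full dynamic codebook."""
    if isinstance(codebook, int):
        # Standard codebook: tag byte plus a one-byte index (0..255).
        return struct.pack("!BB", TAG_STANDARD, codebook)
    # Dynamic codebook: tag byte, 32-bit length, then the raw bytes.
    return struct.pack("!BI", TAG_DYNAMIC, len(codebook)) + codebook


def decode_setup(data: bytes) -> Tuple[str, Union[int, bytes]]:
    """Return ("standard", index) or ("dynamic", codebook_bytes)."""
    tag = data[0]
    if tag == TAG_STANDARD:
        return "standard", data[1]
    if tag == TAG_DYNAMIC:
        (length,) = struct.unpack_from("!I", data, 1)
        body = data[5:5 + length]
        if len(body) != length:
            raise ValueError("truncated dynamic codebook")
        return "dynamic", body
    raise ValueError(f"unknown setup tag {tag}")
```

With this layout a standard codebook costs two bytes on the wire, while a dynamic one costs its own size plus five bytes of framing.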

>That said, the fixed config used by the VP3 codec theora is based on
>is one reasonable baseline.
>
Is there any way to force that mode now?

I've basically just gotten to the point where I've got encoding and
decoding working in my end-point (after figuring out that in order to do
this in real-time, even for 320x240 at 15 fps, I need to set quick_p=1
and noise_sensitivity=0), and I haven't really dug into the Theora
codebase yet (other than to figure out what those parameters do). I
just saw this conversation happening, and figured I'd offer my use-case
out there.

-SteveK
