[xiph-rtp] Chaining

David Barrett dbarrett at quinthar.com
Tue Aug 30 11:01:36 PDT 2005


Aaron Colwell wrote:
> You haven't mentioned what codecs you're using right now. If you are using
> Theora and/or Speex you don't need to change the codebooks to change the rate.
> All you need to do is reinit the decoder with different parameters. In the
> case of Theora, you can just reinit the current encoder with a different
> bitrate or quality and it will output bits at a different rate.

I'm using Theora and Speex, as you guessed.  But I'm changing all the 
video settings on the fly -- framerate, frame size, encoding quality, 
etc.  (Audio too, but to a lesser degree.)

> If you rely
> on the RTP timestamps instead of what the decoder tells you, then you can also
> change the frame rate without needing to update the codebook. Vorbis may not
> have this sort of flexibility. I'm not an expert, but I believe that this is
> what MikeS said on IRC.

RTP timestamps are another discussion that I haven't followed closely. 
I actually didn't realize there were any restrictions on them (i.e., that 
they need to be a multiple of anything).  I just have my decoder accept 
arbitrary timestamps and sync up with the audio, even if the video 
framerate is irregular.  I ignore the decoder timestamps.
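
For illustration only, the approach described above (trust the RTP 
timestamps, ignore the decoder's own timing) could look roughly like the 
Python sketch below.  It is not taken from any actual player: the class 
name RtpClock, the to_seconds method, and the 90000 Hz clock rate are 
assumptions made for the example.  It unwraps the 32-bit RTP timestamp 
and turns it into seconds since the first packet, without expecting the 
spacing between frames to be regular.

```python
RTP_CLOCK_HZ = 90000   # assumed video RTP clock rate
WRAP = 1 << 32         # RTP timestamps are 32 bits wide and wrap around


class RtpClock:
    """Maps raw 32-bit RTP timestamps to seconds since the first packet."""

    def __init__(self, clock_hz=RTP_CLOCK_HZ):
        self.clock_hz = clock_hz
        self.last_raw = None   # raw timestamp of the previous packet
        self.extended = 0      # unwrapped ticks elapsed since the first packet

    def to_seconds(self, raw_ts):
        if self.last_raw is None:
            # The first packet defines time zero (the initial value is arbitrary).
            self.last_raw = raw_ts
            return 0.0
        # Signed 32-bit difference: handles wraparound and mild reordering.
        delta = (raw_ts - self.last_raw) % WRAP
        if delta >= WRAP // 2:
            delta -= WRAP
        self.extended += delta
        self.last_raw = raw_ts
        return self.extended / self.clock_hz
```

The resulting times can then be compared against the audio clock to 
decide when to show each frame.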


> Since you may not be able to rate adapt Vorbis streams we may want to allow
> multiple encodings to be specified at session establishment time. This makes
> things a little more flexible so that we can accommodate your use case, but
> doesn't open the floodgates of all sorts of wacky complexity that full-fledged
> chaining support would require. 

I guess I still don't see what this flood gate o' wackiness is that 
chaining opens.  (I'm sorry if this was discussed in detail and I fell 
behind.)  I see that the SDP gets horribly complicated if you need to 
download a bunch of codebooks via HTTP (effectively solved by inline 
codebook ack/retransmit).  And perhaps it's harder to write a player 
that accepts irregular framerates, framesizes, and so forth.  But this 
doesn't seem as bad as has been implied.  What am I overlooking?


> - All encodings must have a sample rate that evenly divides the RTP timestamp
>   sample rate. For example, an RTP session with a 44100 Hz RTP timestamp
>   sample rate can only carry 44100, 22050, or 11025 Hz encodings. This is
>   mainly to avoid round-off problems. I think it may also make things easier
>   for a resampler that has to handle all these streams.
> 
> - Video should use a sample rate that can accommodate various frame rates.
>   Most other video payloads use 90000Hz which allows NTSC and PAL frame rates
>   to be represented. Frame rates for all the encodings in the session must
>   not produce timestamps that need to be rounded. This is basically the same
>   requirement as above, just for video.

All this talk on sample rates is scaring me.  I didn't realize it was a 
requirement to be *absolutely* regular with framerate.  I thought we 
were talking about *average* framerates.  I assumed anything stating 
framerate is purely advisory, and the decoder should be prepared to 
handle frames that arrive at irregular intervals (i.e., not puke if it 
gets a frame before or after what it's expecting).

After all, at the end of the day, these samples are coming from a live 
source which *itself* isn't always producing a perfectly regular stream.  
How could I possibly enforce absolute regularity if my camera actually 
generates only ~30FPS instead of a mathematically perfect 30FPS?
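
To put numbers on this, here is a small Python sketch, assuming the 
90000 Hz video clock mentioned in the quoted text; the function names 
are hypothetical.  It checks whether a nominal frame rate gives a whole 
number of RTP ticks per frame, and shows that a frame stamped with its 
actual capture time only needs rounding to the nearest tick.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 90000   # assumed video RTP clock rate


def ticks_per_frame(frame_rate):
    """Exact number of RTP ticks between frames at a nominal frame rate."""
    return Fraction(RTP_CLOCK_HZ) / Fraction(frame_rate)


def needs_rounding(frame_rate):
    """True if the nominal rate does not land on whole RTP ticks."""
    return ticks_per_frame(frame_rate).denominator != 1


def capture_time_to_ticks(seconds):
    """Stamp a frame by its real capture time, rounded to the nearest tick."""
    return round(seconds * RTP_CLOCK_HZ)


if __name__ == "__main__":
    # NTSC (30000/1001), PAL (25), 30 fps, and an awkward rate (7 fps).
    for rate in (Fraction(30000, 1001), 25, 30, 7):
        print(rate, ticks_per_frame(rate), needs_rounding(rate))
    # A camera that delivered a frame 33.4 ms after the previous one.
    print(capture_time_to_ticks(0.0334))
```

At 90000 Hz, rounding to the nearest tick is off by at most half a tick, 
roughly 5.6 microseconds, which is the cost of stamping frames by 
capture time rather than by a nominal rate.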


> I think keeping this sort of functionality is reasonable. We would just have to
> add an encoding ID to the currently proposed SDP format. We'd have to add back
> the chainingID field, but I think we should call it something like
> "encoding ID" and make it 8-bits or less. I don't think anyone will need more
> than 255 encodings.

I agree, 255 is probably enough (and 256 is even better).  You might 
specify that an inline codebook transmission for an "encoding ID" that 
is already in use overrides the old codebook.  That way the encoder can 
manage the encoding-ID space itself, reusing IDs as needed.
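
A minimal Python sketch of that override rule, seen from the receiver's 
side.  The CodebookTable class and its methods are hypothetical names 
for illustration, and the 8-bit ID range follows the size suggested in 
the quoted text; nothing here comes from an actual draft.

```python
class CodebookTable:
    """Receiver-side map from an 8-bit encoding ID to its codebook bytes."""

    MAX_IDS = 256   # assumed: the encoding ID is a single byte

    def __init__(self):
        self._books = {}

    def store(self, encoding_id, codebook):
        """Record an inline codebook; return True if it replaced an older one."""
        if not 0 <= encoding_id < self.MAX_IDS:
            raise ValueError("encoding ID must fit in 8 bits")
        replaced = encoding_id in self._books
        # A codebook sent for an ID already in use overrides the old one.
        self._books[encoding_id] = bytes(codebook)
        return replaced

    def lookup(self, encoding_id):
        """Return the current codebook for an ID, or None if never received."""
        return self._books.get(encoding_id)
```

With this rule the sender never has to negotiate IDs: it picks one, 
sends the codebook inline, and the receiver uses whatever it stored 
most recently under that ID.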

-david


