[xiph-rtp] Chaining

Tue Aug 30 08:27:53 PDT 2005

David,

You present some good arguments. I do believe that in your particular use
case it makes sense to allow multiple encoding settings for a session. I'm
calling it different encoding settings instead of "chaining" because I think
the intent of the two uses is a little different. In your case you just want
to provide a set of alternate ways to encode something. Chaining is a generic
mechanism for stringing clips together. In some cases I'll admit that the
distinction is subtle.

You haven't mentioned what codecs you're using right now. If you are using
Theora and/or Speex you don't need to change the codebooks to change the rate.
All you need to do is reinit the encoder with different parameters. In the
case of Theora, you can just reinit the current encoder with a different
bitrate or quality and it will output bits at a different rate. If you rely
on the RTP timestamps instead of what the decoder tells you, then you can also
change the frame rate without needing to update the codebook. Vorbis may not
have this sort of flexibility. I'm not an expert, but I believe that this is
what MikeS said on IRC.

Since you may not be able to rate-adapt Vorbis streams, we may want to allow
multiple encodings to be specified at session establishment time. This makes
things a little more flexible so that we can accommodate your use case, but
doesn't open the floodgates to all sorts of wacky complexity that full-fledged
chaining support would require. If we decide to allow this, I'd suggest we
put the following requirements on the encodings that are specified in a
session.

- All encodings must have a sample rate that evenly divides the RTP timestamp
  sample rate, i.e. the RTP timestamp rate must be an integer multiple of each
  encoding's sample rate. For example, an RTP session that has a 44100 Hz RTP
  timestamp sample rate can only have 44100, 22050, or 11025 Hz encodings.
  This is mainly to avoid round-off problems. I think it also may make things
  easier for a resampler that has to handle all these streams.

- Video should use a sample rate that can accommodate various frame rates.
  Most other video payloads use 90000 Hz, which allows NTSC and PAL frame rates
  to be represented exactly. Frame rates for all the encodings in the session
  must not produce timestamps that need to be rounded. This is basically the
  same requirement as above, just for video.

- I'd like to say no switching codecs, but I'm not as rigid about this one. As
  long as the client has enough info when it gets the SDP to know what codecs
  are going to be used, it can detect whether it will be able to play back the
  stream or not.
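
To make the first two requirements concrete, here is a rough Python sketch of
how a sender might validate a proposed set of encodings before advertising
them. The function names are made up for illustration and are not part of any
draft or library. It assumes frame rates are passed as exact fractions (e.g.
30000/1001 for NTSC) rather than floats.

```python
from fractions import Fraction

def audio_rate_ok(rtp_clock_hz, sample_rate_hz):
    """True if the RTP clock is an integer multiple of the audio sample rate."""
    return sample_rate_hz > 0 and rtp_clock_hz % sample_rate_hz == 0

def video_rate_ok(rtp_clock_hz, frame_rate):
    """True if every frame lands on a whole RTP tick (no rounding needed)."""
    ticks_per_frame = Fraction(rtp_clock_hz) / Fraction(frame_rate)
    return ticks_per_frame.denominator == 1

NTSC = Fraction(30000, 1001)  # about 29.97 fps
PAL = Fraction(25)

# Hypothetical session: 44100 Hz audio clock, 90000 Hz video clock.
audio_checks = {rate: audio_rate_ok(44100, rate)
                for rate in (44100, 22050, 11025, 48000)}
video_checks = {name: video_rate_ok(90000, rate)
                for name, rate in (("NTSC", NTSC), ("PAL", PAL))}
```

With these inputs the 48000 Hz encoding is the only one rejected, since 44100
is not an integer multiple of 48000.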

I think keeping this sort of functionality is reasonable. We would just have to
add an encoding ID to the currently proposed SDP format. We'd have to add back
the chainingID field, but I think we should call it something like
"encoding ID" and make it 8 bits or less. I don't think anyone will need more
than 255 encodings.

Aaron