[xiph-rtp] header caching and chaining

Thu Feb 3 01:26:31 PST 2005

This is based on a conversation derf and I had last week on irc.

There's been some debate about chaining support in RTP transmission. If
I may summarize the substantive arguments:

Counter:
  Chaining is (much) more of a pain to support in RTP than in Ogg
  Many streams are live encodings or only use one set of codebooks anyway

Pro:
  Being able to switch bitrates on the fly without the clients having
    to reconnect is a nice feature that requires chaining. Not being
    able to do this would be a regression for both Real and Icecast.
  For theora, being able to at least switch framerate when cutting
    between material is a big win. Resampling and re-encoding, as
    can be done with audio, is prohibitively expensive for the medium
    term.

Personally, I come down on the side of no chaining for audio, and yes
chaining for video. Lovely.

Phil's proposal for doing chaining is to have a 32-bit crc of the
bitstream header in each RTP packet. This lets the decoder know
when a 'chain boundary' has passed, and a new context applies.
Using a CRC lets the client cache headers, optimizing latency and
bandwidth usage. Headers can be sent in-band (not recommended) or
packaged up for out-of-band retrieval, broadcast or otherwise
distributed.
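
To make the caching idea concrete, here is a minimal sketch of the
client side in Python. The class and method names are hypothetical,
and zlib.crc32 is only a stand-in: nothing here says which CRC-32
variant the drafts actually specify.

```python
import zlib


class HeaderCache:
    """Hypothetical client-side cache of codec headers keyed by CRC-32."""

    def __init__(self):
        self._headers = {}    # crc -> raw header bytes
        self._current = None  # crc carried by the most recent packet

    def store(self, header_bytes):
        """Cache headers fetched in-band or out-of-band; return their key."""
        key = zlib.crc32(header_bytes) & 0xFFFFFFFF  # assumed CRC variant
        self._headers[key] = header_bytes
        return key

    def on_packet(self, packet_crc):
        """Return (boundary_crossed, headers); headers is None on a cache miss."""
        boundary = packet_crc != self._current
        self._current = packet_crc
        return boundary, self._headers.get(packet_crc)
```

A miss (None) is the case where the client has to go and fetch the
headers before it can decode; a hit costs nothing extra on the wire.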

So far, so good. But what exactly should be hashed? Derf and I have
verified that with theora the initial 'info' header with the frame
size, rate, and so on, as passed in the SDP is completely orthogonal
to the contents of the third 'setup' header. There may be an issue
in the future when we add interlaced support, but orthogonality
can in general be restored by using that flag from the info header
as an additional cache key. I believe the two headers are also
orthogonal in vorbis, though I haven't double checked.

So, the cleanest thing would be to hash only the setup header and
rely on the SDP for the info header details.

BUT, this means that things like sample rate, image size, and so
on can't change mid-stream, only the codebooks themselves. We've
only supported one of the two motivations for chaining. This may
still be a reasonable compromise: it preserves quality/bandwidth
scalability while simplifying a lot of the things people have 
complained about with chaining because their code assumed image
size et al. couldn't change mid-stream.

If we do want to support info header changes, we have to transmit
and cache both of them. We could, for example, just concatenate
the two and hash them together. This will make for a lot more
cache entries differing only by a few bytes, but does simplify/
obsolete defining those same fields in the SDP.
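
The difference between the two keying schemes can be sketched as
follows. The header contents are made-up placeholder bytes, not real
theora headers, and zlib.crc32 again stands in for whatever CRC the
drafts end up using.

```python
import zlib


def setup_only_key(info, setup):
    # Hash only the setup header; info details would come from the SDP.
    return zlib.crc32(setup) & 0xFFFFFFFF


def combined_key(info, setup):
    # Concatenate info and setup headers and hash them together.
    return zlib.crc32(info + setup) & 0xFFFFFFFF


# Two segments sharing codebooks but differing in frame rate
# (placeholder bytes for illustration only).
info_a = b"theora-info 320x240 30fps"
info_b = b"theora-info 320x240 25fps"
setup = b"theora-setup codebooks..."
segments = [(info_a, setup), (info_b, setup)]

setup_only_entries = {setup_only_key(i, s) for i, s in segments}
combined_entries = {combined_key(i, s) for i, s in segments}

print(len(setup_only_entries), "cache entry when hashing setup only")
print(len(combined_entries), "cache entries when hashing info+setup")
```

With setup-only keys both segments share one cache entry, but the
frame rate change is invisible to the hash. With combined keys the
change is detected, at the cost of a second entry that duplicates the
whole setup header.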

What Phil has proposed in the drafts is more complicated yet:
The crc hash is just of the 3rd setup header data, but the
info headers are also transmitted in the same package...indexed
by the same crc as the stream data itself. I don't particularly
like this as described, because all three headers for each
segment are packed together, there's no scope for redundancy,
and the bandwidth spike is going to be much worse than in-line
retrieval.

Opinions?

 -r