[xiph-rtp] header caching and chaining

Aaron Colwell acolwell at real.com
Thu Feb 3 09:08:11 PST 2005


On Thu, Feb 03, 2005 at 01:26:31AM -0800, Ralph Giles wrote:
> This is based on a conversation derf and I had last week on irc.
> 
> There's been some debate about chaining support in RTP transmission. If
> I may summarize the substantive arguments:
> 
> Counter:
>   Chaining is (much more of) a pain to support than in Ogg
>   Many streams are live encodings or only use one set of codebooks anyway
> 
> Pro:
>   Being able to switch bitrates on the fly without the clients having
>     to reconnect is a nice feature that requires chaining. Not being
>     able to do this would be a regression for both Real and Icecast.
>   For theora, being able to at least switch framerate when cutting
>     between material is a big win. Resampling and re-encoding, as
>     can be done with audio, is prohibitively expensive for the medium
>     term.
> 

I think one other Pro is that you would be able to stream a larger set of
valid .ogg files over RTP. Unfortunately this still won't let you stream all
valid .ogg files, but it is getting closer.

> Personally, I come down on the side of no chaining for audio, and yes
> chaining for video. Lovely.

Why is it more beneficial for video than audio? I'm assuming you want chaining
for video to allow frame rate changes and codebook changes to allow better
bitrate characteristics. Couldn't these same arguments be made for audio as
well? If you have audio that only has low frequency components it may make
sense to use a lower sample rate.

> 
> Phil's proposal for doing chaining is to have a 32-bit crc of the
> bitstream header in each RTP packet. This lets the decoder know
> when a 'chain boundary' has passed, and a new context applies.
> Using a CRC lets the client cache headers, optimizing latency and
> bandwidth usage. Headers can be sent in-band (not recommended) or
> packaged up for out-of-band retrieval, broadcast or otherwise
> distributed.
> 
> So far, so good. But what exactly should be hashed? Derf and I have
> verified that with theora the initial 'info' header with the frame
> size, rate, and so on, as passed in the SDP is completely orthogonal
> to the contents of the third 'setup' header. There may be an issue
> in the future when we add interlaced support, but orthogonality
> can in general be restored by using that flag from the info header
> as an additional cache key. I believe the two headers are also
> orthogonal in vorbis, though I haven't double checked.

I'm not sure if the 2 headers are orthogonal in Vorbis. You need to know the
channels value from the info header to properly decode the setup header. This
will constrain you from changing the number of channels across chain boundaries.
The block sizes are also stored in the info header so they wouldn't be able to
change across chain boundaries either. These 2 constraints seem to me to
severely limit the usefulness of chaining support if you can only update the
setup header.

> 
> So, the cleanest thing would be to hash only the setup header and
> rely on the SDP for the info header details.
> 
> BUT, this means that things like sample rate, image size, and so
> on can't change mid-stream, only the codebooks themselves. We've
> only supported one of the two motivations for chaining. This may
> still be a reasonable compromise: it preserves quality/bandwidth
> scalability while simplifying a lot of the things people have 
> complained about with chaining because their code assumed image
> size et al. couldn't change mid-stream.

To properly handle local and Icecast playback, clients have to solve these
problems anyway, so I don't really see much savings here.

> 
> If we do want to support info header changes, we have to transmit
> and cache both of them. We could, for example, just concatenate
> the two and hash them together. This will make for a lot more
> cache entries differing only by a few bytes, but does simplify/
> obsolete defining those same fields in the SDP.

This seems reasonable to me. This allows the CRC to represent the (info,setup)
tuple uniquely which is what we need.
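
To make the idea concrete, here is a minimal sketch of a client-side header
cache keyed on a CRC of the concatenated headers. It assumes the standard
CRC-32 from zlib, which none of the drafts has specified as far as I know,
and the names chain_key and HeaderCache are hypothetical, not taken from any
draft. The header bytes are opaque placeholders.

```python
import zlib


def chain_key(info_header: bytes, setup_header: bytes) -> int:
    """CRC-32 over the info header followed by the setup header.

    Assumption: zlib's CRC-32 polynomial; the actual draft may pick another.
    """
    crc = zlib.crc32(info_header)
    # Continuing the running CRC is equivalent to hashing info + setup.
    return zlib.crc32(setup_header, crc) & 0xFFFFFFFF


class HeaderCache:
    """Hypothetical client cache mapping a CRC to an (info, setup) pair."""

    def __init__(self):
        self._entries = {}

    def store(self, info_header: bytes, setup_header: bytes) -> int:
        key = chain_key(info_header, setup_header)
        self._entries[key] = (info_header, setup_header)
        return key

    def lookup(self, key: int):
        # None means the client has to fetch the headers before decoding.
        return self._entries.get(key)
```

A client would call lookup() with the CRC carried in each RTP packet and
only fetch headers on a miss. A change in either header yields a new key,
so the decoder sees the chain boundary.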

> 
> What Phil has proposed in the drafts is more complicated yet:
> The crc hash is just of the 3rd setup header data, but the
> info headers are also transmitted in the same package...indexed
> by the same crc as the stream data itself. I don't particularly
> like this as described, because all three headers for each
> segment are packed together, there's no scope for redundancy,
> and the bandwidth spike is going to be much worse than in-line
> retrieval.

I haven't read his proposals yet (I'll try to get to this today), but the
way you describe it doesn't sound like it would work. What happens in the case
where you use the same setup header, but a different frame size or frame rate?
The CRC wouldn't change so the client wouldn't know that these parameters
changed. Depending on how the frame size changed, frame decode could fail
because there are either too many or too few coded coefficients, block coded
flags, etc.
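
A small sketch of the problem, assuming zlib's CRC-32 and placeholder bytes
standing in for real Theora headers (the function names are hypothetical):

```python
import zlib


def setup_only_key(info_header: bytes, setup_header: bytes) -> int:
    # The info header is deliberately ignored, as described above.
    return zlib.crc32(setup_header) & 0xFFFFFFFF


def combined_key(info_header: bytes, setup_header: bytes) -> int:
    # Covers both headers, so a change in either one changes the key.
    return zlib.crc32(info_header + setup_header) & 0xFFFFFFFF


# Placeholder bytes, not real Theora headers.
setup = b"placeholder setup header"
info_small = b"placeholder info w=320"
info_large = b"placeholder info w=640"

# Same setup header, different frame size: the setup-only key cannot
# tell the two segments apart, while the combined key can.
same_setup_key = (
    setup_only_key(info_small, setup) == setup_only_key(info_large, setup)
)
same_combined_key = (
    combined_key(info_small, setup) == combined_key(info_large, setup)
)
```

Here same_setup_key is True and same_combined_key is False, which is the
case where a client keyed only on the setup header would keep decoding with
stale frame parameters.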

Aaron

> 
> Opinions?
> 
>  -r
> _______________________________________________
> xiph-rtp mailing list
> xiph-rtp at xiph.org
> http://lists.xiph.org/mailman/listinfo/xiph-rtp
> 

