[xiph-rtp] about theora-over-rtp draft

Mon Jul 24 09:05:17 PDT 2006

On Mon, Jul 24, 2006 at 12:31:02PM +0200, Simon Morlat wrote:

> But you can do the same thing using the sequence number. Marker bit says: that 
> packet carries the end of frame. As a consequence a fullframe has the marker 
> bit (which is not the case in your example). Sequence number discontinuities 
> can be handled by the application by dropping packets until a marker bit is 
> received, then restart the accumulate algo I've described.
> The only advantage I see of your technique (I've just realized it) is that you 
> can differentiate a FullFrame from an "end of frame with begin lost". 
> But anyway you could also decide to always pass a marked packet to the decoder 
> (regardless of whether it is a fullframe or just an end of frame with a 
> possibly lost start) and see what happens. If it is not a fullframe the theora 
> decoder won't be able to decode it.

I think you've pointed out another difference between the schemes here: 
latency. With the 2 fragmentation bits you know immediately if you've 
lost a frame, while with the marker bit scheme you don't know until you 
see the next packet with a marker bit. In a naive implementation you lag
either way, but if you're going to do something to interpolate the 
missing frame, you have more time to do that if you don't have to wait 
until the next marker bit falls out of the jitter buffer.
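
To make the difference concrete, here is a rough Python sketch of a
receiver that uses the 2 fragmentation bits. It assumes the fragment
types are 0 = not fragmented, 1 = start, 2 = continuation, 3 = end; the
Packet class and reassemble() are made-up names for illustration, not
taken from any real library. On a sequence gap it records the loss on
the very packet that reveals it, and it can resume at once if that
packet starts a frame.

```python
from dataclasses import dataclass

# Assumed values of the 2-bit fragment type field.
NOT_FRAGMENTED, START, CONTINUATION, END = 0, 1, 2, 3


@dataclass
class Packet:
    seq: int      # 16-bit RTP sequence number
    frag: int     # 2-bit fragment type
    data: bytes


def reassemble(packets):
    """Return (complete frames, loss events) for packets in arrival order."""
    frames, events = [], []
    parts = None      # fragments of the frame in progress, or None
    prev_seq = None
    for p in packets:
        # A gap is any step other than +1, modulo 16-bit wraparound.
        gap = prev_seq is not None and ((p.seq - prev_seq) & 0xFFFF) != 1
        prev_seq = p.seq
        if gap:
            # Loss is known now, not at the next frame boundary.
            events.append(("loss", p.seq))
            parts = None  # the frame in progress can no longer be completed
        if p.frag == NOT_FRAGMENTED:
            parts = None
            frames.append(p.data)
        elif p.frag == START:
            parts = [p.data]  # safe to resume here even right after a gap
        elif parts is not None:
            parts.append(p.data)
            if p.frag == END:
                frames.append(b"".join(parts))
                parts = None
        # Otherwise: continuation or end whose start was lost; drop it.
    return frames, events
```

With only a marker bit, the packet after a gap cannot say whether it
begins a frame, so the receiver has to discard until the next marked
packet before it can trust anything again.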

Or...can you infer the same thing from the timestamp?

> [...]
> In this scenario I agree that the video buffering is not the main cause of 
> latency, but it's clear that an implementor who needs to reduce the latency 
> of his telephony application will NEVER want to buffer video frames in order 
> to put them in a single RTP packet.

Thanks for the excellent latency analysis. It's always nice to see 
numbers. :)

> What are for you the benefits of being able to put several theora frames in a 
> single RTP packet ?

Our other main design goal was to support multicast. Generally speaking, 
non-interactive applications are the opposite case, where bandwidth use 
is more important than latency. Luca pointed out a "youtube"-like 
unicast streaming application, but IP multicast is the case where RTP 
transport is absolutely essential.

It is for these situations that we provide the packing mechanism. So, 
as you say, if you're implementing a receiver you need to handle this, 
but you're free not to use it in your sender if it's not appropriate
for your application.
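
For what it's worth, the receiver side of packing is a short loop. The
Python sketch below assumes a 4-byte payload header (24-bit
configuration ident, 2-bit fragment type, 2-bit data type, 4-bit packet
count) followed by that many blocks, each prefixed with a 16-bit
big-endian length. Check the draft for the authoritative layout;
unpack_payload() is a hypothetical name, and the sketch handles only the
unfragmented, packed case.

```python
import struct


def unpack_payload(payload):
    """Split one RTP payload into (ident, frag, data_type, [frames])."""
    if len(payload) < 4:
        raise ValueError("payload shorter than the 4-byte header")
    (header,) = struct.unpack_from("!I", payload, 0)
    ident = header >> 8              # 24-bit configuration ident
    frag = (header >> 6) & 0x3       # 2-bit fragment type
    data_type = (header >> 4) & 0x3  # 2-bit data type
    count = header & 0xF             # number of packed frames

    frames, offset = [], 4
    for _ in range(count):
        if offset + 2 > len(payload):
            raise ValueError("truncated length field")
        (length,) = struct.unpack_from("!H", payload, offset)
        offset += 2
        if offset + length > len(payload):
            raise ValueError("truncated frame data")
        frames.append(payload[offset:offset + length])
        offset += length
    return ident, frag, data_type, frames
```

A sender that puts one frame in each packet just sets the count to 1,
so the same loop covers both cases.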

> To sum up all this discussion, I'd like this draft to:
> - clarify the packed-conf message (limit between header and tables)

Agreed.

> - explain an SDP operation that lets each side configure asymmetrically in a 
> simple offer-answer (only 2 messages) scheme (for me this implies NOT 
> transmitting inline encoder configuration in SDP, which prevents the offerer 
> from adapting to the other end).

Sounds reasonable.

> - use fewer bits to indicate fragmentation (for me 1 bit is enough, 2 if you 
> wish to indicate begin of frame and end of frame). Whether this bit is the RTP 
> marker or not is not important.
> - assume each RTP packet contains at most one frame.
> 	The last two points aim at much simpler unpacketisation code.

I don't think you've made a convincing case here.

> Currently the unpacketisation code necessary for implementing this draft makes 
> it more complex than the RFC2429bis or RFC3016 packetisation, which I think 
> is not good if we want more and more people like me or companies to prefer 
> open-source technology instead of heavily patented ones.

Doesn't the (switchable) codebook transmission requirement completely 
overshadow this? Is your RTP library somehow written in such a way that 
it needs significant changes to do packing this way? Is Luca's source 
code not helpful here?

 -r