[xiph-rtp] about theora-over-rtp draft

Thu Jul 20 15:46:27 PDT 2006

Simon Morlat wrote:

> http://svn.xiph.org/trunk/theora/doc/draft-barbato-avt-rtp-theora-01.txt
> (the most recent I've found).

Use the XML one; the txt MAY be slightly outdated from time to time.

> 
> I'm the author and maintainer of linphone, a free software SIP video phone 
> (http://www.linphone.org) . I've been the first to implement speex over RTP  
> and I've contributed a little to the speex-over-rtp draft with Jean Marc 
> Valin and Greg Herlein (especially concerning SDP usage specification)

Ok

> 
> While implementing theora support in linphone, I encountered several major 
> problems:
> 
> 1/ about packed configuration header. This packed configuration header is 
> supposed to be theora header followed immediately by theora tables. 
> Unfortunately the current theora decoder is unable to decode such packed 
> configuration (it stops after the header and ignores the table) and as far as 
> I understand there's no way to retrieve where theora tables start when 
> receiving such a packet.

There is: the first packet is fixed in size, so the tables start right
after it. Check the example code in:

http://svn.xiph.org/trunk/xiph-rtp/
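
To illustrate the point, here is a minimal sketch (not the code from the
repository above) of how a receiver could find where the tables start. It
assumes the packed configuration is the bare concatenation of the two
headers, that the first one is 42 bytes long, and that the headers begin
with the 0x80/0x82 "theora" signatures. The function name and the
constants are made up for this example.

```python
from typing import Tuple

# Assumptions for this sketch: header size and signatures as I understand
# them from the Theora bitstream format, not taken from the draft text.
IDENT_HEADER_LEN = 42
IDENT_MAGIC = b"\x80theora"
SETUP_MAGIC = b"\x82theora"


def split_packed_config(packed: bytes) -> Tuple[bytes, bytes]:
    """Split a packed configuration into (first header, tables)."""
    if len(packed) <= IDENT_HEADER_LEN:
        raise ValueError("packed configuration too short")
    # The first header has a fixed size, so the split point is a constant.
    ident = packed[:IDENT_HEADER_LEN]
    tables = packed[IDENT_HEADER_LEN:]
    if not ident.startswith(IDENT_MAGIC):
        raise ValueError("missing identification header")
    if not tables.startswith(SETUP_MAGIC):
        raise ValueError("tables do not start at the expected offset")
    return ident, tables
```

The two parts can then be handed to the decoder one after the other, as
if they had arrived as separate packets.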

> 
> -> as a consequence I've implemented differently: theora header and tables are 
> sent in different packets.

Wrong, nonstandard*, bad!

> 
> 2/ about fragment type. The draft defines 3 types: begin of packet, 
> continuation of packet, and end of packet. I think this is really very 
> redundant information: the receiver only needs to know the frontier between 
> video frames, nothing more. Setting the marker bit of the rtp header to 1 for 
> the last packet of a video frame is enough and much simpler.

I think it isn't.

> RTP (RFC3550) 
> tells it's up to payload specifications to indicate the meaning of this 
> markbit. There's no problem in using it. RFC2429-bis (payload spec for 
> H263-1998) does that.

We could have another look at it, but if it was discarded even before I
joined the development of this RFC, there was probably a reason.

> Furthermore, for the fragmentation algorithm, it is painful to know whether a 
> fragment is an end of packet or continuation packet.

Why? You just have to check the Fragment type field.

> And what if a 
> packet isn't fragmented at all, i.e. it is both the start and the end of a video 
> frame?

Hmm, is this part that unclear?

| This field is set according to the following list
| </t>
| <vspace blankLines="1" />
| <list style="empty">
| <t>      0 = Not Fragmented</t>

One or more theora frames or a full configuration in a single rtp datagram.

| <t>      1 = Start Fragment</t>

If I get this one, I have to store it somewhere and expect type 2 fragments.

| <t>      2 = Continuation Fragment</t>

Keep on storing type 2 fragments.

| <t>      3 = End Fragment</t>

End fragment: build a full frame/packed configuration out of the stored
fragments.

| </list>
|
| <t>This field must be zero if the number of packets field is
| non-zero.</t>
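
The receiver logic described above can be sketched as a small state
machine. This is only an illustration of the four fragment types, not the
reference implementation; the class and method names are invented, and it
assumes the caller has already extracted the fragment type and payload
from each RTP datagram and calls reset() when the RTP sequence numbers
show a gap.

```python
NOT_FRAGMENTED, START, CONTINUATION, END = 0, 1, 2, 3


class Reassembler:
    """Rebuild full payloads from fragments (hypothetical helper)."""

    def __init__(self):
        self._parts = None  # None: no fragmented payload in progress

    def reset(self):
        """Drop any partial payload, e.g. after a sequence number gap."""
        self._parts = None

    def push(self, frag_type, payload):
        """Return a complete payload, or None if more data is needed."""
        if frag_type not in (NOT_FRAGMENTED, START, CONTINUATION, END):
            raise ValueError("unknown fragment type")
        if frag_type == NOT_FRAGMENTED:
            self._parts = None
            return payload  # complete on its own
        if frag_type == START:
            self._parts = [payload]  # store it and expect type 2 or 3
            return None
        if self._parts is None:
            return None  # continuation/end without a start: discard
        self._parts.append(payload)
        if frag_type == CONTINUATION:
            return None  # keep on storing
        data = b"".join(self._parts)  # END: assemble everything stored
        self._parts = None
        return data
```

A type 0 datagram is returned immediately, which covers the "start and end
at the same time" case without any extra signalling.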

> Note that the sequence number of the rtp header lets the application detect 
> incomplete frames.

I know.

>  
> -> I used the marker bit to indicate end-of-frames packet.

If you use the marker bit, you have a problem telling whether a datagram
holds a full packet or not; with the drafted way it is immediate.

> 
> 
> 3/ I used inband sending of configuration headers. The inline SDP method has a 
> big problem for me: it forces the SDP offerer to configure its theora encoder 
> before even knowing about the bandwidth constraints of the remote side 
> (expressed using the b=AS: field of SDP).

No, you should use offer/answer [RFC 3264] to reach an agreement between
client and server; see section 6.2.

> Thus by taking into account all those preferences, each theora encoder can be 
> configured efficiently to fit the bandwidth requirements and the display 
> constraints of the remote side. The theora packed configuration packets can 
> then be sent inband (the method that I prefer), or through an alternate 
> method: (http, RTCP packet?) , but ONLY AFTER the SDP messages have been 
> exchanged.

I think offer/answer is a solution. Maybe I could relax the constraint a
bit, but even as is it already fits your requirement:

- the encoder side already knows the maximum resolution and its own
maximum bitrate, and could set a lower bound on resolution and
bandwidth using heuristics.

- the decoder side will receive an offer with a sufficiently wide set of
possibilities and it will pick the ones it supports
bandwidth/hw/protocol wise.
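
As a rough illustration of the decoder-side choice, assuming the offer has
already been parsed into a list of candidate configurations: the record
layout, the function name and the bitrates below are invented for the
example and are not defined by the draft.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    """One offered configuration (hypothetical layout)."""
    width: int
    height: int
    fps: int
    bitrate_kbps: int


def pick_candidate(offer: Sequence[Candidate], max_kbps: int,
                   max_width: int, max_height: int) -> Optional[Candidate]:
    """Pick the highest-bitrate candidate within the receiver's limits."""
    usable = [c for c in offer
              if c.bitrate_kbps <= max_kbps
              and c.width <= max_width
              and c.height <= max_height]
    if not usable:
        return None  # nothing in the offer fits; reject the stream
    return max(usable, key=lambda c: c.bitrate_kbps)


# Example: a receiver behind a 128 kbit/s link that can display CIF.
offer = [
    Candidate(352, 288, 30, 384),  # CIF at 30 fps, too heavy for the link
    Candidate(352, 288, 7, 128),   # CIF at 7 fps
    Candidate(176, 144, 15, 64),   # QCIF fallback
]
choice = pick_candidate(offer, max_kbps=128, max_width=352, max_height=288)
```

With these made-up numbers the receiver ends up with the CIF at 7 fps
entry, so the encoder never has to guess the remote bandwidth.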

> For me it is very important to efficiently use bandwidth indications because 
> for example with usual DSL connections the bandwidth is sometimes limited to 
> 128kbit/s (and very often in upload case). Doing CIF at 30 fps with high 
> quality coding is not possible in this situation. I found theora codec is 
> really efficient (CIF at 7 fps works with such DSL modems). But the 
> prerequisite for this to work is that the phone be able to configure its theora 
> encoder after receiving the SDP message from the remote side.

Relaxing the constraint so that the receiver's answer may carry altered
parameters, each bounded between the highest and the lowest values it
got in the first offer, would probably lead to more corner cases than
you may want to handle.

> 
> Finally the format I've used in my implementation (see 
> mediastreamer2/src/theora.c in linphone) can be summed up like this:

[a completely nonstandard implementation]

> Finally, I would expect this draft to tell how to split a big theora 
> frame in several mtu-sized packets in a way that would make a partially 
> received frame usable by the decoder. In other words, how to be as safe as 
> possible in case of packet losses. But I don't know whether this is something 
> possible, I don't know enough about the internals of theora.

The current draft assumes a lose-one, lose-everything scenario: a frame
with any missing fragment is dropped.

> 
> That's all for my comments. I just want to try to keep the world as simple as 
> possible and bring my developer experience as well as my user-experience of 
> video-telephony.
> Although I've made reference to RFC2429-bis (H263-1998), I don't consider this 
> paper an example to follow; I'm sure we can do better.

Please reread the draft. I'm afraid you missed some sections, which
means that it isn't crystal clear yet.

> I don't want linphone to be an out-of-standards video phone, so I would really 
> like it to implement the draft you are working on. However, I would really 
> like this future RFC to be as clear and simple as possible. I'm really 
> tired of those obscure RFCs that sometimes come out of the IETF (ex: 
> rfc2190, amr over rtp, mpeg4 over rtp...).

> I think with a good RFC, theora would be really superior to MPEG4 in the real 
> time streaming world.

Theora is quite a good fit for your specific application IMHO, I think
we will find a good way to make it work on linphone.

> 
> Thanks a lot for reading this, I'm waiting for your feedback.
> Also, I'd like to thank Mr. Barbato for all the work he has already done with 
> this draft.

I'm happy to see that someone started implementing rtp-theora.

Thank you for your feedback.

lu

-- 

Luca Barbato

Gentoo/linux Gentoo/PPC
http://dev.gentoo.org/~lu_zero