[xiph-rtp] about theora-over-rtp draft
Simon Morlat
simon.morlat at linphone.org
Thu Jul 20 14:33:51 PDT 2006
Hello,
I tried to implement the RTP payload packetization for Theora defined in the draft
http://svn.xiph.org/trunk/theora/doc/draft-barbato-avt-rtp-theora-01.txt
(the most recent version I've found).
I'm the author and maintainer of linphone, a free software SIP video phone
(http://www.linphone.org). I was the first to implement Speex over RTP,
and I contributed a little to the Speex-over-RTP draft with Jean-Marc
Valin and Greg Herlein (especially the specification of SDP usage).
While implementing Theora support in linphone, I encountered several major
problems:
1/ About the packed configuration header. This packed configuration header is
supposed to be the Theora header followed immediately by the Theora tables.
Unfortunately, the current Theora decoder is unable to decode such a packed
configuration (it stops after the header and ignores the tables), and as far as
I understand there is no way to find where the Theora tables start when
receiving such a packet.
-> As a consequence, I've implemented it differently: the Theora header and
tables are sent in separate packets.
2/ About the fragment type. The draft defines 3 types: beginning of packet,
continuation of packet, and end of packet. I think this is really very
redundant information: the receiver only needs to know the boundary between
video frames, nothing more. Setting the marker bit of the RTP header to 1 for
the last packet of a video frame is enough and much simpler. RTP (RFC3550)
says it is up to payload specifications to define the meaning of the
marker bit, so there's no problem in using it this way. RFC2429-bis (payload
spec for H263-1998) does exactly that.
Furthermore, in the fragmentation algorithm, it is painful to determine whether a
fragment is an end-of-packet or a continuation fragment. And what if a
packet isn't fragmented at all, i.e. it is both the start and the end of a video
frame?
Note that the sequence number in the RTP header lets the application detect
incomplete frames.
-> I used the marker bit to indicate the last packet of each frame.
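To illustrate why the marker bit plus the sequence number is sufficient on the receiving side, here is a minimal sketch of a reassembler. It is not the code in linphone's theora.c; the names `frame_assembler` and `fa_push` and the 64 KB buffer limit are hypothetical. It assumes the marker bit is set on the last packet of each frame and ignores the RTP timestamp, so if the marker packet itself is lost, the two frames on either side of the gap are merged and both are discarded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FRAME_SIZE 65536 /* hypothetical upper bound on one frame */

/* Hypothetical reassembly state for one incoming video stream. */
typedef struct {
    uint8_t buf[MAX_FRAME_SIZE];
    size_t len;        /* bytes accumulated for the current frame */
    uint16_t next_seq; /* RTP sequence number expected next */
    int have_seq;      /* 0 until the first packet has been seen */
    int damaged;       /* set when a loss was detected in this frame */
} frame_assembler;

/* Feed one RTP payload. Returns 1 when a complete, loss-free frame is in
   fa->buf (its size in *frame_len; valid until the next call), else 0. */
static int fa_push(frame_assembler *fa, uint16_t seq, int marker,
                   const uint8_t *payload, size_t size, size_t *frame_len)
{
    /* A jump in the sequence number means at least one packet was lost. */
    if (fa->have_seq && seq != fa->next_seq)
        fa->damaged = 1;
    fa->have_seq = 1;
    fa->next_seq = (uint16_t)(seq + 1); /* wraps from 65535 to 0 */

    if (fa->len + size > sizeof fa->buf) {
        fa->damaged = 1; /* frame too large for the buffer */
    } else {
        memcpy(fa->buf + fa->len, payload, size);
        fa->len += size;
    }

    if (!marker)
        return 0; /* the frame continues in the next packet */

    /* Marker bit set: this packet closes the frame. An unfragmented frame
       is simply a single packet carrying the marker bit. */
    int ok = !fa->damaged;
    if (ok)
        *frame_len = fa->len;
    fa->len = 0; /* the next packet starts a new frame */
    fa->damaged = 0;
    return ok;
}
```

No begin/continue/end type field is needed here: the start of a frame is implied by the previous marker bit, and the unfragmented case requires no special handling.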
3/ I used in-band sending of the configuration headers. The inline SDP method
has a big problem for me: it forces the SDP offerer to configure its Theora
encoder before it even knows the bandwidth constraints of the remote side
(expressed using the b=AS: field of SDP).
The logical behaviour, in my view, would be for each side to express (using
SDP and possibly an a=fmtp line) its receiving preferences, for example
b=AS:64
a=fmtp:99 QCIF=2
(meaning:
limit to 64 kbit/s;
this device can only display QCIF
pictures at a frame rate of 29.97/2, i.e. about 15 frames per second, as in RFC2429-bis)
Thus, by taking all those preferences into account, each Theora encoder can be
configured efficiently to fit the bandwidth requirements and the display
constraints of the remote side. The Theora packed configuration packets can
then be sent in-band (the method I prefer) or through an alternate
method (HTTP, an RTCP packet?), but ONLY AFTER the SDP messages have been
exchanged.
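As a sketch of how an encoder could be configured from the remote side's SDP, the function below parses the two example lines and derives a target frame rate and per-frame bit budget. The `QCIF=<MPI>` fmtp parameter is only the proposal made here (borrowed from RFC2429-bis) and is not defined by the Theora draft. `encoder_target` and `target_from_sdp` are hypothetical names, and only the QCIF case is handled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical encoder settings derived from the remote side's SDP. */
typedef struct {
    int width, height;     /* picture size the remote can display */
    double fps;            /* maximum frame rate the remote can display */
    int kbps;              /* bandwidth the remote accepts (b=AS) */
    double bits_per_frame; /* average budget for one encoded frame */
} encoder_target;

/* Parse "b=AS:<kbit/s>" and an RFC2429-bis-style "a=fmtp:<pt> QCIF=<MPI>".
   Returns 0 on success, -1 if either value is missing or invalid. */
static int target_from_sdp(const char *b_line, const char *fmtp_line,
                           encoder_target *t)
{
    int mpi;
    const char *p;

    if (sscanf(b_line, "b=AS:%d", &t->kbps) != 1 || t->kbps <= 0)
        return -1;
    p = strstr(fmtp_line, "QCIF=");
    if (p == NULL || sscanf(p, "QCIF=%d", &mpi) != 1 || mpi < 1)
        return -1;

    t->width = 176; /* QCIF is 176x144 */
    t->height = 144;
    t->fps = (30000.0 / 1001.0) / mpi; /* 29.97 fps divided by the MPI */
    t->bits_per_frame = t->kbps * 1000.0 / t->fps;
    return 0;
}
```

For `b=AS:64` and `QCIF=2`, this yields about 15 fps and roughly 4.3 kbit per frame. The same division for a 128 kbit/s DSL upload at 7 fps gives about 18 kbit (about 2.3 kB) per frame. The encoder needs these values before it generates its configuration headers, which is why the headers should be sent only after the SDP exchange.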
For me it is very important to make efficient use of the bandwidth indications
because, for example, with typical DSL connections the bandwidth is sometimes
limited to 128 kbit/s (very often in the upload direction). Doing CIF at 30 fps
with high-quality coding is not possible in this situation. I found the Theora
codec really efficient (CIF at 7 fps works with such DSL modems). But the
prerequisite for this to work is that the phone is able to configure its Theora
encoder after receiving the SDP message from the remote side.
Finally, the format I've used in my implementation (see
mediastreamer2/src/theora.c in linphone) can be summed up like this:
- use the marker bit to indicate the last packet of a video frame
- use a payload header like this:
| 24 bits of config ident | 5 unused bits | 3 bits of packet type |
| theora data.................................|
The 3 bits of packet type can be:
#define THEORA_RAW_DATA 0
#define THEORA_HEADER_DATA 1
#define THEORA_COMMENT_DATA 2
#define THEORA_TABLES_DATA 3
I don't use the comment data.
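The payload header above can be packed into and parsed from 4 bytes as shown in this sketch. It assumes network byte order and the usual MSB-first bit numbering of RTP diagrams, so the 3-bit type occupies the low bits of the fourth byte. The actual byte layout in linphone's theora.c may differ, and `put_payload_header`/`get_payload_header` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define THEORA_RAW_DATA 0
#define THEORA_HEADER_DATA 1
#define THEORA_COMMENT_DATA 2
#define THEORA_TABLES_DATA 3

#define PAYLOAD_HEADER_SIZE 4

/* Write the 24-bit config ident (big-endian), then one byte whose upper
   5 bits are unused (sent as 0) and whose low 3 bits hold the type. */
static void put_payload_header(uint8_t *out, uint32_t ident, unsigned type)
{
    out[0] = (uint8_t)(ident >> 16);
    out[1] = (uint8_t)(ident >> 8);
    out[2] = (uint8_t)ident;
    out[3] = (uint8_t)(type & 0x07);
}

/* Parse the header. Returns -1 if the payload is too short to hold it. */
static int get_payload_header(const uint8_t *in, size_t len,
                              uint32_t *ident, unsigned *type)
{
    if (len < PAYLOAD_HEADER_SIZE)
        return -1;
    *ident = ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | in[2];
    *type = in[3] & 0x07; /* ignore the 5 unused bits */
    return 0;
}
```

The Theora data immediately follows these 4 bytes in each RTP payload.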
At the start of the session, the Theora header is sent, then the Theora tables
(which are fragmented since they are quite big). Those packets are sent 3 times
to improve reliability in case of packet loss. Note: there are surely better
approaches to improving reliability.
Then Theora data is sent normally (THEORA_RAW_DATA).
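The fragmentation step on the sender side can be sketched as an iterator that produces one RTP payload at a time. It assumes that the marker bit also closes each fragmented configuration packet (for example, the tables), not just each video frame, and that payloads are limited to a hypothetical 1400 bytes to stay under a typical Ethernet MTU. `next_fragment` and `MAX_PAYLOAD` are illustrative names, not linphone's.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define THEORA_HEADER_DATA 1
#define THEORA_TABLES_DATA 3
#define HDR_SIZE 4
#define MAX_PAYLOAD 1400 /* hypothetical payload budget below the MTU */

/* Build the next RTP payload for one Theora packet (data, len), starting at
   *off (which must be <= len). Returns the payload length; sets *marker to 1
   when this fragment is the last one, i.e. the RTP marker bit to send. */
static size_t next_fragment(uint8_t out[MAX_PAYLOAD], uint32_t ident,
                            unsigned type, const uint8_t *data, size_t len,
                            size_t *off, int *marker)
{
    const size_t room = MAX_PAYLOAD - HDR_SIZE;
    size_t chunk = len - *off < room ? len - *off : room;

    out[0] = (uint8_t)(ident >> 16); /* 24-bit config ident */
    out[1] = (uint8_t)(ident >> 8);
    out[2] = (uint8_t)ident;
    out[3] = (uint8_t)(type & 0x07); /* 5 unused bits + 3-bit type */
    if (chunk > 0)
        memcpy(out + HDR_SIZE, data + *off, chunk);
    *off += chunk;
    *marker = (*off == len); /* last fragment closes the packet */
    return HDR_SIZE + chunk;
}
```

A sender would call this in a `do { ... } while (!marker);` loop for each Theora packet and hand each payload and marker value to its RTP stack. Repeating the configuration three times then amounts to running that loop over the header (THEORA_HEADER_DATA) and then the tables (THEORA_TABLES_DATA) three times. This is one possible ordering; the description above does not say whether the repetitions are interleaved.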
Finally, I would expect this draft to explain how to split a big Theora
frame into several MTU-sized packets in a way that makes a partially
received frame usable by the decoder. In other words, how to be as robust as
possible in case of packet loss. But I don't know whether this is
possible; I don't know enough about the internals of Theora.
That's all for my comments. I just want to try to keep the world as simple as
possible and share my experience as a developer as well as my user experience
of video telephony.
Although I've referred to RFC2429-bis (H263-1998), I don't consider that
document an example to follow; I'm sure we can do better.
I don't want linphone to be a non-standard video phone, so I would really
like it to implement the draft you are working on. However, I would really
like this future RFC to be as clear and simple as possible. I'm really
tired of the obscure RFCs that sometimes come out of the IETF (e.g.
RFC2190, AMR over RTP, MPEG-4 over RTP...).
I think with a good RFC, theora would be really superior to MPEG4 in the real
time streaming world.
Thanks a lot for reading this; I look forward to your feedback.
Also, I'd like to thank Mr. Barbato for all the work he has already done with
this draft.
Simon