[xiph-rtp] about theora-over-rtp draft

Fri Jul 21 13:05:23 PDT 2006

Simon Morlat wrote:
> Thanks for your quick feedback!

Just a request: please configure your mail client to wrap lines at
column 79, thank you.

> 
>> there is: the first packet is fixed in size. Check the example code in:
>>
>> http://svn.xiph.org/trunk/xiph-rtp/
> If this is a known fact that the header is fixed size, then it should be
> written in the rfc.

The rfc assumes that people will read the normative refs; I'd just move
the theora ref to the normative section.

> However I don't think it's a good idea; this may prevent theora developers
> from evolving the codec.

The theora codec is frozen; changes will lead to a theora-newversion.

>> I think it isn't.
> Why ?

you may have:

- fully enclosed frames per single frame.
- non raw video payloads.

> 
> You can find the rfc2429 draft here:
> http://www.ietf.org/internet-drafts/draft-ietf-avt-rfc2429-bis-09.txt

ok

> I was talking about the algorithm at the encoding side that splits
> a theora frame into smaller packets, not the algorithm at the receiving side.

see my example code. dead easy (too much) ^^;

> 
> 
>>> -> I used the marker bit to indicate end-of-frames packet.
>> If you use the marker bit you have a problem in telling whether it is a
>> full packet or not; using the drafted way it is immediate.
> 
> What problem ? The algorithm takes in four lines:
> packet=rtp_get();
> accumulate_into_internal_buffer(packet);
> if (!marker_bit_set(packet))
> 	process_accumulated_buffer_and_clean_it();

Say you lose something at the start of a fragment chain: what would you do?

If the lost packet were just another single packet (marker bit not set)
you could simply ignore the loss; if it were the start of a fragmented
frame, you'd feed the decoder something more or less unexpected.

         1         2 time
12345678901234567890
NMNNNMMMMMMNNMMMNNNN bit Marked or Notmarked
 ^lost      ^lost
_||__|||||||_||||___ _fullframe |frags
FSEFFSCCCCCEFSCCEFFF Fullframe, Startfrag, Contfrag, Endfrag

in my case you know what's lost.
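
To make the point concrete, here is a rough sketch (not the code in svn)
of a receiver that relies on a per-packet fragment type. The enum values,
struct and function names are made up for illustration; check the draft
for the real field layout. A continuation or end fragment that arrives
without a start in progress is dropped instead of reaching the decoder.
Gaps in the middle of a chain still need the RTP sequence number check,
which is left to the caller here (reasm_gap).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative fragment types; names and values are assumptions. */
enum frag_type { FRAG_NONE = 0, FRAG_START = 1, FRAG_CONT = 2, FRAG_END = 3 };

struct reasm {
    unsigned char buf[65536];
    size_t len;
    bool in_chain;  /* true after a START, until END or a detected loss */
};

/* Caller invokes this when the RTP sequence number shows a gap. */
static void reasm_gap(struct reasm *r)
{
    r->in_chain = false;
    r->len = 0;
}

/* Returns true when buf[0..len) holds a complete frame for the decoder. */
static bool reasm_push(struct reasm *r, enum frag_type ft,
                       const unsigned char *data, size_t n)
{
    switch (ft) {
    case FRAG_NONE:                 /* whole frame in one packet */
        r->in_chain = false;
        if (n > sizeof r->buf)
            return false;
        memcpy(r->buf, data, n);
        r->len = n;
        return true;
    case FRAG_START:                /* begin a new chain */
        r->len = 0;
        r->in_chain = (n <= sizeof r->buf);
        if (r->in_chain) {
            memcpy(r->buf, data, n);
            r->len = n;
        }
        return false;
    case FRAG_CONT:
    case FRAG_END:
        /* No START seen (it was lost) or no room left: drop the chain. */
        if (!r->in_chain || n > sizeof r->buf - r->len) {
            r->in_chain = false;
            return false;
        }
        memcpy(r->buf + r->len, data, n);
        r->len += n;
        if (ft == FRAG_END) {
            r->in_chain = false;
            return true;
        }
        return false;
    }
    return false;
}
```

With only a marker bit, the CONT/END case above cannot be told apart from
a chain whose first packet went missing.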

> 
>>> 3/ I used inband sending of configuration headers. The inline SDP method
>>> has a big problem for me: it forces the SDP offerer to configure its
>>> theora encoder before even knowing about the bandwidth constraints of the
>>> remote side (expressed using the b=<AS>: field of SDP).
>> No, you should use offer-answer[rfc3264] to have an agreement between
>> client and server see 6.2
> 
> Please explain how the SDP offerer could propose a theora configuration inlined
> in the sdp message before even knowing the receiving prerequisites of the other side.
> This cannot work. Typically if the offerer wants to send CIF at a high bit rate,
> but unfortunately the other side cannot receive anything other than QCIF at a very low bitrate,
> you have NO way to give him the configuration string that fits those requirements.
> The SDP offer/answer is made of only 2 messages!

   At any time, either agent MAY generate a new offer that updates the
   session.  However, it MUST NOT generate a new offer if it has
   received an offer which it has not yet answered or rejected.
   Furthermore, it MUST NOT generate a new offer if it has generated a
   prior offer for which it has not yet received an answer or a
   rejection.  If an agent receives an offer after having sent one, but
   before receiving an answer to it, this is considered a "glare"
   condition. [rfc3264 - 4 Protocol Operation]

> 

See further:
http://www.ietf.org/internet-drafts/draft-xu-mmusic-sdp-codec-param-01.txt

> Taken from RFC3264:
> "If the bandwidth attribute is present for a stream, it indicates the
>    desired bandwidth that the offerer would like to receive. "
> 
> If the SDP offerer sends its theora configuration in the sdp offer message,
> it logically CANNOT take into account the bandwidth attribute sent by the remote
> side.

But if you send _multiple_ offers the client can pick the one it likes
most. Keep in mind that _nothing_ prevents you from:

- using adaptive techniques like the ones supported by nemesi/fenice[1]
- using something like codec-param (I'm thinking about adding it soon)
- keeping the offer-answer ballet going until you agree.
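
As a rough sketch of the "pick the one you like most" step: the struct
and function below are hypothetical, not taken from the draft or from
rfc3264. The receiving side keeps the highest-bitrate offered
configuration that still fits its own bandwidth limit; -1 means nothing
fits, so it would reject and wait for a new offer.

```c
#include <stddef.h>

/* Hypothetical summary of one offered theora configuration. */
struct offered_config {
    int id;         /* made-up identifier for the configuration */
    unsigned kbps;  /* bitrate the configuration needs */
};

/* Index of the best config not exceeding max_kbps, or -1 if none fits. */
static int pick_config(const struct offered_config *c, size_t n,
                       unsigned max_kbps)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (c[i].kbps > max_kbps)
            continue;               /* too expensive for this receiver */
        if (best < 0 || c[i].kbps > c[best].kbps)
            best = (int)i;
    }
    return best;
}
```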

> 
> But how will the decoder side inform the encoder side of the config it
> has chosen?

By leaving only the chosen methods/configurations in the answer.

> Furthermore most streams are full duplex: there are two encoder sides and two decoder
> sides. We should talk about SDP offerer and answerer.

yup

> Let's imagine the SDP offerer suggests 3 theora configs using various bandwidth
> constraints.
> The SDP answerer can eventually tell which one of the configs it chooses.
> But as it also sends a theora stream, it will also suggest various theora
> configs in the same way.
> Unfortunately the SDP offerer has no way to indicate the config it chooses,
> as there is no third message.

why not?

> 
> As far as I understand, the only SDP offer/answer model that works is the one
> where each side tells its preferences and constraints about the stream they wish
> to receive.
> Remember also that many internet connections are asymmetric: we really need both
> sides to indicate their receiving capabilities.

works too.

> 
>> [a completely nonstandard implementation]
> Unfortunately... I want my software to work to let me see my family and friends.
> As we only have 512/128 adsl connections, it's important that each side properly
> takes into account the receiving capabilities of the other side.

See the rfcs already mentioned.

> 
> Lastly, I forgot to mention the possibility offered by the draft to put several
> theora frames in a single rtp packet. My opinion is that this is completely useless
> because RTP is made for real-time streaming, and putting several theora frames in the
> same rtp packet means buffering theora frames before sending them, thus it's no
> longer real-time.
> Although this possibility is often offered by audio codecs to save bandwidth by
> reducing rtp overhead, it has never been used for video codecs, simply because the
> huge size of video packets compared to the rtp header makes the bandwidth gain
> very limited. Thus it's simpler to rely on the underlying protocol
> (UDP/RTP) to know the size of the video packets, and assume that each RTP packet
> contains at most 1 video frame.

You forgot that:
 - using rtp-theora for youtube-like applications is not far-fetched
(you want low bitrate with nice but not perfect quality)
 - how many microseconds/milliseconds/seconds do you buffer in those
not-so-real-time applications?

The draft doesn't prevent you from disregarding collated packets at all.

Keep in mind that the draft does not have to be perfect for a single
task, but must cover most of the common usages and let you take just the
part you need.

Thank you for the comments again =)

lu

-- 

Luca Barbato

Gentoo/linux Gentoo/PPC
http://dev.gentoo.org/~lu_zero