[xiph-rtp] Codebook delivery and metadata

Wed Oct 27 15:04:36 PDT 2004

Hi Tor,

On 27 Oct 2004, at 16:18, Tor-Einar Jarnbjo wrote:

> On Tuesday, 26 October 2004, Phil Kerr <phil at plus24.com> wrote:
>
> Hi Phil,
>
> First of all, I would wish that you could be a little bit more precise
> when describing your suggestion. At more than one point, I'm not
> sure at all what you're actually saying.

Sorry, these notes were meant to be brief; perhaps they were a little
too brief :)

>
>> 1.)
>> SDP is used to set the initial stream codebook.
>
> You are probably just talking about a reference to the initial codebook
> here, aren't you? I read a suggestion somewhere to actually include
> the codebook in the session descriptor, but that is in almost no
> case feasible, as the session descriptor must be less than 1kB.

Yes, having a URI reference to the codebook.
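
To illustrate the size argument, here is a rough sketch comparing a session
description that references the codebook by URI with one that embeds it.
The `codebook-uri`, `codebook-md5` and `codebook` parameter names, the
addresses, and the 4096-byte stand-in codebook are all assumptions made up
for the example; none of them come from a draft.

```python
import base64
import hashlib

SDP_LIMIT = 1024  # the "less than 1kB" limit mentioned above


def build_sdp(codebook_attr: str) -> str:
    """Build a minimal SDP body with one extra attribute line."""
    lines = [
        "v=0",
        "o=- 0 0 IN IP4 192.0.2.1",
        "s=Example Vorbis stream",
        "c=IN IP4 233.252.0.1/127",
        "t=0 0",
        "m=audio 5004 RTP/AVP 96",
        "a=rtpmap:96 vorbis/44100/2",
        codebook_attr,  # hypothetical attribute carrying the codebook info
    ]
    return "\r\n".join(lines) + "\r\n"


# Stand-in for a real codebook set; the size is an assumption.
codebook = bytes(4096)
digest = hashlib.md5(codebook).hexdigest()

# Variant 1: only a URI and a hash travel in the SDP.
by_reference = build_sdp(
    f"a=fmtp:96 codebook-uri=http://example.com/cb/{digest}.bin; "
    f"codebook-md5={digest}"
)

# Variant 2: the codebook itself is base64-encoded into the SDP.
inline = build_sdp(
    "a=fmtp:96 codebook=" + base64.b64encode(codebook).decode("ascii")
)

print(len(by_reference.encode("ascii")), "bytes by reference")
print(len(inline.encode("ascii")), "bytes inline")
```

With these assumed sizes only the by-reference variant stays under the
limit.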

>
>> Cons:
>>
>> When do we know a codebook change will occur?  If this is a series of
>> chained Ogg files is this when the current filestream has ended?  If so
>> then we are relying on the length of the playout buffer to inform the
>> player of the change and retrieve the new set before we have a break in
>> the stream.  If this cannot be done in time then the player may play
>> garbled audio.  Also adds small packet decoding overhead as each one
>> needs to be checked to see if it is data or message.
>>
>> Can the chaining module read-ahead (or use the track length) to
>> schedule sending the codebook change message at the right time?  Can we
>> accurately pin the codebook change time?
>
> This depends greatly on the data source. The RTP transmitter running
> on j-ogg.de, for example, only acts as an RTP proxy for the HTTP stream
> from Virgin Radio. It is of course doable to cache 30 or 60 seconds
> of the HTTP stream to be able to send codebook-change messages in
> advance, but I can think of several reasons why a radio broadcaster
> would not be interested in such a delay in the transmission. Client-side
> buffering is also an issue, at least for mobile units without much
> memory.
>
> Allowing in-stream changes of the codebook also has a serious impact
> on how the client is able to calculate and predict bandwidth usage.
> It must be considered that the RTP packets for the actual content
> stream are pushed by the server and not pulled by the client. Without
> additional logic and protocol specifications, the client is only
> able to receive the RTP packets as a steady stream and is not able
> to interrupt the stream, precache parts of it, or in any other way
> modify the delivery rate to allocate bandwidth slots for codebook
> downloads.
>
>> 2.)
>>
>> SDP is used to set the initial stream codebook.
>> Periodic transmission of associated codebook URL in-stream.
>
> Why? The client won't be able to play the stream at all if the SDP
> is lost (most control protocols define how the SDP is to be
> retransmitted if lost).
>

This wasn't really intended to act as a replacement for the stream 
set-up with SDP, but to act as a repeat mechanism if codebook switching 
happens and that particular packet is lost.

>> 3.)
>>
>> As above but add codebook key to each Vorbis-RTP packet.  Each packet
>> has the MD5 key of the associated codebook needed for decoding.
>
> This just solves the problem in solution 1, which may occur if the
> codebook change message is lost. You still have the problem, that
> the client must be aware of the codebook change some time in advance
> to allow an uninterrupted switch.

Yup, this is the issue you highlighted above: how do you indicate that
a codebook change will occur whilst giving enough time for the new
codebook to be fetched in advance?  It's probably key to solving the
problem.
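
Option 3 and its limitation can be sketched roughly as below.  The packet
layout (a raw 16-byte MD5 digest in front of the audio data), the
`Receiver` class and the `fetch` callback are all hypothetical and only
there to make the point concrete.

```python
import hashlib

MD5_LEN = 16  # length of a raw MD5 digest in bytes


def codebook_key(codebook: bytes) -> bytes:
    """Key that identifies a codebook set: its MD5 digest."""
    return hashlib.md5(codebook).digest()


def make_packet(codebook: bytes, audio: bytes) -> bytes:
    """Hypothetical layout: 16-byte codebook key, then the audio data."""
    return codebook_key(codebook) + audio


class Receiver:
    def __init__(self, fetch):
        # fetch: callable taking a key and returning the codebook bytes,
        # standing in for e.g. an HTTP download (assumption).
        self.fetch = fetch
        self.cache = {}
        self.stalls = 0  # times playback had to wait for a codebook

    def preload(self, codebook: bytes) -> None:
        self.cache[codebook_key(codebook)] = codebook

    def handle(self, packet: bytes):
        key, audio = packet[:MD5_LEN], packet[MD5_LEN:]
        if key not in self.cache:
            # The change is detected even if a change message was lost,
            # but only now, so the fetch happens while audio is waiting.
            self.stalls += 1
            codebook = self.fetch(key)
            if codebook_key(codebook) != key:
                raise ValueError("fetched codebook does not match key")
            self.cache[key] = codebook
        return self.cache[key], audio


# Two stand-in codebook sets and a fake "server" to fetch them from.
book_a, book_b = b"codebook A", b"codebook B"
server = {codebook_key(b): b for b in (book_a, book_b)}

rx = Receiver(fetch=server.__getitem__)
rx.preload(book_a)                        # initial set, known from the SDP
rx.handle(make_packet(book_a, b"\x00" * 8))
rx.handle(make_packet(book_b, b"\x00" * 8))  # switch: detected, but late
print("stalls:", rx.stalls)
```

The key makes a lost change message harmless for correctness, but the
receiver still only learns of the switch when the first packet using the
new codebook arrives.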
>
>> 4.)
>>
>> Puts strain on the server re-encoding on-the-fly, or makes playlist
>> pre-processing a pain.
>
> Another possibility:
>
> 4.1)
>
> If several codebook sets are used within one session, all codebooks
> have to be defined in the session description (codebook URI, hash
> and the time span to which each applies).
>
> Pros:
>
> Allows the client to preload and cache the codebook sets before
> starting playback, or at least know from the beginning when the
> codebook sets are needed, making it much easier for the client to plan
> bandwidth usage and prevent bandwidth peaks shortly before switching
> to a new codebook.
>
> Cons:
>
> Only feasible for prerecorded content. The allowed SDP size is a
> limiting factor on how many different codebook sets may be used within
> one session.

The pros for this option make a good list.  The con could be worked
around by not sending all the information in the SDP, but instead giving
a URI from which the list (or a package) can be downloaded.
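
As a rough sketch of what such a downloaded list might let a client do:
the entry fields (start time, URI, hash), the example values and the
function names below are assumptions for illustration, not a proposed
format.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class CodebookEntry:
    start: float  # seconds from session start at which the set takes effect
    uri: str      # where the codebook set can be downloaded
    md5: str      # hash used to verify and cache the download


# Hypothetical schedule, as it might be parsed from the downloaded list.
SCHEDULE = [
    CodebookEntry(0.0, "http://example.com/cb/a.bin", "hash-of-a"),
    CodebookEntry(215.0, "http://example.com/cb/b.bin", "hash-of-b"),
    CodebookEntry(452.5, "http://example.com/cb/c.bin", "hash-of-c"),
]


def active_entry(schedule, t):
    """Return the entry whose time span covers playback time t."""
    starts = [entry.start for entry in schedule]
    index = bisect_right(starts, t) - 1
    if index < 0:
        raise ValueError("no codebook defined for this time")
    return schedule[index]


def to_prefetch(schedule, t, lead):
    """Entries that take effect within `lead` seconds after time t."""
    return [entry for entry in schedule if t < entry.start <= t + lead]


print(active_entry(SCHEDULE, 100.0).uri)
print([entry.uri for entry in to_prefetch(SCHEDULE, 200.0, 30.0)])
```

Because the whole schedule is known up front, the client can pick its own
`lead` and spread the downloads out instead of fetching at the switch.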

The only problem is that if you have a lot of clients connecting to a
stream, you may have a sizeable spike in HTTP traffic to handle, which
somewhat negates using multicast for bandwidth conservation.  Just out
of interest, how much data is sent over H.245 when an H.323 call is
initiated?  Is what we are trying to do close to this?

>
>> Metadata is sent as a distinct message within the Vorbis-RTP stream.
>>
>> Pros:
>> Simple.
>>
>> Cons:
>> Not flexible, adds packet decoding overhead.
>
> Well, the "overhead" of checking the first byte of the packet to
> determine the packet type is not really relevant, is it?

Well, it could be.  You mentioned mobile units above; the idea I have is
a small handheld device with WiFi capability, running an ARM-based
processor, that could receive the RTP stream.  Minimising the
processing done in the core RTP loop will keep the processor
overhead down, which will mean smoother playback.
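
For reference, the per-packet check under discussion would look roughly
like this.  The type values and the one-byte-prefix layout are assumptions
for the sketch, not the draft's format; whether the extra branch matters
on a small ARM device would need measuring.

```python
# Hypothetical packet type values carried in the first payload byte.
TYPE_AUDIO = 0x00
TYPE_CODEBOOK_CHANGE = 0x01
TYPE_METADATA = 0x02


def handle_audio(payload: bytes):
    return ("audio", len(payload))


def handle_codebook_change(payload: bytes):
    # Payload is assumed to carry the URI of the new codebook set.
    return ("codebook-change", payload.decode("ascii"))


def handle_metadata(payload: bytes):
    return ("metadata", payload.decode("utf-8"))


HANDLERS = {
    TYPE_AUDIO: handle_audio,
    TYPE_CODEBOOK_CHANGE: handle_codebook_change,
    TYPE_METADATA: handle_metadata,
}


def dispatch(packet: bytes):
    """Look at the first byte only, then hand the rest to one handler."""
    if not packet:
        raise ValueError("empty packet")
    handler = HANDLERS.get(packet[0])
    if handler is None:
        raise ValueError(f"unknown packet type {packet[0]:#04x}")
    return handler(packet[1:])


print(dispatch(bytes([TYPE_AUDIO]) + b"\x00" * 10))
print(dispatch(bytes([TYPE_METADATA]) + b"TITLE=Example"))
```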

>
>> Have metadata sent in a separate RTP stream - Annodex or something
>> similar.
>>
>> Pros:
>> More extensible.
>>
>> Cons:
>> Adds implementation overhead as you need to run another RTP receiver.
>
> What do you mean by "implementation overhead"? I agree that the
> implementation may be more complex, but I can't see why this must
> have any runtime overhead compared to solution 5. Parsing XML-based
> metadata (like CMML) may of course have a noticeable impact compared
> to parsing a binary format like the Vorbis comment header.

As above.  XML parsing takes more processing power than simple parsing
of binary or text fields.  The benefit of using XML, though, is that it
allows for a much nicer data format.
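
To make the comparison concrete, here is a small sketch of the two styles.
The binary side follows the length-prefixed layout of the body of a Vorbis
comment header (vendor string, comment count, then "KEY=value" entries,
all with 32-bit little-endian lengths), leaving out the packet-type prefix
and framing bit.  The XML element names are made up for the example and
are not CMML.

```python
import struct
import xml.etree.ElementTree as ET


def pack_comments(vendor: str, comments: dict) -> bytes:
    """Serialise comments as length-prefixed fields (little-endian u32)."""
    vendor_bytes = vendor.encode("utf-8")
    out = struct.pack("<I", len(vendor_bytes)) + vendor_bytes
    out += struct.pack("<I", len(comments))
    for key, value in comments.items():
        entry = f"{key}={value}".encode("utf-8")
        out += struct.pack("<I", len(entry)) + entry
    return out


def parse_comments(data: bytes) -> dict:
    """Walk the buffer once, reading a length and then that many bytes."""
    (vendor_len,) = struct.unpack_from("<I", data, 0)
    offset = 4 + vendor_len  # skip the vendor string
    (count,) = struct.unpack_from("<I", data, offset)
    offset += 4
    result = {}
    for _ in range(count):
        (length,) = struct.unpack_from("<I", data, offset)
        offset += 4
        entry = data[offset:offset + length].decode("utf-8")
        key, _sep, value = entry.partition("=")
        result[key] = value
        offset += length
    return result


def parse_xml(text: str) -> dict:
    """Same data from a hypothetical XML form; needs a full XML parser."""
    root = ET.fromstring(text)
    return {tag.get("name"): tag.text for tag in root.findall("tag")}


COMMENTS = {"TITLE": "Example", "ARTIST": "Someone"}
XML = (
    "<meta>"
    '<tag name="TITLE">Example</tag>'
    '<tag name="ARTIST">Someone</tag>'
    "</meta>"
)

print(parse_comments(pack_comments("example vendor", COMMENTS)))
print(parse_xml(XML))
```

The binary parser is a handful of offset calculations, whereas the XML
route pulls in a general-purpose parser; the trade-off is that the XML
form is easier to extend.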

-P

>
>
> Tor