[xiph-rtp] Codebook delivery and metadata

Wed Oct 27 08:18:57 PDT 2004

On Tuesday, 26 October 2004, Phil Kerr <phil at plus24.com> wrote:

Hi Phil,

First of all, I wish you could be a little more precise when
describing your suggestions. At more than one point, I'm not at all
sure what you're actually saying.

>1.)
>SDP is used to set the initial stream codebook.

You are probably just talking about a reference to the initial codebook
here, aren't you? I read a suggestion somewhere to actually include
the codebook in the session descriptor, but that is almost never
feasible, as the session descriptor must be less than 1 kB.

>Cons:
>
>When do we know a codebook change will occur?  If this is a series of
>chained Ogg files is this when the current filestream has ended?  If so
>then we are relying on the length of the playout buffer to inform the
>player of the change and retrieve the new set before we have a break in
>the stream.  If this cannot be done in time then the player may play
>garbled audio.  Also adds small packet decoding overhead as each one
>needs to be checked to see if it is data or message.
>
>Can the chaining module read-ahead (or use the track length) to
>schedule sending the codebook change message at the right time?  Can we
>accurately pin the codebook change time?

This depends greatly on the data source. The RTP transmitter running
on j-ogg.de, for example, only acts as an RTP proxy for the HTTP
stream from Virgin Radio. It is of course possible to cache 30 or 60
seconds of the HTTP stream in order to send codebook-change messages
in advance, but I can think of several reasons why a radio broadcaster
would not be interested in such a transmission delay. Client-side
buffering is also an issue, at least for mobile units without much
memory.

Allowing in-stream codebook changes also has a serious impact on how
the client can calculate and predict bandwidth usage. Keep in mind
that the RTP packets for the actual content stream are pushed by the
server, not pulled by the client. Without additional logic and
protocol specifications, the client can only receive the RTP packets
as a steady stream; it cannot interrupt the stream, precache parts of
it, or otherwise modify the delivery rate to allocate bandwidth slots
for codebook downloads.

>2.)
>
>SDP is used to set the initial stream codebook.
>Periodic transmission of associated codebook URL in-stream.

Why? The client won't be able to play the stream at all if the SDP
is lost (most control protocols define how the SDP is to be
retransmitted if lost).

>3.)
>
>As above but add codebook key to each Vorbis-RTP packet.  Each packet  
>has the MD5 key of the associated codebook needed for decoding.

This only solves the problem from solution 1 that occurs if the
codebook-change message is lost. You still have the problem that
the client must be aware of the codebook change some time in advance
to allow an uninterrupted switch.
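
To make that concrete, here is a minimal sketch of the client side of
solution 3. The class and method names are made up for illustration
and are not taken from any draft. The per-packet MD5 key tells the
client which codebook a packet needs, but a lookup that comes back
empty only tells it that it is already too late:

```python
import hashlib
from typing import Dict, Optional


class CodebookCache:
    """Hypothetical client-side cache of codebook sets, keyed by MD5."""

    def __init__(self) -> None:
        self._by_digest: Dict[bytes, bytes] = {}

    def add(self, codebook: bytes) -> bytes:
        """Store a codebook set and return its 16-byte MD5 key."""
        digest = hashlib.md5(codebook).digest()
        self._by_digest[digest] = codebook
        return digest

    def lookup(self, packet_key: bytes) -> Optional[bytes]:
        """Return the codebook named by a packet's key, or None if the
        client has not fetched it yet (the packet cannot be decoded)."""
        return self._by_digest.get(packet_key)
```

A None result means the client has to fetch the codebook while
packets that depend on it are already arriving, which is exactly the
interruption an advance notice is meant to avoid.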

>4.)
>
>Puts strain on the server re-encoding on-the-fly, or makes playlist  
>pre-processing a pain.

Another possibility:

4.1)

If several codebook sets are used within one session, all of them
have to be defined in the session description (codebook URI, hash,
and the time span to which each applies).

Pros:

Allows the client to preload and cache the codebook sets before
starting playback, or at least to know from the beginning when the
codebook sets are needed. This makes it much easier for the client to
plan bandwidth usage and to prevent bandwidth peaks shortly before
switching to a new codebook.

Cons:

Only feasible for prerecorded content. The allowed SDP size is a 
limiting factor on how many different codebook sets may be used within 
one session.
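
A rough sketch of what a client could do with such a description
once it has been parsed. How the entries would be written in SDP is
left open here; the field names, the example URIs and the two helper
functions are assumptions for illustration only:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CodebookEntry:
    """One codebook set announced in the session description."""
    uri: str        # where the codebook set can be fetched
    md5_hex: str    # hash used to verify or cache the download
    start_s: float  # first second of the session it applies to
    end_s: float    # end of its time span (exclusive)


def codebook_at(schedule: List[CodebookEntry],
                t: float) -> Optional[CodebookEntry]:
    """Return the entry whose time span covers session time t."""
    for entry in schedule:
        if entry.start_s <= t < entry.end_s:
            return entry
    return None


def prefetch_order(schedule: List[CodebookEntry],
                   now_s: float) -> List[CodebookEntry]:
    """Entries still needed at or after now_s, soonest first."""
    pending = [e for e in schedule if e.end_s > now_s]
    return sorted(pending, key=lambda e: e.start_s)


schedule = [
    CodebookEntry("http://example.org/cb/b", "md5-of-b", 180.0, 400.0),
    CodebookEntry("http://example.org/cb/a", "md5-of-a", 0.0, 180.0),
]
```

Because the whole schedule is known up front, the client can walk
prefetch_order() at its own pace instead of reacting to a change
message in the middle of the stream.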

>Metadata is sent as a distinct message within the Vorbis-RTP stream.
>
>Pros:
>Simple.
>
>Cons:
>Not flexible, adds packet decoding overhead.

Well, the "overhead" of checking the first byte of the packet to 
determine the packet type is not really relevant, is it?
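
For illustration, the check amounts to roughly the following. The
type values below are invented for this sketch and are not taken
from any draft:

```python
# Hypothetical packet-type values; a real payload format would define
# its own.
PACKET_KINDS = {
    0x00: "data",
    0x01: "codebook-change",
    0x02: "metadata",
}


def classify(payload: bytes) -> str:
    """Classify a packet by its first byte alone."""
    if not payload:
        raise ValueError("empty payload")
    return PACKET_KINDS.get(payload[0], "unknown")
```

That is one table lookup per packet, which is negligible next to
decoding the audio itself.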

>Have metadata sent in a separate RTP stream - Annodex or something  
>similar.
>
>Pros:
>More extensible.
>
>Cons:
>Adds implementation overhead as you need to run another RTP receiver.

What do you mean by "implementation overhead"? I agree that the
implementation may be more complex, but I can't see why this must
have any runtime overhead compared to solution 5. Parsing XML-based
metadata (like CMML) may of course have a relevant impact compared
to parsing a binary format like the Vorbis comment header.

Tor