[xiph-rtp] Lots of proposals

Sun Sep 4 16:35:13 PDT 2005

David Barrett wrote:

> Ok, I think we've covered all the bases we're going to cover.  I'm 
> going to attempt a summary of my position -- Tor, will you please do 
> the same? (ie, please don't respond point-by-point to my position, 
> just summarize yours)

Ok, let me first explain why I think inline codebook delivery, with or 
without client acknowledgement, is one of the worst methods we have 
discussed so far:

- You can't make a decent implementation of it for multicast. The only 
way for inline codebook delivery to work with multicast would be to 
transmit the codebook data continuously, and hence either waste an 
unacceptable amount of bandwidth or introduce an unacceptable delay at 
the beginning of a stream while the client waits for a complete codebook 
set to be received. Even if multicast transmissions are not commonly 
used today, more and more ISPs are at least starting to experiment with 
multicast, and it is the only feasible way to avoid bandwidth 
bottlenecks as the internet is used more and more as a transport medium 
for audio and video streaming. For unicast scenarios, Ogg/Vorbis over 
HTTP is already used quite a lot. As long as the Vorbis codec itself has 
a relatively high latency and is not designed for low-latency "real 
time" streaming, unicast Vorbis over RTP won't bring much advantage over 
Ogg/Vorbis/HTTP. I would expect an RFC for Vorbis over RTP that only 
allows unicast to be largely neglected and not very usable.

- Inline transmission with client acknowledgement will not work in 
unidirectional network environments. Although such environments are 
unlikely for unicast, they are likely for multicast, as there may be 
situations where the client simply joins an ongoing session without the 
server's knowledge.

- Even in unicast situations, the delay when starting a stream may be 
unacceptable. The standard does not limit the size of the codebook 
header, but even if you do some calculations on the codebook sizes 
commonly used by current encoders, the codebook transmission will take 
several seconds at least. The server would have to stream the codebook 
at the same rate as the audio stream, potentially making the client wait 
unnecessarily long for the transmission to complete. To stay below the 
network MTU, we can assume that a "common size" codebook would be split 
into something around 5 RTP packets. In a network with 2% packet loss, 
there is a 9.6% chance (1 - 0.98^5) that at least one of these packets 
will not arrive at the client. Hence, it should at least be mandatory 
for the server to send the codebook twice _before_ starting to stream 
any audio, to minimize the chance that the entire stream is undecodable, 
and this raises the delay before playback can begin accordingly.

- I would expect most use cases for Vorbis over RTP to be web radios and 
music "on demand" services. Designing the RFC to fit well only 
situations where multidirectional streams are required (e.g. the 
"client" must also be able to transmit its codebook to the "server") is 
a major mistake, as that will probably rarely be needed.
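
For anyone who wants to check the packet loss figure in the third point, 
here is a minimal sketch of the arithmetic. It assumes losses are 
independent from packet to packet, which real networks only approximate, 
and the function name and the "copies" parameter are made up for this 
illustration.

```python
def p_any_lost(packets: int, loss_rate: float, copies: int = 1) -> float:
    """Probability that at least one of `packets` never reaches the client.

    Each packet is sent `copies` times; losses are assumed independent,
    so a packet is missing only if every one of its copies is lost.
    """
    p_packet_missing = loss_rate ** copies
    return 1.0 - (1.0 - p_packet_missing) ** packets


# Figures from the text: 5 RTP packets per codebook, 2% packet loss.
once = p_any_lost(5, 0.02)
twice = p_any_lost(5, 0.02, copies=2)

print(f"codebook sent once:  {once:.1%} chance of an incomplete codebook")
print(f"codebook sent twice: {twice:.2%} chance of an incomplete codebook")
```

Under that assumption, sending the codebook twice brings the chance of 
an incomplete codebook from about 9.6% down to about 0.2%, at the cost 
of doubling the start-up transmission.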

I am by no means strongly advocating any particular alternative, and 
a few other reasonable delivery methods have been discussed:

- URI reference to the codebook in the SDP. In this case I would suggest 
making HTTP and whatever protocol is used to set up the RTP stream 
mandatory, e.g. HTTP and RTSP for an RTSP server, or HTTP and SIP for a 
SIP client or registrar. On the server side, I would assume that in most 
cases it would be feasible to use an existing HTTP server to support 
HTTP delivery. On the client side, it would not take much effort to 
either implement enough of HTTP or use an available HTTP client library 
to fetch the codebook. If HTTP for some reason is not feasible, the 
other protocol may be used.

- Agree on a fixed set of codebooks for RTP. Codebook optimizers have 
been shown to save only a few percent of the file size for streams 
created by the reference encoder, so I am not really convinced that 
dynamic codebooks are very useful. This may of course be because the 
stream data created by the reference encoder fits the fixed codebooks 
well, or vice versa. A drawback would be that the decoder software size 
increases. I've not had time to check the total size of all codebooks 
used by the reference encoder, but as a comparison, the static codebooks 
used by WMA could be stored in around 25kB. As pointed out in a response 
to my question on this subject on the AVT mailing list, it would be 
easily feasible for a transmitter to reencode a local "unsupported" 
Vorbis stream using a supported codebook.
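
To make the URI-in-SDP option a bit more concrete, here is a rough 
sketch of the client side. The parameter name "codebook-uri", the 
example SDP and the example.com URL are all invented for illustration 
and are not taken from any draft. Only the HTTP case is shown; the 
fallback to the session setup protocol is left out.

```python
from typing import Optional
from urllib.parse import urlparse
from urllib.request import urlopen

# Hypothetical fmtp parameter name, chosen only for this sketch.
CODEBOOK_PARAM = "codebook-uri"

# Illustrative session description; the URI does not point at real data.
EXAMPLE_SDP = """\
v=0
m=audio 5004 RTP/AVP 96
a=rtpmap:96 vorbis/44100/2
a=fmtp:96 codebook-uri=http://example.com/codebooks/1234.bin
"""


def codebook_uri_from_sdp(sdp: str) -> Optional[str]:
    """Return the codebook URI from the first a=fmtp line carrying one."""
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("a=fmtp:"):
            continue
        # Layout: "a=fmtp:<payload type> <key>=<value>; <key>=<value>"
        _, _, params = line.partition(" ")
        for param in params.split(";"):
            key, sep, value = param.strip().partition("=")
            if sep and key == CODEBOOK_PARAM:
                return value
    return None


def fetch_codebook(uri: str, timeout: float = 5.0) -> bytes:
    """Fetch the codebook over HTTP; any other scheme is rejected here."""
    scheme = urlparse(uri).scheme
    if scheme != "http":
        raise ValueError(f"unsupported scheme in this sketch: {scheme!r}")
    with urlopen(uri, timeout=timeout) as response:
        return response.read()


print(codebook_uri_from_sdp(EXAMPLE_SDP))
```

A client would call fetch_codebook() on the extracted URI before 
decoding the first RTP packet. Note that this simple parser splits 
parameters on ";", so it would break on a URI that itself contains a 
semicolon.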

Tor