[xiph-rtp] Chaining

Sat Aug 27 18:49:50 PDT 2005

Ralph Giles wrote:

> 
> 
> The server can also just transcode. We expect RTP transmission from a
> chained Ogg stream to be something of an edge case. A lot of stations
> will be encoding either directly from a live production feed (only
> one stream in any case) or from a batch encode, for which ensuring
> codebook uniformity isn't such a big issue. That just leaves casual
> users with a heterogeneous Ogg collection on disk.

In the case of live or pseudo-live feeds there is just one stream, so
the problem doesn't arise, regardless of whether RTSP or something else
is used for control.

> 
> Can you explain a bit more about how the server would send the new
> session parameters? Is it possible to have that work and keep gapless
> playback?

If you are using RTSP, I'm pretty confident you can have gapless
playback of Vorbis in RTP, chained or not: the server can push the
metadata for the next stream to the client and/or signal that the client
should prepare to switch to the other stream. With plain RTP (e.g. only
in-band metadata marking where one file ends and the next begins) it
would require a large buffer and some sort of lookahead logic, either to
do a crossfade or to make sure you always have the time to reinit the
decoder with the new information.
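
As a rough sketch of the client side of that idea: if the control
channel announces the next configuration ahead of time, the client can
build the new decoder before the boundary and only swap at the switch.
Everything here is hypothetical (StubDecoder, GaplessPlayer, the
config_id labels and the callback names); it is not taken from the draft
or from any RTSP library.

```python
class StubDecoder:
    """Stand-in for a real Vorbis decoder; building it is the costly step."""

    def __init__(self, config_id: str):
        # config_id is a hypothetical label for one set of Vorbis headers.
        self.config_id = config_id

    def decode(self, packet: bytes) -> str:
        # A real decoder would return PCM; the stub just reports what it saw.
        return f"{self.config_id}:{len(packet)}"


class GaplessPlayer:
    def __init__(self, config_id: str):
        self.current = StubDecoder(config_id)
        self.pending = None

    def on_control_announce(self, config_id: str) -> None:
        """The control channel announced the next stream: prepare it now."""
        self.pending = StubDecoder(config_id)

    def on_packet(self, config_id: str, packet: bytes) -> str:
        """Decode a packet, swapping decoders when the configuration changes."""
        if config_id != self.current.config_id:
            if self.pending is None or self.pending.config_id != config_id:
                # No advance notice: the decoder has to be built on the
                # spot, which is where a gap could appear.
                self.pending = StubDecoder(config_id)
            self.current, self.pending = self.pending, None
        return self.current.decode(packet)
```

With an announcement in hand the swap in on_packet() is just a
reference exchange; without one the client falls back to building the
decoder at the boundary.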

That said, in the RTP draft I'd keep one Vorbis stream per RTP stream
and move the chaining discussion to the usage of RTSP, SIP or whatever
other session/control protocol your application wants to use.

> 
> Not supporting chaining was in fact the original suggestion, made 
> initially by Jack about a year ago. If we'd done that we could have
> been all finished six months ago. :)

The problem is where to support chaining. I'd prefer to support it in
the control/session protocol, since that would make some features
simpler to implement.

> 
> For me the two persuasive arguments were:
> 
> 1. Adaptive bitrate switching. In a unicast RTP setting, the server
> can use packet loss statistics to dynamically adjust the bitrate
> sent to individual clients. In the case of configurable codecs like
> Vorbis and Theora, this means being able to change the codebooks,
> and even things like samplerate/framerate and image size (though
> the player should rescale to avoid popping in the later case.)

That would require RTCP for QoS and RTSP to push the right configuration
and do the switch, or other equivalent protocols.
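
A minimal sketch of the QoS half of that, assuming the server keeps a
ladder of pre-made configurations and steps between them based on the
"fraction lost" field of RTCP receiver reports. The ladder values, the
thresholds and the function names are assumptions for illustration; only
the 8-bit fixed-point encoding of the field comes from RTP itself.

```python
# Hypothetical bitrate ladder in bits per second; each entry would map to
# its own Vorbis/Theora configuration (codebooks) on the server side.
LADDER = [32_000, 64_000, 96_000, 128_000]

# Assumed thresholds, not taken from any specification.
LOSS_HIGH = 0.05  # step down above 5% loss
LOSS_LOW = 0.01   # step up below 1% loss


def fraction_lost(raw: int) -> float:
    """Convert the 8-bit 'fraction lost' field of an RTCP report block
    (fixed point, binary point at the left edge) to a value in [0, 1)."""
    if not 0 <= raw <= 255:
        raise ValueError("fraction lost is an 8-bit field")
    return raw / 256.0


def next_rung(current: int, raw_fraction_lost: int) -> int:
    """Return the ladder index to use after one receiver report."""
    loss = fraction_lost(raw_fraction_lost)
    if loss > LOSS_HIGH and current > 0:
        return current - 1  # too much loss: drop to a lower bitrate
    if loss < LOSS_LOW and current < len(LADDER) - 1:
        return current + 1  # clean link: try a higher bitrate
    return current          # in between, or already at the end of the ladder
```

Whenever the index changes, the control protocol would still have to
push the matching configuration to the client before the switch.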

> 
> Aaron essentially told us this was a requirement for Real, since
> it's already a feature with their native codecs (though they
> have fixed codebooks, so chaining isn't painful). That's why
> we went with chaining support.
> 

If the "chained" stream is just a transcoded pseudo-live stream, it
won't be a problem, since there would be just one codebook.

> 2. Video resampling is much more expensive and artefact-prone
> than audio resampling, so at least in the medium term, it is 
> attractive to be able to use chaining in Theora to support 
> interleave of e.g. film and video without having to do format
> conversion. This isn't compelling on its own, but makes reason
> 1 less lonely. :)

The same problems and issues would apply and the same solutions should work.

> 
> Anyway, that was the reasoning behind the decision. I don't
> see any particular reason to revisit it unless you have either
> a much simpler method to achieve equivalent results, or a
> good argument why Real's requirements aren't worth addressing.
> 

Everything depends on how the application is used and which protocols
the application has available; that's why I asked for a list of planned
applications and scenarios.

So far I could think about:

1 Netradio using rtp/rtsp:
- Chaining is required to make the playlist look like a flat stream; you
can achieve the same result with dynamic stream switching and a client
that supports crossfade.

- It would probably be multicast, so the RTCP information would be
meaningful only if you have a load-balancing setup using twins or, as a
particular case, provide content through "repeaters".

2 Conference/VoIP using rtp/rtsp or rtp and sip:
- You won't have the problem of chaining, but you'll have the problem of
syncing many different streams.

- You may want to switch bitrate dynamically using the QoS information.
If the stream could be optimized for bitrate peeling, that would be
quite interesting and quite inexpensive to implement.

In both cases, adding video, subtitle or other RTP streams to the audio
scenario just adds the problem of synchronization.
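
For reference, the usual way to line up separate RTP streams is to map
each stream's RTP timestamps onto the sender's wallclock using the
NTP/RTP timestamp pair carried in its RTCP sender reports. The sketch
below assumes the NTP time has already been converted to seconds as a
float; the function name and arguments are made up for the example.

```python
def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int,
                     sr_ntp_seconds: float, clock_rate: int) -> float:
    """Map an RTP timestamp to sender wallclock seconds, using the
    (NTP, RTP) timestamp pair from the stream's latest sender report."""
    # RTP timestamps are 32 bits and wrap, so take the difference
    # modulo 2**32 and reinterpret it as a signed value.
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr_ntp_seconds + delta / clock_rate


# Audio at 44100 Hz and video at 90000 Hz, both with a sender report
# taken at wallclock 10.0 s and RTP timestamp 0.
audio_time = rtp_to_wallclock(44100, 0, 10.0, 44100)
video_time = rtp_to_wallclock(45000, 0, 10.0, 90000)
# A positive skew means the audio packet belongs later than the video one.
skew = audio_time - video_time
```

The client would then delay whichever stream is ahead by the skew before
presenting it.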

I hope that clarifies my point of view.

lu

-- 

Luca Barbato

Gentoo/linux Developer		Gentoo/PPC Operational Leader
http://dev.gentoo.org/~lu_zero