[Speex-dev] transferring decoder state
Jean-Marc Valin
jean-marc.valin at usherbrooke.ca
Thu Nov 16 22:13:57 PST 2006
> I am a bit of an alien here, my expertise is not at all in DSP:
...like most people on this list I suspect, which is fine.
> You DSP guys would probably curse us if you saw what kind of ugly things
> we do to compressed audio flows. One of them being: violently switching
> a flow from one machine to another. I guess it is not a nice thing to do
> and it ends up with weird audio artefacts sounding a bit like a
> submarine ping usually. (By the way, we have really good genuine reasons
> to want to do that :-)
I assume that what you're trying to do here is having multiple players
talking to each other through a server and the server is responsible for
selecting only one (or a few) streams that will be sent back to all the
players, without having to transcode and mix. Is that correct?
> To be honest I have not yet much of a clue on how speex works, so anyone
> stop me right away if I say something stupid. The encoder/decoder are
> based on linear prediction and the information sent is somehow related
> to the errors in these predictions. More importantly, the decoder is a
> state machine, which has a state and is "in tune" with its encoder.
> Forwarding the speex flow to a decoder with an uninitialised (or not "in
> tune") state creates these audio artefacts I was talking about.
I'm not sure whether Speex fits perfectly in the definition of an FSM,
but you're right in saying that the decoder has a state that needs to be
in sync with the encoder for the decoding to be flawless.
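To make the "in sync" point concrete, here is a minimal sketch of a toy first-order predictive codec. It is not Speex and shares no code with it; the names toy_state, toy_encode, toy_decode and the coefficient 0.9 are made up for illustration. The only state is the previous sample, which is enough to show that a decoder starting from a zero state produces an error that decays over time, while a decoder given a copy of the state decodes cleanly.

```c
#include <assert.h>
#include <stddef.h>

/* Toy first-order predictive codec (illustration only, NOT Speex).
 * The encoder sends the prediction error e[n] = x[n] - a * x[n-1];
 * the decoder rebuilds y[n] = e[n] + a * y[n-1].
 * The single float `prev` is the entire codec state. */
typedef struct { float prev; } toy_state;

#define TOY_COEF 0.9f

void toy_encode(toy_state *st, const float *x, float *e, size_t n)
{
    assert(st && x && e);
    for (size_t i = 0; i < n; i++) {
        e[i] = x[i] - TOY_COEF * st->prev;
        st->prev = x[i];          /* lossless toy: state tracks the input */
    }
}

void toy_decode(toy_state *st, const float *e, float *y, size_t n)
{
    assert(st && e && y);
    for (size_t i = 0; i < n; i++) {
        y[i] = e[i] + TOY_COEF * st->prev;
        st->prev = y[i];          /* state tracks the reconstructed output */
    }
}

/* Largest absolute difference between two buffers of length n. */
float max_abs_diff(const float *a, const float *b, size_t n)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d = a[i] - b[i];
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

Encoding a constant signal, decoding the first half with one toy_state and the second half with a zeroed toy_state gives an error that starts at 0.9 times the signal level and shrinks by a factor of 0.9 per sample; decoding the second half with a copy of the first decoder's state gives no error. A real codec has far more state (filter memories, pitch history), so the artefact is longer and more audible.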
> If what I say above is kind of correct, copying the state of the first
> decoder into the new decoder would solve the problem. So maybe I could
> push a decoder state to another machine across the network.
This is theoretically possible, but I wouldn't recommend attempting it
because:
1) The state would be about 2 kB in narrowband, equivalent to a few
seconds of compressed speech.
2) The exact content of the state depends on the version of Speex, the
compile options, and possibly the platform/compiler, so it could be a
compatibility nightmare.
3) The server (if my assumptions above are right) would need to decode
all the streams to keep the state.
4) There may be better ways of solving the problem.
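As a rough check on point 1, the arithmetic can be sketched as below. The figures are assumptions chosen for illustration: a state of exactly 2000 bytes and a narrowband bit-rate of 8 kbit/s (one of the standard narrowband modes). The helper name seconds_equivalent is hypothetical.

```c
#include <assert.h>

/* How many seconds of compressed speech occupy the same number of
 * bytes as the decoder state, at a given bit-rate in bits per second. */
double seconds_equivalent(double state_bytes, double bitrate_bps)
{
    assert(bitrate_bps > 0.0);
    return state_bytes * 8.0 / bitrate_bps;
}
```

With these assumed numbers, seconds_equivalent(2000.0, 8000.0) comes to 2 seconds, so pushing the state across the network costs about as much bandwidth as two seconds of audio. At higher bit-rates the equivalent duration is shorter.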
> But hey, it already works (kind of) without sending the state across at
> all, so maybe the whole state is not needed and only a part of it could
> be sent at the cost of some audio artefact. The matter here is to
> balance the annoyance caused by the interruption of the audio flow, if I
> send the whole state, and the annoyance of the audio artefacts if I
> don't synchronise any state.
>
> I am ready to dive into the speex source code to do that, but I am sure
> I could use some of your thoughts on the problem. Also I would
> appreciate if somebody could point me towards the structure(s) in the
> source code in which the decoder state(s) are stored as it would save me
> quite a bit of time.
Here are some approaches I would consider (some can be combined). Which
one works best would depend on your exact problem:
1) If the time of the "switch" is known by the encoder, then the encoder
can simply reset its state to zero, so the state need not be
transmitted (all the decoder needs to do is set its state to zero as well).
2) Even if the encoder cannot be reset (because it doesn't know when),
resetting the decoder will help avoid a residual signal from the
previous stream. Perhaps a bit of smoothing in the transition could also
help.
3) It's always best to switch in the middle of a silence portion because
then the state is close to zero.
4) There is an encoder setting (SPEEX_SET_PLC_TUNING) that controls how
much Speex relies on the prediction (you can trade a bit of quality for
less prediction).
5) It may actually be possible for the server (assuming model above) to
decode the first packet when the stream is switched and then re-encode
it with a null state. This would make the decoder state "nearly" in sync
with the original decoder. It's not simple because of the "lookahead",
but I think it can be done.
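Approaches 2 and 3 can be sketched on the decoded PCM side, independently of the codec. The code below is only one possible reading of "switch during silence" and "a bit of smoothing": it looks for a low-energy frame to switch on, and ramps the gain of the first frame decoded after the switch. The function names, the energy threshold and the linear ramp are all assumptions for illustration, not part of libspeex. In libspeex itself, the decoder reset would be done with speex_decoder_ctl(dec, SPEEX_RESET_STATE, NULL).

```c
#include <assert.h>
#include <stddef.h>

/* Mean energy (mean of squared samples) of one frame of 16-bit PCM. */
double frame_energy(const short *pcm, size_t n)
{
    assert(pcm && n > 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)pcm[i] * (double)pcm[i];
    return acc / (double)n;
}

/* Index of the first frame at or after `from` whose energy is below
 * `threshold`. Returns n_frames if no such frame exists, in which case
 * the caller has to switch on a non-silent frame anyway. */
size_t find_quiet_frame(const short *pcm, size_t n_frames,
                        size_t frame_size, size_t from, double threshold)
{
    for (size_t f = from; f < n_frames; f++) {
        if (frame_energy(pcm + f * frame_size, frame_size) < threshold)
            return f;
    }
    return n_frames;
}

/* Linear fade-in applied in place to the first frame decoded after the
 * switch, so that any start-up glitch is attenuated. */
void fade_in(short *pcm, size_t n)
{
    assert(pcm && n > 0);
    for (size_t i = 0; i < n; i++) {
        double gain = ((double)i + 0.5) / (double)n;  /* rises towards 1 */
        pcm[i] = (short)((double)pcm[i] * gain);
    }
}
```

A server or client would call find_quiet_frame on recently decoded audio to choose the switch point, reset the decoder there, and run fade_in on the first frame of the new stream. For narrowband Speex a frame is 160 samples (20 ms at 8 kHz); the threshold has to be tuned by ear.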
> PS: I would have searched a bit in the archive of the mailing list if only
> there was a tool for it. If there is already a related topic there that
> you remember, tell me so I can have a read without asking stupid
> questions.
Google is your friend :-) And BTW, this is far from a stupid question
and I suspect it's actually an interesting research topic (any takers?).
Jean-Marc