I've been working on improving packet loss concealment in my VoIP client -- I hadn't actually realized Speex performs some PLC itself until I was looking at nb_decode().<br><br>There are two things I'd like to do, and I think both will require some modification to nb_decode() and nb_decode_lost().<br>
<br>- I'd like to have my frames overlap by some number of samples. This is intended to reduce the amount of fill/stretch that has to happen when a packet is missing. This can probably be an entire subframe.<br>- I'd like to have nb_decode_lost() use information about future packets that may be waiting, if they are available (so that the signal blends into, and stretches out, the next packet's signal to fill the gap, rather than just extending the last packet and then transitioning to the next packet when decoding that packet)<br>
<br>Before I dive into this, I thought I'd do a sanity check with this list.<br><br>I suspect getting frame overlap to work internally to Speex will not be too difficult. If I shift both the current subframe's excitation and the excitation buffer from the last subframe by N samples (where N is the overlap) and start the IIR filter N samples into the first subframe, I think this will work smoothly? I'm assuming that the discarded part of the excitation is roughly similar to the last segment of the previous subframe's excitation, because they were generated from the same raw samples? If I overlap by an entire subframe this all gets a lot easier, I suppose. Maybe that's what I should do.<br>
<br>Filling a single subframe gap where you have data for both sides of the gap will be more difficult. My idea had been to just average the old and new excitations and interpolate the LPC parameters over the gap. This seems like maybe a bad idea if you end up interpolating a vowel excitation with a consonant excitation - it seems like interpolating excitations is not going to produce good results, in general. Perhaps the thing to do is to use only the adaptive codebook excitation and interpolate LPC parameters? It may also be the case that this really doesn't improve audio quality much versus the current nb_decode_lost() implementation (all that would change is that the LPC parameters and pitch gains are interpolated with the next packet, instead of just duplicating the previous subframe).<br>
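<br>A minimal sketch of the "interpolate the LPC parameters over the gap" idea, assuming the filter parameters on both sides of the gap are available as line spectral pairs (interpolating LSPs rather than raw LPC coefficients is the usual way to keep the synthesis filter stable). The function name, its arguments, and the linear weighting are hypothetical and are not part of the Speex API:<br>

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a Speex function): linearly interpolate LSP
 * vectors across a gap of nb_subframes missing subframes.
 *
 * lsp_prev : LSPs from the last subframe before the gap
 * lsp_next : LSPs from the first subframe after the gap
 * subframe : index of the missing subframe, 0 .. nb_subframes-1
 *
 * The weight runs from just above 0 (close to lsp_prev) to just below 1
 * (close to lsp_next), so the first and last missing subframes never
 * simply duplicate either neighbour.
 */
void interp_lsp_over_gap(const float *lsp_prev, const float *lsp_next,
                         float *lsp_out, size_t order,
                         int subframe, int nb_subframes)
{
    float w = (float)(subframe + 1) / (float)(nb_subframes + 1);
    size_t i;

    for (i = 0; i < order; i++)
        lsp_out[i] = (1.0f - w) * lsp_prev[i] + w * lsp_next[i];
}
```

The same weight could in principle be applied to the pitch gain, which is the other quantity mentioned above; whether that is audibly better than duplicating the previous subframe would have to be checked by listening.<br>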
<br>Finally, I wonder if there's a way to do time stretching in a coherent way inside of Speex, or if this needs to happen to the output signal as a post-processing step. It seems like there's no good way to extend the codebook excitation signal in time? But I don't understand where the codebook comes from in the first place, so maybe it's possible to regenerate longer versions of each excitation signal?<br>
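<br>For the post-processing route, a very rough sketch of the simplest possible stretch: extending a buffer by repeating its last pitch period. It assumes the pitch lag is already known (for example from the decoder) and ignores gain decay and cross-fading, which a real implementation would want. The function name and signature are hypothetical:<br>

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a Speex function): copy n input samples to out
 * and append `extra` more samples by periodic repetition with lag `pitch`.
 * out must have room for n + extra samples.
 * Returns 0 on success, -1 on invalid arguments.
 */
int extend_by_pitch(const float *in, int n, int pitch, int extra, float *out)
{
    int i;

    if (in == NULL || out == NULL || n <= 0 || pitch <= 0 ||
        pitch > n || extra < 0)
        return -1;

    memcpy(out, in, (size_t)n * sizeof(float));

    /* Each new sample repeats the sample one pitch period earlier, so the
     * extension keeps wrapping around the last period of the signal. */
    for (i = 0; i < extra; i++)
        out[n + i] = out[n + i - pitch];

    return 0;
}
```

This works on any signal with a usable pitch lag, so it could be applied to the decoded output or, inside the decoder, to the excitation buffer.<br>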
<br>Stuart<br>