[Speex-dev] Packing multiple frames in a RTP packet

Jean-Marc Valin jean-marc.valin at usherbrooke.ca
Thu Dec 10 03:52:52 PST 2009

You cannot concatenate bytes because Speex frames don't necessarily end
on octet boundaries. You need to call the encoder multiple times on the
same SpeexBits bitpacket.


Manish Jalan wrote:
> Hello,
> _*Background:*_
> RFC 5574 specifies the RTP payload format for the Speex codec. Forming
> the payload is straightforward: the encoded frames are concatenated one
> after another. Once the desired number of frames has been appended, the
> stream is padded with a 0 followed by 1s (e.g. 01111) so that the
> payload ends on an octet boundary.
> _*Observation:*_
> I am using the Speex encoder at 2150 bps (by setting the quality to 0).
> For a frame of 20 ms, i.e. 160 samples at a sampling rate of 8000
> samples per second, the encoder gives me 6 bytes of encoded output.
> As a test case, I encoded 10 frames one after another, each time
> getting 6 bytes of encoded output, and concatenated the 6-byte
> outputs.
> As suggested in a couple of posts, I tried to decode this stream of
> encoded voice by calling the decoder repeatedly until the bits-remaining
> API returned a value less than 1.
> What I observed was this sequence: the first time, the decoder returned
> a successful decode; the second time, it returned end of stream; the
> third time, a successful decode; the fourth time, end of stream; ...
> That is: decode success, EoS, decode success, EoS, decode success, EoS, ....
> _*Hypothesis:*_
> Based on the above observation, what might be happening is:
> For a frame of 20 ms (=> 50 frames per second), the encoder (running at
> 2150 bps) produces 43 bits of encoded stream. Since it has to return
> whole bytes, it pads with the sequence 01111 to give a 48-bit output.
> While decoding, the first 43 bits are decoded; then the 01111 sequence
> is interpreted as end of stream; then the next 43 bits are decoded and
> 01111 is again interpreted as end of stream, and so on.
> _*Query:*_
> For Speex, when packing multiple encoded frames into one RTP packet,
> should we
> a. pack each encoded frame in full bytes as received from the encoder
> (i.e. 48 bits),
>    or
> b. chop off the end-of-stream marker (the 0 followed by 1s), keeping
> strictly 43 bits per frame, and use the 0-followed-by-1s sequence only
> to pad the payload to an octet boundary?
> _*Reason for the query:*_
> I want to implement RTP packetization that is interoperable. Even if
> the receiver is not under my control, it should still be able to decode
> the stream that I am sending.
> Regards,
> Manish S. Jalan