[Speex-dev] Packing multiple frames in a RTP packet

Manish Jalan jalanmanish at gmail.com
Wed Dec 9 20:54:18 PST 2009


Hello,

*Background:*
RFC 5574 specifies the RTP payload format for the Speex codec. The payload
format is straightforward: the encoded frames are concatenated one after
another. Once we have appended the desired number of frames, we have to pad
the stream with a 0 followed by 1's (a sequence such as 01111) to ensure
that the payload ends on an octet boundary.
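A minimal C sketch of that packing rule follows. It assumes each encoded frame is available as an MSB-first bit string of known length (for example 43 bits at 2150 bps). The BitWriter type and the bw_* functions are hypothetical helpers written for illustration. They are not part of libspeex, which has its own SpeexBits routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical MSB-first bit writer (illustration only, not libspeex). */
typedef struct {
    uint8_t *buf;      /* output buffer, zeroed by bw_init */
    size_t cap_bytes;  /* capacity of buf in bytes */
    size_t nbits;      /* number of bits written so far */
} BitWriter;

static void bw_init(BitWriter *w, uint8_t *buf, size_t cap_bytes) {
    memset(buf, 0, cap_bytes);
    w->buf = buf;
    w->cap_bytes = cap_bytes;
    w->nbits = 0;
}

/* Append one bit, filling each byte from its most significant bit. */
static void bw_put_bit(BitWriter *w, int bit) {
    assert(w->nbits < w->cap_bytes * 8);
    if (bit)
        w->buf[w->nbits / 8] |= (uint8_t)(0x80u >> (w->nbits % 8));
    w->nbits++;
}

/* Append the first nbits bits of frame (MSB-first), e.g. a 43-bit frame. */
static void bw_put_frame(BitWriter *w, const uint8_t *frame, size_t nbits) {
    for (size_t i = 0; i < nbits; i++)
        bw_put_bit(w, (frame[i / 8] >> (7 - i % 8)) & 1);
}

/* Pad to an octet boundary with a 0 bit followed by 1 bits; nothing is
 * added if the payload is already aligned. Returns the length in bytes. */
static size_t bw_finish(BitWriter *w) {
    if (w->nbits % 8 != 0) {
        bw_put_bit(w, 0);
        while (w->nbits % 8 != 0)
            bw_put_bit(w, 1);
    }
    return w->nbits / 8;
}
```

For example, writing one 43-bit frame of all 1 bits and calling bw_finish() yields 6 bytes, the last of which is 0xEF: the final three frame bits (111), the 0, then four 1 bits.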

*Observation:*
I am using the Speex encoder at 2150 bps (by setting the quality to 0).
For a 20 ms frame (160 samples at a sampling rate of 8000 samples per
second), the encoder gives me 6 bytes of encoded output.
As a test case, I encoded 10 frames one after another, each time getting
6 bytes of encoded output, and concatenated the 6-byte outputs.

As suggested in a couple of posts, I tried to decode this stream of encoded
voice by calling the decoder repeatedly until the bits-remaining API
returned a value less than 1.

What I observed was this sequence: the first call returned a successful
decode; the second returned end of stream; the third returned a successful
decode; the fourth returned end of stream; and so on.

That is: decode success, EoS, decode success, EoS, decode success, EoS, ...
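To make the observed pattern concrete, here is a toy model in C of the decode loop described above. It is not libspeex. It assumes, as a simplification, that a narrowband frame starts with 1 wideband bit plus a 4-bit mode ID and that mode ID 15 acts as the terminator, so the padding bits 0 1111 read exactly like a header announcing end of stream. The 43-bit frame length is taken from the hypothesis below; every name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy assumptions: 43-bit frames, 5-bit header (1 wideband bit + 4-bit
 * mode ID), mode ID 15 = terminator. Not the real Speex decoder. */
enum { FRAME_BITS = 43, HEADER_BITS = 5, TERMINATOR_MODE = 15 };
enum DecodeResult { DECODE_OK = 0, DECODE_EOS = -1, DECODE_ERROR = -2 };

typedef struct {
    const uint8_t *buf;
    size_t total_bits;
    size_t pos;              /* index of the next bit to read */
} BitReader;

static size_t br_remaining(const BitReader *r) {
    return r->total_bits - r->pos;
}

/* Read nbits (at most 32) MSB-first as an unsigned value. */
static unsigned br_get(BitReader *r, unsigned nbits) {
    assert(nbits <= 32 && nbits <= br_remaining(r));
    unsigned v = 0;
    for (unsigned i = 0; i < nbits; i++, r->pos++)
        v = (v << 1) | ((r->buf[r->pos / 8] >> (7 - r->pos % 8)) & 1u);
    return v;
}

/* Consume one frame (or terminator) and classify it. */
static enum DecodeResult toy_decode(BitReader *r) {
    if (br_remaining(r) < HEADER_BITS) {
        r->pos = r->total_bits;  /* toy choice: short tail = padding */
        return DECODE_EOS;
    }
    unsigned wideband = br_get(r, 1);
    unsigned mode = br_get(r, 4);
    if (wideband)
        return DECODE_ERROR;     /* wideband layers are not modeled */
    if (mode == TERMINATOR_MODE)
        return DECODE_EOS;       /* the bits 0 1111 */
    if (br_remaining(r) < FRAME_BITS - HEADER_BITS)
        return DECODE_ERROR;     /* truncated frame */
    r->pos += FRAME_BITS - HEADER_BITS;  /* skip the unmodeled parameters */
    return DECODE_OK;
}

/* Call the decoder until no bits remain, recording each result. */
static size_t toy_decode_all(const uint8_t *buf, size_t nbytes,
                             enum DecodeResult *out, size_t max_out) {
    BitReader r = { buf, nbytes * 8, 0 };
    size_t n = 0;
    while (br_remaining(&r) > 0 && n < max_out) {
        enum DecodeResult res = toy_decode(&r);
        out[n++] = res;
        if (res == DECODE_ERROR)
            break;
    }
    return n;
}
```

Feeding it two copies of the 6-byte frame {0x08, 0, 0, 0, 0, 0x0F} back to back returns OK, EoS, OK, EoS, the same alternation as in the observation.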

*Hypothesis:*
Based on the above observation, this is what might be happening:
For a 20 ms frame (50 frames per second), the encoder running at 2150 bps
produces 43 bits of encoded data. Since it has to return whole bytes, it
pads the frame with a 01111 sequence to give 48 bits of output.
While decoding, the first 43 bits are decoded; then the 01111 sequence is
interpreted as end of stream; then the next 43 bits are decoded, the
following 01111 is interpreted as end of stream, and so on.
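The arithmetic behind this hypothesis, worked out from the numbers above:

```latex
\begin{aligned}
\text{bits per frame} &= \frac{2150\ \text{bit/s}}{50\ \text{frames/s}} = 43 \\
\text{octet-aligned size} &= 8\left\lceil \tfrac{43}{8} \right\rceil = 48\ \text{bits} = 6\ \text{bytes} \\
\text{padding} &= 48 - 43 = 5\ \text{bits} = \texttt{0 1111}
\end{aligned}
```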

*Query:*
When packing multiple encoded Speex frames into one RTP packet, should we:
a. pack each encoded frame in full bytes, as received from the encoder
(i.e. 48 bits),
   or
b. strip the end-of-stream marker (a 0 followed by 1's) so that each frame
is strictly 43 bits, and use the 0-followed-by-1's sequence only to pad the
payload to an octet boundary?
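To compare the two options, here is a small C sketch that computes the payload size of each for N frames of a given bit length. The function names are hypothetical, and the sketch says nothing about which layout a receiver expects.

```c
#include <assert.h>
#include <stddef.h>

/* Option a: every frame is kept octet-aligned (e.g. 43 bits -> 6 bytes). */
static size_t payload_bytes_option_a(size_t frames, size_t frame_bits) {
    assert(frame_bits > 0);
    return frames * ((frame_bits + 7) / 8);
}

/* Option b: frames are concatenated bit-tight, padded once at the end. */
static size_t payload_bytes_option_b(size_t frames, size_t frame_bits) {
    assert(frame_bits > 0);
    return (frames * frame_bits + 7) / 8;
}

/* Option b: number of trailing padding bits (a 0 followed by 1's). */
static size_t padding_bits_option_b(size_t frames, size_t frame_bits) {
    return (8 - (frames * frame_bits) % 8) % 8;
}
```

With 10 frames of 43 bits, option a gives 60 bytes, while option b gives 54 bytes with 2 padding bits (01) at the end.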

*Reason for the query:*
I want to implement RTP packetization that is interoperable: even if the
receiver is not under my control, it should still be able to decode the
stream that I send.


Regards,
Manish S. Jalan

