[theora-dev] Bitstream encoded huffman tables always the same

Fri Dec 10 08:05:25 PST 2010

On Fri, Dec 10, 2010 at 10:51 AM, Gabriel TEIXEIRA
<gabriel_teixeira at sdesigns.eu> wrote:
> Hello all,
>
> While working inside the Theora decoder, I found that many videos seem
> to have the very same Huffman tables encoded into their bitstreams (at
> least the ones I took the time to dissect). The tables are listed as
> TH_VP31_HUFF_CODES in the file huffenc.c. I tried to find out what sets
> the bitstream to those tables, and got as far as seeing that it depends
> on whether the function th_encode_ctl is called with
> TH_ENCCTL_SET_HUFFMAN_CODES or TH_ENCCTL_SET_VP3_COMPATIBLE, but I could
> not figure out who calls it with which parameter. Is it true that
> libtheora always sets the Huffman codes to the same ones? Isn't this
> approach a little inefficient, since the probability distribution of
> the symbols does not always match the one behind the 80 tables (the
> tables may be very good, but there is always some room for increased
> precision)? Besides, we spend around 1-2kb storing them in the
> bitstream instead of having them predefined in the decoder (of course,
> that would break compatibility).

When the encoder begins, it has no idea what the symbol statistics will
be, so it must use stock tables of some kind.

There is a tool in the theora-exp branch at
http://svn.xiph.org/trunk/theora-exp/examples/rehuff.c which will
losslessly optimize the Huffman tables. The tables it produces aren't
optimal, because the frame clustering for table assignment is
non-trivial (the tool should also be updated; I found the specific
approach it uses to be a bit pessimal), though it almost always makes
files somewhat smaller.
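
To illustrate why re-deriving a table from the actual token counts can
shrink a file, here is a minimal sketch in C. It is not the method
rehuff.c uses and it ignores the frame clustering problem entirely: it
only computes Huffman code lengths for one table from per-token counts
and totals the bits that table would cost. The names huff_code_lengths,
coded_bits and MAX_SYMS are made up for this example and are not part
of libtheora.

```c
#include <assert.h>

#define MAX_SYMS 32 /* assumed table size for this sketch */

/* Fill len[i] with the Huffman code length of symbol i, given how often
 * each symbol occurs. Plain O(n^2) merging, which is fine for tiny tables. */
void huff_code_lengths(const unsigned long *count, int n, int *len)
{
    unsigned long weight[2 * MAX_SYMS - 1];
    int parent[2 * MAX_SYMS - 1];
    int alive[2 * MAX_SYMS - 1];
    int nodes, i;

    assert(n >= 1 && n <= MAX_SYMS);
    for (i = 0; i < n; i++) {
        weight[i] = count[i];
        parent[i] = -1;
        alive[i] = 1;
    }
    nodes = n;
    /* Each pass merges the two lightest live nodes under a new parent. */
    while (nodes < 2 * n - 1) {
        int a = -1, b = -1; /* a: lightest, b: second lightest */
        for (i = 0; i < nodes; i++) {
            if (!alive[i]) continue;
            if (a < 0 || weight[i] < weight[a]) { b = a; a = i; }
            else if (b < 0 || weight[i] < weight[b]) { b = i; }
        }
        weight[nodes] = weight[a] + weight[b];
        parent[nodes] = -1;
        alive[nodes] = 1;
        alive[a] = alive[b] = 0;
        parent[a] = parent[b] = nodes;
        nodes++;
    }
    /* A leaf's code length is its depth below the root. */
    for (i = 0; i < n; i++) {
        int depth = 0, j = i;
        while (parent[j] >= 0) { j = parent[j]; depth++; }
        len[i] = depth;
    }
}

/* Total bits needed to code every occurrence with the given lengths. */
unsigned long coded_bits(const unsigned long *count, const int *len, int n)
{
    unsigned long bits = 0;
    int i;
    for (i = 0; i < n; i++) bits += count[i] * (unsigned long)len[i];
    return bits;
}
```

For example, with counts {5, 9, 12, 13, 16, 45} the derived lengths
cost 224 bits in total, against 300 bits for a fixed 3-bit code over
the same six symbols. A stock table sits somewhere in between, and how
close it gets depends on how well the real counts match the
distribution the stock table was built for.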