[Flac-dev] CD TOC hash, WAS: Large compression test

Sat Aug 4 10:44:27 PDT 2001

--- Svante Eriksson <ser at as9-6-1.mt.g.bonet.se> wrote:
> "JC" == Josh Coalson <xflac at yahoo.com> writes:
> 
> JC> P.S. My next project is to rip and encode all my CDs and
> JC> store the CD metadata in a database.  I've got a nice
> JC> schema worked out and a better hash than CDindex for
> JC> creating a primary key from the CD TOC.  If there's
> JC> interest I can publish the code for the little TOC
> JC> reader + key generator (UNIX only, gotta love ioctl).
>
> Your TOC-hash algorithm would be interesting to examine, as
> I'm also intending to move the metadata into a database from
> a set of plain files.

All methods basically form a message from the CD table
of contents, then pass it through a hash function to get
a digest.  CDDB has a high chance of collision because
its hash function ignores most of the data in the TOC
and wastes several bits of the digest.  The effective
digest length is usually around 26 bits.
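
For reference, here is a minimal sketch of the commonly documented
CDDB disc ID calculation, to show where the bits go: 8 bits of
checksum, 16 bits of disc length in seconds, and 8 bits of track
count.  The function and parameter names are made up for this
example; offsets are assumed to be in frames (75 per second) and to
include the 150-frame lead-in.

```python
def digit_sum(n):
    """Sum of the decimal digits of n."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def cddb_disc_id(track_offsets, leadout_offset):
    """Return the 32-bit CDDB-style disc ID as an integer.

    track_offsets: start of each track, in frames (assumed input format).
    leadout_offset: start of the lead-out, in frames.
    """
    frames_per_second = 75
    starts = [off // frames_per_second for off in track_offsets]

    # Top 8 bits: digit sums of the track start times, modulo 255.
    checksum = sum(digit_sum(s) for s in starts) % 0xFF

    # Middle 16 bits: playing time in seconds, first track to lead-out.
    length = leadout_offset // frames_per_second - starts[0]

    # Bottom 8 bits: number of tracks.
    return (checksum << 24) | (length << 8) | len(track_offsets)


print("%08x" % cddb_disc_id([150, 15000, 30000], 45000))
```

Only the start times in whole seconds feed the checksum, and the
length and track count fall in a narrow range for real discs, which
is consistent with the point above that much of the TOC never
reaches the digest.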

So I thought it would be better to use the whole TOC and
pass it through SHA1 which yields a 160 bit digest.  Then
I found cdindex (http://www.cdindex.org/disc.html) which
does just that.  But the way the message is formed makes
it unnecessarily long (804 bytes).  Plus they feed the
digest through a pseudo-base64 encoding.
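
A sketch of how an 804-byte message of that kind can be built.  The
layout here is an assumption taken from the later MusicBrainz disc ID
documentation, which descends from cdindex: first and last track
numbers as 2 hex digits each, then 100 offsets of 8 hex digits each
(slot 0 is the lead-out, unused slots are zero).  The character
substitutions in the base64 step are likewise assumed from that
documentation, and the function name is made up for this example.

```python
import base64
import hashlib


def cdindex_style_id(first_track, last_track, track_offsets, leadout_offset):
    """Return (message, encoded_digest) for a cdindex-style disc ID.

    Offsets are in frames.  The message layout is an assumption,
    not taken from the cdindex source.
    """
    # 100 slots: slot 0 holds the lead-out, slots 1..99 hold track starts.
    offsets = [0] * 100
    offsets[0] = leadout_offset
    for number, off in zip(range(first_track, last_track + 1), track_offsets):
        offsets[number] = off

    # 2 + 2 + 100 * 8 = 804 ASCII hex characters.
    message = "%02X%02X" % (first_track, last_track)
    message += "".join("%08X" % off for off in offsets)

    digest = hashlib.sha1(message.encode("ascii")).digest()

    # Pseudo-base64: '.' and '_' replace '+' and '/', '-' replaces '='.
    encoded = base64.b64encode(digest, altchars=b"._").decode("ascii")
    return message, encoded.replace("=", "-")


message, disc_id = cdindex_style_id(1, 3, [150, 15000, 30000], 45000)
print(len(message), disc_id)
```

Most of the 804 bytes are zero padding for tracks that don't exist,
which is why the message is longer than it needs to be.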

The one I use forms a message from the binary contents
of the TOC and also uses some data cdindex doesn't (like
the track type), yet the message is just 3 bytes per
track.  From my understanding this should reduce the
chance of collisions even further, but with a 160 bit
digest that may be a moot point.
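
To illustrate the idea of a compact binary message, here is one way
3 bytes per track could be packed.  This is a hypothetical layout
invented for the example, not the actual format described above: the
top bit marks a data track and the low 23 bits hold the start offset
in frames.  Appending the lead-out as a final entry is also an
assumption.

```python
import hashlib


def pack_entry(frame_offset, is_data):
    """Pack one TOC entry into 3 bytes (hypothetical layout).

    Bit 23 is the track-type flag, bits 0..22 the frame offset.
    """
    if not 0 <= frame_offset < (1 << 23):
        raise ValueError("frame offset does not fit in 23 bits")
    value = (int(is_data) << 23) | frame_offset
    return value.to_bytes(3, "big")


def toc_key(tracks, leadout_offset):
    """Return (message, sha1_hex) for a list of (offset, is_data) tracks."""
    message = b"".join(pack_entry(off, is_data) for off, is_data in tracks)
    # Lead-out stored as one more entry so the disc length is covered.
    message += pack_entry(leadout_offset, False)
    return message, hashlib.sha1(message).hexdigest()


tracks = [(150, False), (15000, False), (30000, True)]
message, key = toc_key(tracks, 45000)
print(len(message), key)
```

With this packing a 3-track disc gives a 12-byte message, and two
discs that differ only in a track's type get different keys.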

One thing you should do is store the whole CD TOC in your
database.  That way you should be able to generate any
index you need, including CDDB and cdindex.

Josh
