[vorbis] Obtaining tag-independent track uniqueness?

Tom Wadzinski twadzins at yahoo.com
Mon Mar 11 17:43:20 PST 2002



Hello:

As seen in some of the MP3-oriented P2P programs and audio organizing
tools, the underlying uniqueness of a given MP3 file can be established
(for the most part) by, for instance, taking a hash of the first 300,000
bytes of the non-ID3-tag content of the file to obtain a content
signature.  (This hash could further be paired with the length of the
non-tag portion of the file for a more distinctive signature.)  This
signature could then be stored in an organizer program's DB (or in a
P2P system's DB) so that, even though filenames and tag content can
change or come from different sources, the underlying audio content can
be tied back to a DB entry via the signature.  Note that this scheme is
understood not to work for identifying identical content encoded under
different bitrate/quality settings.
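
To make the MP3 scheme above concrete, here is a minimal sketch in
Python.  It assumes the only tags present are an ID3v2 tag at the start
of the file and/or an ID3v1 tag in the last 128 bytes; the choice of
SHA-1 and the names audio_span and content_signature are illustrative
assumptions, not taken from any particular program.

```python
import hashlib

HASH_PREFIX_BYTES = 300_000  # prefix length mentioned in the text above


def audio_span(data):
    """Return (start, end) offsets of the bytes lying between ID3 tags."""
    start, end = 0, len(data)
    # ID3v2: 10-byte header at the front; the tag size is stored as a
    # 4-byte "synchsafe" integer (7 significant bits per byte).
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)
        start = 10 + size
        if data[5] & 0x10:  # footer-present flag (ID3v2.4 only)
            start += 10
        start = min(start, end)
    # ID3v1: fixed 128-byte trailer that begins with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return start, end


def content_signature(data):
    """Return (hex digest of the first bytes of audio, audio length)."""
    start, end = audio_span(data)
    audio = data[start:end]
    # SHA-1 is an assumed choice; any stable hash would serve here.
    digest = hashlib.sha1(audio[:HASH_PREFIX_BYTES]).hexdigest()
    return digest, len(audio)
```

Calling content_signature(open(path, "rb").read()) on two copies of the
same encoded file should give the same pair regardless of how each copy
is tagged, as long as the tags fit the assumptions above.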

Can anyone guide me on whether or not there is any way to accomplish the
same goal with Vorbis using the existing APIs, that is, getting at the
first x bytes of non-tagging/metadata content of a stream, and
similarly, getting the length of the non-tagging/metadata portion of an
entire file stream?  Or, if not that, any ideas on obtaining
"uniqueness" through another means in Vorbis?

One might say, "Why not just put a unique identifier in a tag in each
file, and not worry about this hash business?"  To preemptively respond
to this, arguments against this approach follow:
1) The DB program (organizer or p2p system) might not have write access
to the files, and thus can't set an identifier tag.  For instance, users
with large collections (let's call large 20,000 to 30,000 files) are
likely to have a good portion of them set to read-only (not to mention
read-only media) for archival purposes.  Also, large collection holders probably
have a specific tagging/metadata program that they trust, and don't want
a program that they just downloaded deciding to write to every single
one of their content files.
2) Files can't be checked for underlying audio content duplication,
other than through tagging / file size methods, which are generally
inadequate due to differing tagging/filename schemes.

Someone else might say, "How about decoding the first x seconds, and taking a
hash of that, to get uniqueness?"  This could work, except that
different decoder implementations/versions might produce different
hashes for the same file, and decoding is likely to be a much slower
technique.

Tom Wadzinski
