[vorbis] Obtaining tag-independent track uniqueness?

Ross Levis ral at baycom.co.nz
Mon Mar 11 18:10:40 PST 2002



This is an interesting question which I can't fully answer.

I presume this technique is used for MP3 files in some file-sharing apps
such as Morpheus and Audiogalaxy.  A similar routine will have to be
developed for Ogg files.
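As a rough illustration of the MP3 technique described in the quoted message below (strip the ID3 tags, hash the first 300,000 bytes of what remains, and pair the hash with the length of the untagged data), here is a minimal Python sketch. It is not taken from Morpheus, Audiogalaxy or any other program. It assumes SHA-1 as the hash, handles only a single leading ID3v2 tag and a trailing 128-byte ID3v1 tag (APE, Lyrics3 and other tag formats are ignored), and the names `id3v2_size`, `mp3_signature` and `SIGNATURE_BYTES` are hypothetical.

```python
import hashlib
from typing import Tuple

# Hypothetical constant: how many untagged bytes go into the hash.
SIGNATURE_BYTES = 300_000


def id3v2_size(data: bytes) -> int:
    """Length of a leading ID3v2 tag (header, body, optional footer), or 0."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    flags = data[5]
    # The tag size is a 28-bit "syncsafe" integer: 4 bytes, 7 bits each.
    size = ((data[6] & 0x7F) << 21) | ((data[7] & 0x7F) << 14) \
        | ((data[8] & 0x7F) << 7) | (data[9] & 0x7F)
    footer = 10 if flags & 0x10 else 0  # ID3v2.4 "footer present" flag
    return 10 + size + footer


def mp3_signature(data: bytes, limit: int = SIGNATURE_BYTES) -> Tuple[str, int]:
    """Hash the first `limit` untagged bytes and pair it with the untagged length."""
    start = id3v2_size(data)
    end = len(data)
    # An ID3v1 tag occupies the last 128 bytes and begins with b"TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    audio = data[start:end]
    return hashlib.sha1(audio[:limit]).hexdigest(), len(audio)
```

Retagging a file changes its ID3 bytes but not the returned pair, while two encodes that happen to share their first 300,000 bytes are still told apart by the length.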

Something like this will not be too difficult.  I think all that is
needed is to omit all Ogg page headers and the comment header.  We would
need to hear from the Ogg file format experts out there to confirm this.
I doubt it is possible with existing library functions.
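A minimal sketch of that idea in plain Python, without libogg, assuming a single non-multiplexed, non-chained Vorbis stream. It follows the Ogg page layout (27-byte page header, then a segment table of lacing values) to reassemble packets, drops the Vorbis comment header (the packet starting with `\x03vorbis`), and hashes the first 300,000 bytes of the remaining packet data, borrowing the MP3 figure. Hashing packets rather than raw pages is a deliberate choice: a tag editor that resizes the comment header can shift page boundaries, sequence numbers and CRCs without touching the audio. SHA-1, the 300,000-byte limit and the names `ogg_packets` and `vorbis_signature` are assumptions for illustration, and CRCs are not checked.

```python
import hashlib
from typing import Iterator, Tuple


def ogg_packets(data: bytes) -> Iterator[bytes]:
    """Reassemble packets from a single, non-multiplexed Ogg bitstream."""
    pos, partial = 0, b""
    while pos + 27 <= len(data):
        if data[pos:pos + 4] != b"OggS":
            raise ValueError("lost page sync at offset %d" % pos)
        n_segs = data[pos + 26]
        lacing = data[pos + 27:pos + 27 + n_segs]
        body = pos + 27 + n_segs  # skip the 27-byte header and segment table
        for lace in lacing:
            partial += data[body:body + lace]
            body += lace
            if lace < 255:  # a lacing value below 255 ends a packet
                yield partial
                partial = b""
        pos = body  # an unfinished packet continues on the next page


def vorbis_signature(data: bytes, limit: int = 300_000) -> Tuple[str, int]:
    """Hash the first `limit` bytes of non-comment packet data and return
    the hash together with the total length of that data."""
    h = hashlib.sha1()
    hashed = total = 0
    for packet in ogg_packets(data):
        if packet[:7] == b"\x03vorbis":  # Vorbis comment header: skip it
            continue
        total += len(packet)
        if hashed < limit:
            chunk = packet[:limit - hashed]
            h.update(chunk)
            hashed += len(chunk)
    return h.hexdigest(), total
```

The returned `(hash, length)` pair plays the same role as the MP3 signature: editing the tags changes the comment header, but not the pair.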

Regards,
Ross Levis.

Tom Wadzinski wrote:
> Hello:
> 
> As seen in some of the MP3-oriented P2P programs and audio organizing
> tools, the underlying uniqueness of a given MP3 file can be learned (for
> the most part) by, for instance, taking a hash of the first 300,000
> bytes of the non-ID3-tag content of an MP3 file to obtain a content
> signature.  (This hash could then be paired with the length of the
> non-tag portion of the entire file for an even more unique signature.)
> This signature could then be stored in an organizer program DB (or in a
> P2P system DB) so that even though the filenames and tag content can
> change or come from different sources, the underlying audio content can
> be tied back to a DB entry via the signature.  Note that this scheme is
> understood not to work for identifying identical content encoded under
> different bitrate/quality settings.
> 
> Can anyone guide me on whether or not there is any way to accomplish
> the same goal with Vorbis using the existing APIs, that is, getting at
> the first x bytes of the non-tag/metadata content of a stream, and
> similarly, getting the length of the non-tag/metadata portion of an
> entire file stream?  Or, if not that, any ideas on obtaining
> "uniqueness" through another means in Vorbis?
> 
> One might say, "Why not just put a unique identifier in a tag in each
> file, and not worry about this hash business?"  To preemptively respond
> to this, arguments against this approach follow:
> 1) The DB program (organizer or P2P system) might not have write access
> to the files, and thus can't set an identifier tag.  For instance, users
> with large collections (let's call large 20,000 to 30,000 files) are
> likely to have a good portion of them set to read-only (not to mention
> read-only media) for archival purposes.  Also, holders of large
> collections probably have a specific tagging/metadata program that they
> trust, and don't want a program that they just downloaded deciding to
> write to every single one of their content files.
> 2) Files can't be checked for underlying audio content duplication
> other than through tag or file-size methods, which are generally
> inadequate due to differing tagging/filename schemes.
> 
> Another might say, "How about decoding the first x seconds and taking a
> hash of that to get uniqueness?"  This could work, except that
> different decoder implementations/versions might produce different
> hashes for the same file, and decoding is likely to be a much slower
> technique.
> 
> Tom Wadzinski
> 
> 
> --- >8 ----
> List archives:  http://www.xiph.org/archives/
> Ogg project homepage: http://www.xiph.org/ogg/
> To unsubscribe from this list, send a message to 'vorbis-request at xiph.org'
> containing only the word 'unsubscribe' in the body.  No subject is needed.
> Unsubscribe messages sent to the list will be ignored/filtered.
> 



