[vorbis] Obtaining tag-independent track uniqueness?
Ross Levis
ral at baycom.co.nz
Mon Mar 11 18:10:40 PST 2002
This is an interesting question which I can't fully answer.
I presume this technique is used for MP3 files in some file sharing apps
such as Morpheus & Audiogalaxy. A similar routine will have to be
developed for Ogg files.
Something like this will not be too difficult. I think all that is
needed is to omit all page headers and comments. We would need to hear
from the ogg file format experts out there to confirm this. I doubt it
is possible with existing library functions.
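
To make the idea concrete, here is a rough Python sketch of "omit all page headers and comments". It walks the Ogg pages by hand, reassembles packets from the lacing values, drops the Vorbis comment packet, and hashes what is left. The names ogg_packets and vorbis_signature, the use of SHA-1, and the 300,000-byte prefix are assumptions for illustration. It also assumes a single, unchained, unmultiplexed Vorbis stream and does not verify page CRCs, so treat it as a starting point rather than a confirmed method.

```python
import hashlib


def ogg_packets(data: bytes):
    """Yield reassembled packets from a single, unmultiplexed Ogg stream."""
    pos = 0
    partial = bytearray()  # packet data carried over between segments/pages
    while pos + 27 <= len(data):
        if data[pos:pos + 4] != b"OggS":
            raise ValueError(f"no Ogg page at offset {pos}")
        nsegs = data[pos + 26]                    # number of lacing values
        lacing = data[pos + 27:pos + 27 + nsegs]  # segment table
        body = pos + 27 + nsegs                   # first byte after the page header
        for lace in lacing:
            partial += data[body:body + lace]
            body += lace
            if lace < 255:                        # a value below 255 ends a packet
                yield bytes(partial)
                partial.clear()
        pos = body


def vorbis_signature(data: bytes, prefix_len: int = 300_000):
    """Return (sha1 of the first prefix_len non-comment packet bytes, total length)."""
    digest = hashlib.sha1()
    hashed = 0  # bytes fed to the hash so far
    total = 0   # total bytes of all non-comment packets
    for index, packet in enumerate(ogg_packets(data)):
        # The comment header is one of the first three packets and starts
        # with the type byte 0x03 followed by "vorbis".
        if index < 3 and packet.startswith(b"\x03vorbis"):
            continue
        room = prefix_len - hashed
        if room > 0:
            digest.update(packet[:room])
            hashed += min(room, len(packet))
        total += len(packet)
    return digest.hexdigest(), total
```

Hashing packet contents rather than page bodies matters here: editing the comments can change how packets are split across pages, but the packets themselves stay the same. Usage would be along the lines of vorbis_signature(open(path, "rb").read()).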
Regards,
Ross Levis.
Tom Wadzinski wrote:
> Hello:
>
> As seen in some of the MP3-oriented P2P programs and audio organizing
> tools, the underlying uniqueness of a given mp3 file can be learned (for
> the most part) by, for instance, taking a hash of the first 300,000
> bytes of the non-id3 tag content of an mp3 file to obtain a content
> signature (This hash could then further be paired with the length of the
> non tag portion of the entire file for an even more unique signature).
> This signature could then be stored in an organizer program DB (or in a
> p2p system DB) such that even though the filenames and tag content can
> change or be from different sources, the underlying audio content can be
> tied back to a DB entry via the signature. Note that this scheme is
> understood to not work for identifying identical content encoded under
> different bitrate/quality settings.
>
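
For reference, the MP3 scheme described above could be sketched in Python roughly as follows. The function names, the choice of SHA-1, and the tag handling (only a leading ID3v2 tag and a trailing 128-byte ID3v1 tag are stripped) are assumptions for illustration, not a description of what any particular P2P program actually does.

```python
import hashlib


def id3v2_size(data: bytes) -> int:
    """Return the byte length of a leading ID3v2 tag, or 0 if there is none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    # The tag size is a synchsafe integer: 7 significant bits per byte,
    # not counting the 10-byte tag header itself.
    size = ((data[6] & 0x7F) << 21) | ((data[7] & 0x7F) << 14) \
         | ((data[8] & 0x7F) << 7) | (data[9] & 0x7F)
    footer = 10 if data[5] & 0x10 else 0  # optional ID3v2.4 footer flag
    return 10 + size + footer


def mp3_signature(data: bytes, prefix_len: int = 300_000):
    """Return (sha1 of the first prefix_len non-tag bytes, non-tag length)."""
    start = id3v2_size(data)
    end = len(data)
    # An ID3v1 tag is a fixed 128-byte block at the end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    audio = data[start:end]
    return hashlib.sha1(audio[:prefix_len]).hexdigest(), len(audio)
```

Two copies of the same encode with different tags should then yield the same (hash, length) pair, which is the signature that would be stored in the DB.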
> Can anyone guide me on whether or not there is any way to accomplish the
> same goal with Vorbis using the existing APIs, that is, getting at the
> first x bytes of non-tagging/metadata content of a stream, and
> similarly, getting the length of the non-tagging/metadata portion of an
> entire file stream? Or, if not that, any ideas on obtaining
> "uniqueness" through another means in Vorbis?
>
> One might say, "Why not just put a unique identifier in a tag in each
> file, and not worry about this hash business?" To preemptively respond
> to this, arguments against this approach follow:
> 1) The DB program (organizer or p2p system) might not have write access
> to the files, and thus can't set an identifier tag. For instance, users
> with large collections (let's call large 20 - 30,000 files) are likely
> to have a good portion of it set to read-only (not to mention read-only
> media), for archival purposes. Also, large collection holders probably
> have a specific tagging/metadata program that they trust, and don't want
> a program that they just downloaded deciding to write to every single
> one of their content files.
> 2) Files can't be checked for underlying audio content duplication,
> other than through tagging / file size methods, which are generally
> inadequate, due to different tagging/filename schemes.
>
> Another might say, "How about decoding the first x seconds, and taking a
> hash of that, to get uniqueness?". This could work, except that
> different decoder implementations/versions might produce different
> hashes for the same file, and decoding is likely to be a much slower
> technique.
>
> Tom Wadzinski
>
>
> --- >8 ----
> List archives: http://www.xiph.org/archives/
> Ogg project homepage: http://www.xiph.org/ogg/
> To unsubscribe from this list, send a message to
> 'vorbis-request at xiph.org'
> containing only the word 'unsubscribe' in the body. No
> subject is needed.
> Unsubscribe messages sent to the list will be ignored/filtered.
>