[ogg-dev] IceShare: IceT page hashing
Arc
arc at Xiph.org
Wed Nov 9 23:40:53 PST 2005
This is intended for those interested in IceShare development..
The earlier drafts of the IceT protocol (sorry, not back on the wiki
yet) used SHA1 hashes for pages. This, AFAIR, is the same as what
bittorrent uses for .torrent files, except ours are transferred "live"
between the peer and the tracker as the data is being transferred.
In the first method, where the Tracker sent a separate "go get this page#
from this URL" message, the entire SHA1 hash was provided with that
statement.
In the second method, where the Tracker sent a range of pages to fetch from
a host (possibly grabbing every second, third, etc. page), a SHA1 hash was
provided for each page. This obviously limited the range, since every
page's full 20-byte hash had to fit in the message.
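As a rough illustration (my numbers, not the protocol's; the 500-byte
message budget and overhead figures are borrowed from the estimate near
the end of this mail), full 20-byte SHA1 digests fill a message quickly:

    SHA1_LEN = 20                # full SHA1 digest, in bytes
    MESSAGE_BUDGET = 500         # assumed per-message byte budget
    OVERHEAD = 34 + 64           # assumed fixed overhead + max media path
    pages_per_message = (MESSAGE_BUDGET - OVERHEAD) // SHA1_LEN
    print(pages_per_message)     # => 20 pages per message, at most
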
In a third method, the SHA1 hashes were combined, end to end, and then
hashed again to provide a single SHA1 hash for the entire range. The
problem with this is that if a transfer was incomplete, none of the pages
could be verified without some additional communication between the peer
and the tracker, and it also puts a high computational load on the
tracker's end, as these "hashes of hashes" would need to be computed for
each peer in realtime.
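A minimal Python sketch of the hash-of-hashes idea (the function names
are mine, not part of IceT):

    import hashlib

    def page_hash(page_bytes):
        # Full 20-byte SHA1 digest of a single Ogg page.
        return hashlib.sha1(page_bytes).digest()

    def hash_of_hashes(pages):
        # Concatenate the per-page digests end to end, then hash the result,
        # giving one 20-byte digest for the whole range.  If any page in the
        # range is missing, this combined digest can't be checked at all.
        combined = b"".join(page_hash(p) for p in pages)
        return hashlib.sha1(combined).digest()
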
In the fourth method, the hash reporting was put on the client's end, as
part of its verification that it received the page, and the hash-of-hashes
method was abandoned.
A fifth method was developed after a discussion I had with Michael
Richardson at HOPE5 (summer '04), a crypto guru who was more than happy
to share his advice for this project..
Instead of sending the entire 160-bit hash for each page, only a
variable-sized portion of the hash for each page in a range was given.
Which portion, and how much of it, would be determined by the tracker.
Because a malicious peer (ie, one trying to introduce advertising into
another person's music stream by feeding altered audio content to others)
would have no way of knowing which part of the hash would be tested, or
how much, for any given peer (and this is likely to be different for each
peer), it's likely that, as a whole, at least one of the peers receiving
mangled content would report it as bad; from there, the tracker could
report the questionable pages as "bad" to the other peers and continue.
Then it was reported that SHA1 had been cracked, significantly weakening
it, as MD5 had been some time ago.
This is where I stopped work on the hashing, as I didn't have another
function to use and didn't want to continue using a method which had
already been compromised.
Now - SHA2 is interesting. We can use a much larger hash size, 512-bit
even (SHA-512), and apply the same method as in the latest method (#5), so
not all of the 512-bit (64-byte) hash would need to be transferred. Even
if it gets cracked, as SHA1 did, it should only mean being
cryptographically weakened by a certain amount, which would still leave it
stronger than the 160-bit SHA1 we originally started with.
In theory, a random hash chunk as small as 32 bits could be considered
sufficient for each page, which, with the average Ogg page, means 4 bytes
vs 4 kbytes of transfer when compared to the payload (1/1000th is a very
good ratio). If 32 bits is used as a fixed chunk size, an offset of 0 to
60 (or 64 w/ wrap) could be provided in one byte-field, followed by up to,
say, 100 page hash chunks, and still keep the message size under 500 bytes
total, the rest of the message being about 34 bytes plus the length of the
local media path (max 64 bytes).
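Spelling out that arithmetic (field sizes as assumed above):

    CHUNK_LEN = 4        # 32-bit hash slice per page
    NUM_PAGES = 100      # pages covered by one message
    OFFSET    = 1        # one byte-field for the hash offset
    OVERHEAD  = 34       # approximate fixed message overhead
    PATH_MAX  = 64       # maximum local media path length
    total = OFFSET + NUM_PAGES * CHUNK_LEN + OVERHEAD + PATH_MAX
    print(total)         # => 499 bytes, just under the 500-byte target
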
Note that, even if an attacker found an alternative set of data which
produced the same identical hash, that page would ALSO need to not only
have a valid Ogg header but match the CRC32 checksum contained within
the page itself; if any of these conditions is untrue, the page is
rejected and the sending peer reported to the tracker.
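In other words, an accepted page has to pass all three tests. A rough
sketch of that check (not IceT code; the CRC parameters are the ones the
Ogg page format specifies: polynomial 0x04c11db7, zero initial value, no
final XOR, checksum field zeroed while computing):

    import hashlib

    def ogg_crc(data):
        # Ogg's CRC32: MSB-first, polynomial 0x04c11db7, init 0, no final XOR.
        crc = 0
        for byte in data:
            crc ^= byte << 24
            for _ in range(8):
                crc = ((crc << 1) ^ 0x04c11db7) if crc & 0x80000000 else (crc << 1)
                crc &= 0xffffffff
        return crc

    def page_checks_out(page, offset, expected_chunk):
        # 'offset' and 'expected_chunk' are the tracker-supplied hash slice.
        if page[:4] != b"OggS":                     # valid Ogg capture pattern?
            return False
        stored_crc = int.from_bytes(page[22:26], "little")
        zeroed = page[:22] + b"\x00\x00\x00\x00" + page[26:]
        if ogg_crc(zeroed) != stored_crc:           # page's own CRC32 intact?
            return False
        digest = hashlib.sha512(page).digest()      # SHA-512, per the proposal above
        return digest[offset:offset + len(expected_chunk)] == expected_chunk
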
Does anyone with more knowledge of cryptographic hash functions than
myself know of a reason we shouldn't use SHA-512 (one-time hashed when
media is uploaded) or, perhaps, whether another hash function would be
more suited?
--
The recognition of individual possibility,
to allow each to be what she and he can be,
rests inherently upon the availability of knowledge;
The perpetuation of ignorance is the beginning of slavery.
from "Die Gedanken Sind Frei": Free Software and the Struggle for Free Thought
by Eben Moglen, General Counsel of the Free Software Foundation