[ogg-dev] IceShare: IceT page hashing
Arc
arc at Xiph.org
Wed Nov 9 23:40:53 PST 2005
This is intended for those interested in IceShare development..
The earlier drafts of the IceT protocol (sorry, not back on the wiki
yet) used SHA1 hashes for pages. This, AFAIR, is the same as what
bittorrent uses for .torrent files, except ours are transferred "live"
between the peer and the tracker as the data is being transferred.
In the first method, where the Tracker sent a separate "go get this page#
from this URL" message, the entire SHA1 hash was provided with that
statement.
In the second method, where the Tracker sent a range of pages to fetch from
a host (possibly grabbing every second, third, etc. page), a SHA1 hash was
provided for each page. This obviously limited the range, since every
page's full 20-byte hash had to fit in the message.
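As a rough illustration (my numbers, not the protocol's; the 500-byte
message budget and overhead figures are borrowed from the estimate near
the end of this mail), full 20-byte SHA1 digests fill a message quickly:

    SHA1_LEN = 20                # full SHA1 digest, in bytes
    MESSAGE_BUDGET = 500         # assumed per-message byte budget
    OVERHEAD = 34 + 64           # assumed fixed overhead + max media path
    pages_per_message = (MESSAGE_BUDGET - OVERHEAD) // SHA1_LEN
    print(pages_per_message)     # => 20 pages per message, at most
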
In a third method, the SHA1 hashes were combined, end to end, and then
hashed again to provide a single SHA1 hash for the entire range. The
problem with this is that if a transfer was incomplete, none of the pages
could be verified without some additional communication between the peer
and the tracker, and it also puts a high computational load on the
tracker's end, as these "hashes of hashes" would need to be computed for
each peer in realtime.
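A minimal Python sketch of the hash-of-hashes idea (the function names
are mine, not part of IceT):

    import hashlib

    def page_hash(page_bytes):
        # Full 20-byte SHA1 digest of a single Ogg page.
        return hashlib.sha1(page_bytes).digest()

    def hash_of_hashes(pages):
        # Concatenate the per-page digests end to end, then hash the result,
        # giving one 20-byte digest for the whole range.  If any page in the
        # range is missing, this combined digest can't be checked at all.
        combined = b"".join(page_hash(p) for p in pages)
        return hashlib.sha1(combined).digest()
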
In the fourth method, the hash reporting was put on the client's end, as
part of its verification that it received the page, and the hash-of-hashes
method was abandoned.
A fifth method was developed after a discussion I had with Michael
Richardson at HOPE5 (summer '04), a crypto guru who was more than happy
to share his advice for this project..
Instead of sending the entire 160-bit hash for each page, only a
variable-sized portion of the hash for each page in a range was given.
Which portion, and how much of it, would be determined by the tracker.
Because a malicious peer (ie, one trying to introduce advertising into
another person's music stream by feeding altered audio content to others)
would have no way of knowing which part of the hash would be tested, or
how much, for any given peer (and this is likely to be different for each
peer), it's likely that, as a whole, at least one of the peers receiving
mangled content would report it as bad; from there, the tracker could
report the questionable pages as "bad" to the other peers and continue.
Then it was reported that SHA1 had been cracked, significantly weakening
it, as MD5 had been some time ago.
This is where I stopped work on the hashing, as I didn't have another
function to use and didn't want to continue using a method which had
already been compromised.
Now - SHA2 is interesting. We can use a much larger hash size, 512-bit
even (SHA-512), and apply the same method as in the latest method (#5), so
not all of the 512-bit (64-byte) hash would need to be transferred. Even
if it gets cracked, as SHA1 did, it should only mean being
cryptographically weakened by a certain amount, which would still leave it
stronger than the 160-bit SHA1 we originally started with.
In theory, a random hash chunk as small as 32 bits could be considered
sufficient for each page, which, with the average Ogg page, means 4 bytes
vs 4 kbytes of transfer when compared to the payload (1/1000th is a very
good ratio). If 32 bits is used as a fixed chunk size, an offset of 0 to
60 (or 64 w/ wrap) could be provided in one byte-field, followed by up to,
say, 100 page hash chunks, and still keep the message size under 500 bytes
total, the rest of the message being about 34 bytes plus the length of the
local media path (max 64 bytes).
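Spelling out that arithmetic (field sizes as assumed above):

    CHUNK_LEN = 4        # 32-bit hash slice per page
    NUM_PAGES = 100      # pages covered by one message
    OFFSET    = 1        # one byte-field for the hash offset
    OVERHEAD  = 34       # approximate fixed message overhead
    PATH_MAX  = 64       # maximum local media path length
    total = OFFSET + NUM_PAGES * CHUNK_LEN + OVERHEAD + PATH_MAX
    print(total)         # => 499 bytes, just under the 500-byte target
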
Note that, even if an attacker found an alternative set of data which
produced the same identical hash, that page would ALSO need to not only
have a valid Ogg header but match the CRC32 checksum contained within
the page itself; if any of these conditions is untrue, the page is
rejected and the sending peer reported to the tracker.
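In other words, an accepted page has to pass all three tests. A rough
sketch of that check (not IceT code; the CRC parameters are the ones the
Ogg page format specifies: polynomial 0x04c11db7, zero initial value, no
final XOR, checksum field zeroed while computing):

    import hashlib

    def ogg_crc(data):
        # Ogg's CRC32: MSB-first, polynomial 0x04c11db7, init 0, no final XOR.
        crc = 0
        for byte in data:
            crc ^= byte << 24
            for _ in range(8):
                crc = ((crc << 1) ^ 0x04c11db7) if crc & 0x80000000 else (crc << 1)
                crc &= 0xffffffff
        return crc

    def page_checks_out(page, offset, expected_chunk):
        # 'offset' and 'expected_chunk' are the tracker-supplied hash slice.
        if page[:4] != b"OggS":                     # valid Ogg capture pattern?
            return False
        stored_crc = int.from_bytes(page[22:26], "little")
        zeroed = page[:22] + b"\x00\x00\x00\x00" + page[26:]
        if ogg_crc(zeroed) != stored_crc:           # page's own CRC32 intact?
            return False
        digest = hashlib.sha512(page).digest()      # SHA-512, per the proposal above
        return digest[offset:offset + len(expected_chunk)] == expected_chunk
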
Does anyone with more knowledge of cryptographic hash functions than
myself know of a reason we shouldn't use SHA-512 (one-time hashed when
media is uploaded) or, perhaps, whether another hash function would be
more suited?
--
The recognition of individual possibility,
to allow each to be what she and he can be,
rests inherently upon the availability of knowledge;
The perpetuation of ignorance is the beginning of slavery.
from "Die Gedanken Sind Frei": Free Software and the Struggle for Free Thought
by Eben Moglen, General Counsel of the Free Software Foundation