Date: Mon, 27 Sep 2004 08:45:11 -0600 From: "David G. Andersen" <danderse@cs.utah.edu> To: Giorgos Keramidas <keramida@freebsd.org> Cc: Colin Percival <cperciva@wadham.ox.ac.uk> Subject: Re: compare-by-hash (was Re: sharing /etc/passwd) Message-ID: <20040927084511.E75411@cs.utah.edu> In-Reply-To: <20040927091710.GC914@orion.daedalusnetworks.priv>; from keramida@freebsd.org on Mon, Sep 27, 2004 at 12:17:10PM %2B0300 References: <Pine.LNX.4.33.0111071900280.24824-100000@moroni.pp.asu.edu> <20011107211316.A7830@nomad.lets.net> <20040925140242.GB78219@gothmog.gr> <41575DFC.9020206@wadham.ox.ac.uk> <20040927091710.GC914@orion.daedalusnetworks.priv>
next in thread | previous in thread | raw e-mail | index | archive | help
Giorgos Keramidas just mooed: > > What I pointed out was that when a non-zero possibility of two data > blocks comparing as equal (even though they are no) exists, the method > in question should not be used for password data or other sensitive bits > of information. A larger hash key will never yield a possibility of > zero, so it doesn't mean that you can sleep untroubled at night while > the rsync server overwrites /etc/*pwd.db files periodically. P(hash collision) << p(gamma ray memory bit-flip) also << p(disk block error) also << p(undetected network error w/TCP) You're worried about the wrong thing. Unless you're talking about malicious hash collision generation with a broken hash function, the random hash collision probability, particularly with something like sha-1, really is small enough as to be insignificant. The section in the paper dealing with this is pure bunk. _everything_ is a probability -- it's just a matter of how much you're willing to spend (bandwidth, computation, storage) to drive that probability low. Illustrative example from the paper: "The empirically observed rate of undetected errors in TCP packets is about 0.0000005% ... or we could slightly worsen that rate by sending only the hash" What's the error rate when sending only the hash? Since the probabilities are small, we can effectively add them. P(undetected TCP error) = 0.000000005 P(hash collision) = 1/1208925819614629174706176 =~ 0.00000000000000000000001 "Worsening" = 0.00000000500000000000001 Now, if I were a smart programmer, I'd look at that and say, "If I'm worried about reliability, then TCP is my enemy. Hashes are my friend -- because I can send _two_ different hashes of the same data for _way_ less then the cost of sending the data, and that way I can protect myself against undetected TCP errors!" -dave -- work: dga@lcs.mit.edu me: dga@pobox.com MIT Laboratory for Computer Science http://www.angio.net/ I do not accept unsolicited commercial email. Do not spam me.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20040927084511.E75411>