Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 30 Dec 2007 19:59:38 -0800
From:      perryh@pluto.rain.com
To:        aryeh.friedman@gmail.com, des@des.no
Cc:        freebsd-hackers@freebsd.org, ivoras@freebsd.org
Subject:   Re: BSD license compatible hash algorithm?
Message-ID:  <4778692a.rVZUrFy36ANLKL7F%perryh@pluto.rain.com>
In-Reply-To: <4778294C.3030905@gmail.com>
References:  <5950EE0C-383D-4D6B-9991-A0DEABD2ADE4@u.washington.edu> <20071228003716.GB48997@lor.one-eyed-alien.net> <B8D4C3C6-B867-4550-9F17-4DC6930D10E2@u.washington.edu> <fl2qiv$qoh$1@ger.gmane.org> <4774EF27.90307@gmail.com> <86r6h5zpr3.fsf@ds4.des.no> <4778294C.3030905@gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
"Aryeh M. Friedman" <gmail.com!aryeh.friedman@agora.rdrop.com> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Dag-Erling Sm??rgrav wrote:
> > "Aryeh M. Friedman" <aryeh.friedman@gmail.com> writes:
> >> All hashs have issues with pooling.... see
> >> http://www.burtleburtle.net/bob/hash/index.html... btw it is
> >> a old wives tale that the number of buckets should be prime
> >> (mostly based on the very weak implementation Knuth offered)
> >
> > Not an "old wives' tale", but rather an easy way to implement a
> > hash algorithm that is good enough for most simple uses: metric
> > modulo table size, where metric is a number derived from the
> > item in such a manner as to give a good spread.
>
> ... the above only applies if your using a very primitive hash
> like Knuth's multiplication one.... every modern hash I know of
> should have 2^k buckets actually for some k ...

It very much depends on what is used for a rehash (collision step)
value.  The step value and the table size must have no common factor
larger than 1, or there will be edge cases (bugs) in which some
combinations of hash and step values will cause the table to appear
full when in fact it is not.  Making the table size prime is one
simple way of preventing such problems, while still allowing the
rehash value to depend on the key (thus reducing the liklihood of
collision on the second probe).

At the other extreme, the table can be any size at all if the step
value is 1 (and the "modulo table size" operation will certainly be
more efficient if the table size is 2^k).



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4778692a.rVZUrFy36ANLKL7F%perryh>