From owner-freebsd-fs Thu Apr 24 07:04:26 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id HAA07677 for fs-outgoing; Thu, 24 Apr 1997 07:04:26 -0700 (PDT) Received: from Sisyphos.MI.Uni-Koeln.DE (Sisyphos.MI.Uni-Koeln.DE [134.95.212.10]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id HAA07667 for ; Thu, 24 Apr 1997 07:04:20 -0700 (PDT) Received: from x14.mi.uni-koeln.de (annexr3-1.slip.Uni-Koeln.DE) by Sisyphos.MI.Uni-Koeln.DE with SMTP id AA01677 (5.67b/IDA-1.5 for ); Thu, 24 Apr 1997 16:03:47 +0200 Received: (from se@localhost) by x14.mi.uni-koeln.de (8.8.5/8.6.9) id QAA23373; Thu, 24 Apr 1997 16:03:39 +0200 (CEST) Message-Id: <19970424160338.57282@x14.mi.uni-koeln.de> Date: Thu, 24 Apr 1997 16:03:38 +0200 From: Stefan Esser To: Don Lewis Cc: dg@root.com, Poul-Henning Kamp , Michael Hancock , fs@freebsd.org Subject: Re: the namei cache... References: <199704241256.FAA11674@salsa.gv.tsc.tdk.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.68 In-Reply-To: <199704241256.FAA11674@salsa.gv.tsc.tdk.com>; from Don Lewis on Thu, Apr 24, 1997 at 05:56:01AM -0700 Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Apr 24, Don Lewis wrote: > Yes, most processors are really bad at integer divide. You might look at > the Perl hash. You multiply the current value by 33 and add the next > character, then mask off however many bits you want at the end (the table > size must be a power of 2). If your compiler is at all smart, it will > notice that the multiply is just a shift and an add. I suspect this might > be pretty easy to just drop in to the existing code for a quick trial. I used the following as a hash function, after looking up algorithms in Knuth's book and studying lots of examples in several software packages. I needed a fast hash, which should work on power of 2 size tables, with keys mostly being ASCII text. I did quite some profiling and testing with my data sets, and the following gave good results: #define SHVAL (8) #define MULTIPLIER (0x5425daf1) unsigned long hashf (unsigned long hashmax, void *key, size_t keylen) { unsigned long i = 0; unsigned char *p = key; while (keylen--) { unsigned char c = *p++; i -= (i << 4) ^ c; /* i -= (i << 4) + (i >> 23) ^ c;*/ } i *= MULTIPLIER; i >>= SHVAL; return i & hashmax; } This lead to significantly less collisions than other, simpler hash functions. I tried other versions that fed back some of the high order bits, but surprisingly found that this did not help reduce the number of collisions, and thus was not worth the additional shift and add. But this is possibly not true in the case of file names that tend to have a common suffix (.tar.gz, for example). (The value of hashmax must obviously be a power of 2 minus 1 ...) I used tables from a few thousand up to a few hundred thousand entries, BTW. Regards, STefan