From owner-freebsd-fs@FreeBSD.ORG Fri Nov 28 03:07:30 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8DBC216A4CE for ; Fri, 28 Nov 2003 03:07:30 -0800 (PST) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1653B43FE3 for ; Fri, 28 Nov 2003 03:07:29 -0800 (PST) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.12.9p2/8.12.9) with ESMTP id hASB7BeF017371; Fri, 28 Nov 2003 03:07:15 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <200311281107.hASB7BeF017371@gw.catspoiler.org> Date: Fri, 28 Nov 2003 03:07:11 -0800 (PST) From: Don Lewis To: kmarx@vicor.com In-Reply-To: <3FC27F98.8090801@vicor.com> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii cc: freebsd-fs@FreeBSD.org cc: mckusick@beastie.mckusick.com Subject: Re: 4.8 ffs_dirpref problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Nov 2003 11:07:30 -0000 On 24 Nov, Ken Marx wrote: > > > Don Lewis wrote: >> Index: sys/kern/vfs_bio.c >> =================================================================== >> RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v >> retrieving revision 1.242.2.21 >> diff -u -r1.242.2.21 vfs_bio.c >> --- sys/kern/vfs_bio.c 9 Aug 2003 16:21:19 -0000 1.242.2.21 >> +++ sys/kern/vfs_bio.c 18 Nov 2003 02:10:55 -0000 >> @@ -140,6 +140,7 @@ >> &bufreusecnt, 0, ""); >> >> static int bufhashmask; >> +static int bufhashshift; >> static LIST_HEAD(bufhashhdr, buf) *bufhashtbl, invalhash; >> struct bqueues bufqueues[BUFFER_QUEUES] = { { 0 } }; >> char *buf_wmesg = BUF_WMESG; >> @@ -160,7 +161,20 @@ >> struct bufhashhdr * >> bufhash(struct vnode *vnp, daddr_t bn) >> { >> - return(&bufhashtbl[(((uintptr_t)(vnp) >> 7) + (int)bn) & bufhashmask]); >> + u_int64_t hashkey64; >> + int hashkey; >> + >> + /* >> + * Fibonacci hash, see Knuth's >> + * _Art of Computer Programming, Volume 3 / Sorting and Searching_ >> + * >> + * We reduce the argument to 32 bits before doing the hash to >> + * avoid the need for a slow 64x64 multiply on 32 bit platforms. >> + */ >> + hashkey64 = (u_int64_t)(uintptr_t)vnp + (u_int64_t)bn; >> + hashkey = (((u_int32_t)(hashkey64 + (hashkey64 >> 32)) * 2654435769u) >> >> + bufhashshift) & bufhashmask; >> + return(&bufhashtbl[hashkey]); >> } >> >> /* >> @@ -319,8 +333,9 @@ >> bufhashinit(caddr_t vaddr) >> { >> /* first, make a null hash table */ >> + bufhashshift = 29; >> for (bufhashmask = 8; bufhashmask < nbuf / 4; bufhashmask <<= 1) >> - ; >> + bufhashshift--; >> bufhashtbl = (void *)vaddr; >> vaddr = vaddr + sizeof(*bufhashtbl) * bufhashmask; >> --bufhashmask; >> >> > > Well, I'm mildly beflummoxed - I tried to compare hashtable preformance > between all three known versions of the hashing - legacy power of 2, > the Vicor ^= hash, and Don's fibonacci hash. > > Running with > > minifree = max( 1, avgifree / 4 ); > minbfree = max( 1, avgbfree ); > > all perform about the same, with no performance problems all > the way up to 100% disk capacity (didn't test into reserved space). > > Looking at instrumentation to show freq and avg depth of the > hash buckets, everything seems very calm (mainly because > we're not hitting the linear searching very often, I'd presume). > > I can't explain why I seemlingly got performance problems > with similar (identical) minbfree code previously. > > So, out of spite, I went back to > > minbfree = max( 1, avgbfree/4 ); > > This does hit the hashtable harder for the legacy version > and not so much for either new flavor. Here are a few > samplings of calling my dump routine from the debugger. > "avgdepth" really means 'search depth' since we use > the depth reached after finding a bp in gbincore. > > The line below such as, > > 0: avgdepth[1] cnt=801 > > means that 801 of the hashtable buckets had an avg search > depth of 1 at the time the debug routine was called. > The 'N:' prefix means the N-th unique non-zero such value. > So large cnt's for small []'d depth values means an efficient hash. > > I've edited out the details as much as possible. > > LEGACY: > -------- > Nov 24 13:34:54 oos0b /kernel: bh[442/0x1ba]: freq=2706110, avgdepth = 154 > ... > Nov 24 13:34:54 oos0b /kernel: 0: avgdepth[1] cnt=1015 > Nov 24 13:34:54 oos0b /kernel: 1: avgdepth[2] cnt=7 > Nov 24 13:34:54 oos0b /kernel: 2: avgdepth[154] cnt=1 <- !! > Nov 24 13:34:54 oos0b /kernel: 3: avgdepth[3] cnt=1 > ----------- > > Nov 24 13:36:49 oos0b /kernel: bh[442/0x1ba]: freq=3416953, avgdepth = 141 > ... > Nov 24 13:36:49 oos0b /kernel: 0: avgdepth[1] cnt=1017 > Nov 24 13:36:49 oos0b /kernel: 1: avgdepth[141] cnt=1 > Nov 24 13:36:49 oos0b /kernel: 2: avgdepth[2] cnt=6 > > VICOR x-or hashtable: > --------------------- > Nov 24 13:07:24 oos0b /kernel: 0: avgdepth[1] cnt=762 > Nov 24 13:07:24 oos0b /kernel: 1: avgdepth[2] cnt=259 > Nov 24 13:07:24 oos0b /kernel: 2: avgdepth[3] cnt=3 > ----------- > > Nov 24 13:08:07 oos0b /kernel: 0: avgdepth[1] cnt=744 > Nov 24 13:08:07 oos0b /kernel: 1: avgdepth[2] cnt=275 > Nov 24 13:08:07 oos0b /kernel: 2: avgdepth[3] cnt=5 > > FIBONACCI: > ---------- > Nov 24 11:56:50 oos0b /kernel: 0: avgdepth[1] cnt=811 > Nov 24 11:56:50 oos0b /kernel: 1: avgdepth[3] cnt=88 > Nov 24 11:56:50 oos0b /kernel: 2: avgdepth[2] cnt=124 > Nov 24 11:56:50 oos0b /kernel: 3: avgdepth[0] cnt=1 > ----------- > > Nov 24 11:57:48 oos0b /kernel: 0: avgdepth[1] cnt=801 > Nov 24 11:57:48 oos0b /kernel: 1: avgdepth[3] cnt=93 > Nov 24 11:57:48 oos0b /kernel: 2: avgdepth[2] cnt=130 > > So, while this is far from analytically eshaustive, > it almost appears the fibonacci hash has more entries > of depth 3, while the Vicor one has more at depth 2. > > I'm happy to run more tests if you have ideas. I'm also fine > to cut bait and go with whatever you decide. It *seems* like > putting the fibonacci hash is prudent since the current hash > has been observed to be expensive. I had trouble proving this > unequivocally though. So, perhaps Don's minbfree fix is sufficient > after all. I'm tempted at this point to go with the 100% flavor. I think we're running into one of the weaknesses in the Fibonacci hash. There are a large number of hash entries for the cylinder group blocks, which are located at offsets which are multiples of 89 * 2^10 in your example, or something on the order of 2^16. The effect of this is for the cylinder group number to be hashed using the least significant bits of the hash multiplier, which don't work as well for distributing the hash values. I tried some of Knuth's suggestions, and got better results with the hash multiplier 0x9E376DB1u. The most significant 16 bits of the multplier are the same as the original constant, and the least significant bits act as a fraction in the desirable range of 1/3 to 3/7. Please give this new hash multiplier a try.