From owner-freebsd-hackers@FreeBSD.ORG Wed Dec 17 11:03:11 2003
Date: Wed, 17 Dec 2003 19:03:07 +0000
From: David Malone
To: Ted Unangst
cc: freebsd-hackers@freebsd.org
Subject: Re: patch: portable dirhash
Message-ID: <20031217190307.GA43344@hamilton.maths.tcd.ie>
List-Id: Technical Discussions relating to FreeBSD

On Wed, Dec 17, 2003 at 01:09:18PM -0500, Ted Unangst wrote:
> while on the subject, there's a piece of code something like this in
> freebsd:
>	/*
>	 * We hash the name and then some other bit of data that is
>	 * invariant over the dirhash's lifetime. Otherwise names
>	 * differing only in the last byte are placed close to one
>	 * another in the table, which is bad for linear probing.
>	 */
>	hash = hash32_buf(name, namelen, HASHINIT);
>	hash = hash32_buf(dh, sizeof(dh), hash);
>
> which isn't doing what you'd expect (hashing the dh pointer), instead it
> hashes the first bytes of dh, conveniently a constant value so it works,
> but it provides no benefit. fix is making it &dh. i'd provide a diff,
> but it's a little large. :)

Ahhh! Well spotted.
Actually, it does provide the benefit, even though it wasn't the
value that I'd intended to hash here. Because of the way that the
FNV hash works, turning the handle on the hash function one more
time should usually split up similar filenames.

I'll have a quick think about which one is really better to hash.
I don't think there's an advantage to either in particular. If we
were to start reallocating dh_hash without freeing dh, then we'd
get a slightly different hash function each time, which might be a
slight advantage. However, we don't do that right now, so...

	David.
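
To illustrate the "turning the handle one more time" point, here is a minimal sketch in C. It assumes the standard 32-bit FNV-1 step (multiply by the FNV prime, then XOR in the byte). The function names and the "salt" argument are hypothetical stand-ins for whatever invariant bytes get hashed after the name; this is not the dirhash code itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV1_32_INIT	0x811c9dc5u	/* standard 32-bit FNV offset basis */
#define FNV_32_PRIME	0x01000193u	/* standard 32-bit FNV prime */

/* One pass of 32-bit FNV-1 over buf, continuing from hval. */
static uint32_t
fnv1_32_buf(const void *buf, size_t len, uint32_t hval)
{
	const unsigned char *p = buf;

	while (len-- > 0) {
		hval *= FNV_32_PRIME;	/* multiply first ... */
		hval ^= *p++;		/* ... then fold in the byte */
	}
	return (hval);
}

/*
 * Hash of the name alone. The final byte is only XORed into the
 * low 8 bits, so names differing in the last byte land close together.
 */
static uint32_t
name_hash(const char *name)
{
	return (fnv1_32_buf(name, strlen(name), FNV1_32_INIT));
}

/*
 * Hash of the name followed by some fixed bytes. The salt is a
 * hypothetical placeholder: any bytes that stay constant over the
 * table's lifetime behave the same way for this purpose.
 */
static uint32_t
salted_hash(const char *name, const void *salt, size_t saltlen)
{
	return (fnv1_32_buf(salt, saltlen, name_hash(name)));
}
```

With "file_a" and "file_b", the two name_hash() values differ only by 'a' ^ 'b' in the low bits. After even a single salt byte, the multiply has run once more on both values, so they end up a multiple of the FNV prime apart (modulo 2^32) and no longer share their upper 24 bits. Which constant bytes are fed in does not matter for this effect, only that at least one more round is performed.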