From owner-freebsd-hackers@FreeBSD.ORG Sun Dec 30 03:23:05 2007 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6DD9816A41A for ; Sun, 30 Dec 2007 03:23:05 +0000 (UTC) (envelope-from eddy+public+spam@noc.everquick.net) Received: from a.mx.ict1.everquick.net (a.mx.everquick.net [204.10.191.21]) by mx1.freebsd.org (Postfix) with ESMTP id 34FB613C448 for ; Sun, 30 Dec 2007 03:23:05 +0000 (UTC) (envelope-from eddy+public+spam@noc.everquick.net) Received: from pop.ict1.everquick.net (localhost [127.0.0.1]) by a.mx.ict1.everquick.net (8.12.10/8.12.10) with ESMTP id lBU3Lcmp004064; Sun, 30 Dec 2007 03:21:38 GMT X-Everquick-No-Abuse-1: Report any email abuse to or X-Everquick-No-Abuse-2: call +1 (785) 865-5885. Please be sure to reference X-Everquick-No-Abuse-3: the Message-Id and include GMT timestamps. Received: from localhost (eddy@localhost) by pop.ict1.everquick.net (8.13.3/8.13.3/Submit) with ESMTP id lBU3LaYX004058; Sun, 30 Dec 2007 03:21:38 GMT X-Authentication-Warning: pop.ict1.everquick.net: eddy owned process doing -bs Date: Sun, 30 Dec 2007 03:21:35 +0000 (GMT) From: "Edward B. DREGER" X-X-Sender: eddy@pop.ict1.everquick.net To: Peter Jeremy In-Reply-To: <20071229213521.GW40785@server.vk2pj.dyndns.org> Message-ID: References: <5950EE0C-383D-4D6B-9991-A0DEABD2ADE4@u.washington.edu> <7F9D2F63-B5E6-41DE-843A-8D673C2DC88E@u.washington.edu> <20071229213521.GW40785@server.vk2pj.dyndns.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: hackers@freebsd.org Subject: Re: BSD license compatible hash algorithm? X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 30 Dec 2007 03:23:05 -0000 PJ> Date: Sun, 30 Dec 2007 08:35:21 +1100 PJ> From: Peter Jeremy PJ> On Sat, Dec 29, 2007 at 08:50:14PM +0000, Edward B. DREGER wrote: PJ> > PJ> >perfect_hash = ( hash1[x] + hash2[x] ) % entry_count ; PJ> This relies on pre-knowledge of all possible entries. It's PJ> excellent for (eg) keyword lookups in a compiler (and gcc uses gperf PJ> for that reason) but no good where the input can be arbitrary. IIUC, Garrett wanted to eliminate duplicates. When an item is found, check against the current [OP]MPH table; either make new reference or follow existing, as appropriate. IIRC, CHM92 had to precompute the entire table. However, one can implement OPMPHashing in a way that supports dynamic deletions and incremental additions. Unfortunately, I know of no implementation that's either BSD-licensed or in C, let alone both. It's been nearly a couple years since I wrote a [proprietary] C++ implementation, but I recall that it wasn't terribly difficult. The trick for incremental additions is to deal with hash collisions, drastically reducing the probability of a graph cycle. Thus, one rarely must discard the entire OPMPH table in favor of one built from new hash functions. (Note that the OPMPH table must still be "large enough" to deal with the additional items. I forget the exact numbers, but space requirements were notably less than CHM92.) Eddy -- Everquick Internet - http://www.everquick.net/ A division of Brotsman & Dreger, Inc. - http://www.brotsman.com/ Bandwidth, consulting, e-commerce, hosting, and network building Phone: +1 785 865 5885 Lawrence and [inter]national Phone: +1 316 794 8922 Wichita ________________________________________________________________________ DO NOT send mail to the following addresses: davidc@brics.com -*- jfconmaapaq@intc.net -*- sam@everquick.net Sending mail to spambait addresses is a great way to get blocked. Ditto for broken OOO autoresponders and foolish AV software backscatter.