From owner-freebsd-hackers Thu Apr 24 21:52:19 1997
Return-Path:
Received: (from root@localhost)
	by hub.freebsd.org (8.8.5/8.8.5) id VAA23649
	for hackers-outgoing; Thu, 24 Apr 1997 21:52:19 -0700 (PDT)
Received: from chai.plexuscom.com (chai.plexuscom.com [207.87.46.100])
	by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id VAA23641
	for ; Thu, 24 Apr 1997 21:52:17 -0700 (PDT)
Received: from chai.plexuscom.com (localhost [127.0.0.1])
	by chai.plexuscom.com (8.8.5/8.8.5) with ESMTP id AAA15841;
	Fri, 25 Apr 1997 00:53:48 -0400 (EDT)
Message-Id: <199704250453.AAA15841@chai.plexuscom.com>
To: Poul-Henning Kamp
Cc: hackers@freebsd.org
Subject: Re: the namei cache...
In-reply-to: Your message of "Wed, 23 Apr 1997 17:55:34 +0200." <1284.861810934@critter>
Date: Fri, 25 Apr 1997 00:53:48 -0400
From: Bakul Shah
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> We add a linked list to vnodes that represent directories
> and hook all the cache entries that are relative to that
> directory onto that list (sorted by name, or maybe LRU ?),
> rather than finding them through the hash list.

This is analogous to one of the ways a symbol table is implemented
in a processor for a lexically scoped language.  From experience,
even for a handful of entries a hash table is typically faster than
a linked list.  BTW, almost all PostScript interpreters use one
hash table per dict object (and most user-defined dicts are rather
small).  Unless you *need* sorted entries or bounded search time, a
dynamically growing hash table is almost always a win compared to
linear lists or trees.  If needed, an approximation of LRU can be
implemented with a ref bit that is set every time an entry is
accessed and reset by a separate scan (see the second sketch at the
end of this message).  Note that if you use one hash table per
directory, you still gain the rest of the benefits you cited.

> Have I overlooked something fundamental ?

Scaling.  Directories with 100+ entries are not uncommon.  Even
/usr/include and /usr/include/sys have over 100 entries each.  I
once encountered a directory with 2,000,000+ entries!  One does not
optimize for such extreme cases, but it is nice when they are
handled effortlessly as a byproduct of a design decision.

A dynamically growing hash table is a must.  To avoid recomputing
the hash you can store the entire 32-bit hash value in each entry;
growing the table then costs nothing worse than a shift (or mod)
operation per entry (see the first sketch at the end of this
message).  The initial table size can be a function of the
directory size, perhaps bounded by some parameter.

A few years ago, in some Usenet group, Chris Torek compared a
number of hash functions, hashing the entire web2 (or some such)
word list with each one to check for collisions.  The best one from
that list goes something like:

	unsigned hash = 0x31fe;	/* not sure of this init value */

	while (*cp)
		hash = hash*33 + *cp++;

It is easy to rerun the experiment over directories with a simple
user program that plugs in different hash functions (the third
sketch at the end is one such program).  Also note that some hash
functions can be `unfolded' to handle 4 bytes at a time (see the
fourth sketch).

What would be nice is if the on-disk directories were stored as
hash tables.  This would have to be done in a compatible fashion,
perhaps by storing something that will be ignored by `normal'
directory lookup routines.
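First, a minimal user-level sketch of the dynamically growing,
chained hash table with the full 32-bit hash kept in each entry.
The names (nctable, ncentry), the power-of-two sizing and the
load-factor threshold are all made up for illustration; this is not
from any actual namei code.

#include <stdlib.h>

struct ncentry {
	struct ncentry	*next;		/* hash chain link */
	unsigned	hash;		/* full 32-bit hash, stored so
					 * growing needs no rehashing */
	char		*name;
};

struct nctable {
	struct ncentry	**bucket;
	unsigned	mask;		/* nbuckets - 1; nbuckets a power of 2 */
	unsigned	nentries;
};

/* Initial size as a function of directory size, bounded by a parameter. */
static unsigned
nc_initsize(unsigned direntries, unsigned cap)
{
	unsigned n = 4;

	while (n < direntries && n < cap)
		n *= 2;
	return n;
}

/* Double the bucket array; re-linking costs one mask per entry. */
static void
nc_grow(struct nctable *t)
{
	unsigned newmask = 2 * t->mask + 1;
	struct ncentry **nb = calloc(newmask + 1, sizeof *nb);
	unsigned i;

	if (nb == NULL)
		return;			/* keep the old, smaller table */
	for (i = 0; i <= t->mask; i++) {
		struct ncentry *e, *n;

		for (e = t->bucket[i]; e != NULL; e = n) {
			n = e->next;
			e->next = nb[e->hash & newmask];
			nb[e->hash & newmask] = e;
		}
	}
	free(t->bucket);
	t->bucket = nb;
	t->mask = newmask;
}

static void
nc_insert(struct nctable *t, struct ncentry *e)
{
	if (t->nentries >= 2 * (t->mask + 1))	/* average chain length 2 */
		nc_grow(t);
	e->next = t->bucket[e->hash & t->mask];
	t->bucket[e->hash & t->mask] = e;
	t->nentries++;
}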
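Second, the ref-bit scan, reusing the structures above and assuming
struct ncentry gains an `unsigned flags' member.  Note this is the
clock-style approximation of LRU, not exact LRU ordering.

#define NC_REF	0x01			/* set on every lookup hit */

/*
 * Periodic sweep: an entry whose ref bit is still clear has not
 * been looked up since the previous sweep and is evicted; the
 * others get their bit cleared and a second chance.
 */
static void
nc_sweep(struct nctable *t)
{
	unsigned i;

	for (i = 0; i <= t->mask; i++) {
		struct ncentry **ep = &t->bucket[i], *e;

		while ((e = *ep) != NULL) {
			if (e->flags & NC_REF) {
				e->flags &= ~NC_REF;
				ep = &e->next;
			} else {
				*ep = e->next;
				free(e->name);
				free(e);
				t->nentries--;
			}
		}
	}
}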
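Third, the kind of throwaway user program I had in mind for
rerunning the collision experiment over a directory; swap in any
hash function you like.  The 0x31fe initializer is the value quoted
above (which I am not sure of) and the bucket count is arbitrary.

#include <dirent.h>
#include <stdio.h>
#include <string.h>

#define NBUCKETS 256			/* power of two so we can mask */

static unsigned
hash33(const char *cp)
{
	unsigned h = 0x31fe;

	while (*cp)
		h = h*33 + (unsigned char)*cp++;
	return h;
}

int
main(int argc, char **argv)
{
	unsigned count[NBUCKETS];
	struct dirent *dp;
	unsigned i, names = 0, used = 0, worst = 0;
	DIR *d;

	memset(count, 0, sizeof count);
	if ((d = opendir(argc > 1 ? argv[1] : ".")) == NULL) {
		perror("opendir");
		return 1;
	}
	while ((dp = readdir(d)) != NULL) {
		count[hash33(dp->d_name) & (NBUCKETS - 1)]++;
		names++;
	}
	closedir(d);
	for (i = 0; i < NBUCKETS; i++) {
		if (count[i] > 0)
			used++;
		if (count[i] > worst)
			worst = count[i];
	}
	printf("%u names, %u of %u buckets used, worst chain %u\n",
	    names, used, NBUCKETS, worst);
	return 0;
}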
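Fourth, the unfolding: hash = hash*33 + c applied four times is
hash*33^4 + c0*33^3 + c1*33^2 + c2*33 + c3, so a four-bytes-at-a-
time variant could look like this (assuming the name length is
known up front, as it is for directory entries):

static unsigned
hash33x4(const unsigned char *cp, unsigned len)
{
	unsigned h = 0x31fe;		/* same (uncertain) initializer */

	/* 33^2 = 1089, 33^3 = 35937, 33^4 = 1185921; the compiler
	 * folds the constant multiplies. */
	for (; len >= 4; cp += 4, len -= 4)
		h = h*1185921 + cp[0]*35937 + cp[1]*1089 + cp[2]*33 + cp[3];
	while (len-- > 0)
		h = h*33 + *cp++;
	return h;
}

-- bakul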