From owner-freebsd-fs  Thu Apr 24 13:40:10 1997
Return-Path:
Received: (from root@localhost)
	by hub.freebsd.org (8.8.5/8.8.5) id NAA29788
	for fs-outgoing; Thu, 24 Apr 1997 13:40:10 -0700 (PDT)
Received: from critter.dk.tfs.com (phk.cybercity.dk [195.8.129.17])
	by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id NAA29717;
	Thu, 24 Apr 1997 13:39:53 -0700 (PDT)
Received: from critter (localhost [127.0.0.1])
	by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id WAA01422;
	Thu, 24 Apr 1997 22:38:55 +0200 (CEST)
To: dg@root.com
cc: Poul-Henning Kamp, fs@freebsd.org
From: Poul-Henning Kamp
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 13:18:33 PDT."
	<199704242018.NAA10732@root.com>
Date: Thu, 24 Apr 1997 22:38:54 +0200
Message-ID: <1420.861914334@critter>
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

In message <199704242018.NAA10732@root.com>, David Greenman writes:

>>In message <199704241934.MAA10488@root.com>, David Greenman writes:
>>>>about 1/4th of the additions.  Getting rid of the div also wouldn't be
>>>>bad, but might not be worth it.
>>>
>>>   Not worth it?  At greater than 50 cycles, the divide is more expensive
>>>than the entire add loop.
>>
>>But losing the hash entirely is >even< faster, right? :-)
>
>   Maybe.  Here's the problem with your approach as I see it: On wcarchive,
>we have about 44,000 files in the namei cache.  The hash table size is
>approximately 44,000 entries large (rounded up or down to the nearest

Unlikely.  You have a 44,000 entry cache, but as far as I've been able to
make out you only have about 40,000 >valid< entries in it (if it's anything
like the data I have here).  I'm trying to get the last 10% back.

>prime...I forget).  This means that there is an average of about 1 file in
>each hash bucket...or very little to search through.

Unfortunately not so; finding a hash that good would be really, really
hard.  But I agree that we're in O(3) land.

>each hash bucket...or very little to search through.  Of course there are
>far fewer than 1 directory per file, so now if you change this so that all
>of the files for a directory are on one list, then you're going to have to
>do a lot more work to find it.  I think the average directory length on
>wcarchive is about 50 files or so, so this would be about 50 times more
>entries in the list compared to the old hash table scheme.

This is what I thought, but apparently not so.

Part of the problem seems to be that there are multiple names pointing at
the same directory ("foo" + N * "..") and that depletes your name cache.
With the current design it should probably be 1.5 .. 2 times bigger than
desiredvnodes.  I'm very reluctant to increase it when entries cost 64
bytes each, and when the data seems to indicate that 10% of them are stale
(but we don't know how to find them), so we keep recycling valid entries
instead.

Another thing that bothers me is the size.  The reason for the current
size of 64 bytes is the way malloc works: small allocations get rounded
up to power-of-two buckets, so anything between 33 and 64 bytes costs 64
bytes anyway.  In reality we would get very close to the same efficiency
from 48 bytes per entry.  I may look into changing the way we allocate
them; packed at 48 bytes we would fit 85 entries in a 4K page instead of
64, which would buy us 33% more entries in the same space.

David:  Can I get you to try one (mostly harmless) experiment on
wcarchive?  In vfs_cache.c, where it checks the number of namecache
entries against "desiredvnodes", could you try to use 2*desiredvnodes
(or something similar) instead, and measure the difference in cache hit
rate?
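Roughly this is what I mean -- a sketch only, the exact test in
cache_enter() may look a bit different in the tree you are running,
and the variable names here are from memory:

	/*
	 * Sketch: double the ceiling before we start recycling entries,
	 * so we can compare hit rates against the current limit.
	 */
	if (numcache < 2 * desiredvnodes) {	/* was: numcache < desiredvnodes */
		/* still room: allocate a fresh entry */
		ncp = (struct namecache *)
			malloc(sizeof(*ncp), M_CACHE, M_WAITOK);
		numcache++;
	} else {
		/* over the (doubled) limit: recycle from the LRU as before */
	}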
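And since I mentioned putting all the names of a directory on one list
above: the entry linkage I'm experimenting with has roughly the shape
below.  Again a sketch only; the field names are made up for illustration
and are not what will get committed.

	/* LIST_ENTRY() comes from <sys/queue.h> */
	struct namecache {
		LIST_ENTRY(namecache) nc_src;	/* hangs off the directory vnode */
		LIST_ENTRY(namecache) nc_dst;	/* hangs off the target vnode */
		struct	vnode *nc_dvp;		/* directory the name lives in */
		struct	vnode *nc_vp;		/* vnode it resolves to */
		u_char	nc_flag;
		u_char	nc_nlen;		/* length of the name */
		char	nc_name[1];		/* the name itself, variable length */
	};

A lookup then walks the (short) per-directory list instead of computing a
hash and walking a bucket, and a purge can unlink every name for a vnode
in one pass -- which is what should let us find and drop the ~10% stale
entries mentioned above.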
Poul-Henning

PS: I guess that I'm doing "the bad thing" here, I'm changing two
variables at the same time:

1. changing the linkage of namecache entries to permit complete removal
   (This increases the efficiency of the cache)
2. dropping the hash algorithm
   (This may make the cache faster/slower depending on POM)

But don't worry, until I have understood the impact, I'm not committing
anything :-)

--
Poul-Henning Kamp           | phk@FreeBSD.ORG         FreeBSD Core-team.
http://www.freebsd.org/~phk | phk@login.dknet.dk      Private mailbox.
whois: [PHK]                | phk@tfs.com             TRW Financial Systems, Inc.
Power and ignorance is a disgusting cocktail.