From owner-freebsd-fs Tue Apr 22 09:29:44 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id JAA23004 for fs-outgoing; Tue, 22 Apr 1997 09:29:44 -0700 (PDT)
Received: from fssrv.fachschaften.tu-muenchen.de (fssrv.fachschaften.tu-muenchen.de [129.187.176.20]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id JAA22998 for ; Tue, 22 Apr 1997 09:29:40 -0700 (PDT)
Received: from boogie.fachschaften.tu-muenchen.de by fssrv.fachschaften.tu-muenchen.de with SMTP id AA05199 (5.67a8/IDA-1.5 for ); Tue, 22 Apr 1997 17:29:49 +0200
Received: by boogie.fachschaften.tu-muenchen.de (5.67a8/Ultrix3.0-C) id AA06445; Tue, 22 Apr 1997 18:29:36 +0200
Date: Tue, 22 Apr 1997 18:29:35 +0200 (MET)
From: "Reini (Reinhold Huber)"
To: freebsd-fs@freebsd.org
Subject: Recreating destroyed filesystem
Message-Id:
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Hi!

My problem: I've screwed up my disk by typing 'dd if=/dev/zero of=/dev/rsd0c count=20', which was not at all healthy for the contents of my disk. I should really have typed 'sd1' instead of 'sd0' :(. Now I've jumpered SCSI ID 0 to ID 1 and installed FreeBSD on another disk (now SCSI ID 0 / sd0).

The manpages of dumpfs and fs tell me that I need to find a superblock. The questions:

* Does the info the superblock contains allow one to find the beginning of the partition?
* How do I find the superblock?
* What is where in the superblock?

The aim is to get my user files back, and then to use this disk for 2.2[.1]-RELEASE. I have been using FreeBSD 2.1.0-RELEASE on the machine, installed by the /stand/sysinstall utility, using the default partition layout computed before I upgraded memory from 16 to 48 MB.

Any info, including hints on manpages, webpages, FAQs, ..., is valuable to me.

Thanks in advance,
Reinhold Huber

From owner-freebsd-fs Wed Apr 23 08:57:07 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id IAA06489 for fs-outgoing; Wed, 23 Apr 1997 08:57:07 -0700 (PDT)
Received: from critter.dk.tfs.com ([140.145.230.252]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id IAA06483 for ; Wed, 23 Apr 1997 08:57:05 -0700 (PDT)
Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id RAA01286 for ; Wed, 23 Apr 1997 17:55:34 +0200 (CEST)
To: fs@freebsd.org
Subject: the namei cache...
From: Poul-Henning Kamp
Date: Wed, 23 Apr 1997 17:55:34 +0200
Message-ID: <1284.861810934@critter>
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

I've been sticking my nose in the namei cache again, and that got me wondering. Consider this:

We add a linked list to vnodes that represent directories and hook all the cache entries that are relative to that directory onto that list (sorted by name, or maybe LRU ?), rather than finding them through the hash list.

Benefit: We can release the cache entries at the time we recycle the vnode, instead of the lazy release when we stumble across an invalid cache entry, resulting in better cache utilization.

Benefit: We can do away with the v_id field, unless other parts of the kernel need it.

Benefit: We don't rely on the hash to level things out for us.

Benefit: We can lock with finer and more logical granularity (ie. per vnode) in the cache than we could otherwise do.

Neutral: We still keep the LRU list around to reclaim little-used cache entries.

Almost neutral: We save the storage for the hash-table but the vnode becomes 4 bytes bigger.
This is close to a break-even, since the size of the hash table is found from desiredvnodes...

Comments ? Have I overlooked something fundamental ?

--
Poul-Henning Kamp             | phk@FreeBSD.ORG           FreeBSD Core-team.
http://www.freebsd.org/~phk   | phk@login.dknet.dk        Private mailbox.
whois: [PHK]                  | phk@tfs.com               TRW Financial Systems, Inc.
Future will arrive by its own means, progress not so.

From owner-freebsd-fs Thu Apr 24 03:13:56 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id DAA28568 for fs-outgoing; Thu, 24 Apr 1997 03:13:56 -0700 (PDT)
Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id DAA28563 for ; Thu, 24 Apr 1997 03:13:53 -0700 (PDT)
Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.8.5/CET-v2.1) with SMTP id KAA13250; Thu, 24 Apr 1997 10:13:44 GMT
Date: Thu, 24 Apr 1997 19:13:44 +0900 (JST)
From: Michael Hancock
To: Poul-Henning Kamp
cc: fs@freebsd.org
Subject: Re: the namei cache...
In-Reply-To: <1284.861810934@critter>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Wed, 23 Apr 1997, Poul-Henning Kamp wrote:

>
> I've been sticking my nose in the namei cache again, and that got me
> wondering.
> Consider this:
>
> We add a linked list to vnodes that represent directories
> and hook all the cache entries that are relative to that
> directory onto that list (sorted by name, or maybe LRU ?),
> rather than finding them through the hash list.
> Benefit:
> We can release the cache entries at the time we recycle the
> vnode, instead of the lazy release when we stumble across
> an invalid cache entry, resulting in better cache utilization.

Why can't this be done now?

> Benefit:
> We can do away with the v_id field, unless other parts of
> the kernel need it.

That would be nice. Our vnodes are pretty big, 112 bytes?

> Benefit:
> We don't rely on the hash to level things out for us.

What's wrong with relying on the hash?

>
> Benefit:
> We can lock with finer and more logical granularity (ie. per vnode)
> in the cache than we could otherwise do.

It doesn't give you logical vnode granularity, but you could lock on buckets.

> Neutral:
> We still keep the LRU list around to reclaim little-used
> cache entries.
>
> Almost neutral:
> We save the storage for the hash-table but the vnode becomes 4
> bytes bigger. This is close to a break-even, since the size
> of the hash table is found from desiredvnodes...
>

I think hashing is still a better way of doing it.

Regards,

Mike Hancock

From owner-freebsd-fs Thu Apr 24 03:27:56 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id DAA29106 for fs-outgoing; Thu, 24 Apr 1997 03:27:56 -0700 (PDT)
Received: from critter.dk.tfs.com ([140.145.230.252]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id DAA29070; Thu, 24 Apr 1997 03:27:47 -0700 (PDT)
Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id MAA00738; Thu, 24 Apr 1997 12:26:21 +0200 (CEST)
To: Michael Hancock
cc: Poul-Henning Kamp , fs@freebsd.org
From: Poul-Henning Kamp
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 19:13:44 +0900."
Date: Thu, 24 Apr 1997 12:26:21 +0200
Message-ID: <736.861877581@critter>
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>> Benefit:
>> We can release the cache entries at the time we recycle the
>> vnode, instead of the lazy release when we stumble across
>> an invalid cache entry, resulting in better cache utilization.
>
>Why can't this be done now?

We would have to traverse either all the hash lists or the LRU list to locate the entries.

>> Benefit:
>> We can do away with the v_id field, unless other parts of
>> the kernel need it.
>
>That would be nice. Our vnodes are pretty big, 112 bytes?

+64 bytes in the namecache structure :-(

>> Benefit:
>> We don't rely on the hash to level things out for us.
>
>What's wrong with relying on the hash?

It's not needed.

>> Benefit:
>> We can lock with finer and more logical granularity (ie. per vnode)
>> in the cache than we could otherwise do.
>
>It doesn't give you logical vnode granularity, but you could lock on
>buckets.

Which will lock out some other random number of vnodes (ie. directories), whereas the new scheme will lock only the (already locked) directory we're looking in.

--
Poul-Henning Kamp             | phk@FreeBSD.ORG           FreeBSD Core-team.
http://www.freebsd.org/~phk   | phk@login.dknet.dk        Private mailbox.
whois: [PHK]                  | phk@tfs.com               TRW Financial Systems, Inc.
Power and ignorance is a disgusting cocktail.

From owner-freebsd-fs Thu Apr 24 03:39:22 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id DAA29626 for fs-outgoing; Thu, 24 Apr 1997 03:39:22 -0700 (PDT)
Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id DAA29621 for ; Thu, 24 Apr 1997 03:39:20 -0700 (PDT)
Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.8.5/CET-v2.1) with SMTP id KAA13310; Thu, 24 Apr 1997 10:39:12 GMT
Date: Thu, 24 Apr 1997 19:39:12 +0900 (JST)
From: Michael Hancock
To: Poul-Henning Kamp
cc: fs@freebsd.org
Subject: Re: the namei cache...
In-Reply-To: <736.861877581@critter>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Thu, 24 Apr 1997, Poul-Henning Kamp wrote:

> >What's wrong with relying on the hash?
>
> It's not needed.

How would the linked list version fare under degenerate cases like large directories? With hashing you can work on the hashing algorithm. Btw, what is the hash key now? vp+name[20]?

Mike

From owner-freebsd-fs Thu Apr 24 03:44:02 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id DAA29815 for fs-outgoing; Thu, 24 Apr 1997 03:44:02 -0700 (PDT)
Received: from critter.dk.tfs.com ([140.145.230.252]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id DAA29781; Thu, 24 Apr 1997 03:42:49 -0700 (PDT)
Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id MAA00803; Thu, 24 Apr 1997 12:41:18 +0200 (CEST)
To: Michael Hancock
cc: Poul-Henning Kamp , fs@freebsd.org
From: Poul-Henning Kamp
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 19:39:12 +0900."
Date: Thu, 24 Apr 1997 12:41:18 +0200
Message-ID: <801.861878478@critter>
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

In message , Michael Hancock writes:
>On Thu, 24 Apr 1997, Poul-Henning Kamp wrote:
>
>> >What's wrong with relying on the hash?
>>
>> It's not needed.
>
>How would the linked list version fare under degenerate cases like large
>directories?
yes, for those directories.

>With hashing you can work on the hashing algorithm. Btw, what is the hash
>key now? vp+name[20]?

directory vnode v_id + the regular namei hash.

--
Poul-Henning Kamp             | phk@FreeBSD.ORG           FreeBSD Core-team.
http://www.freebsd.org/~phk   | phk@login.dknet.dk        Private mailbox.
whois: [PHK]                  | phk@tfs.com               TRW Financial Systems, Inc.
Power and ignorance is a disgusting cocktail.

From owner-freebsd-fs Thu Apr 24 04:49:23 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id EAA02464 for fs-outgoing; Thu, 24 Apr 1997 04:49:23 -0700 (PDT)
Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id EAA02459 for ; Thu, 24 Apr 1997 04:49:21 -0700 (PDT)
Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.8.5/CET-v2.1) with SMTP id LAA13640; Thu, 24 Apr 1997 11:49:14 GMT
Date: Thu, 24 Apr 1997 20:49:14 +0900 (JST)
From: Michael Hancock
To: Poul-Henning Kamp
cc: fs@freebsd.org
Subject: Re: the namei cache...
In-Reply-To: <801.861878478@critter>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Thu, 24 Apr 1997, Poul-Henning Kamp wrote:

> >With hashing you can work on the hashing algorithm. Btw, what is the hash
> >key now? vp+name[20]?
>
> directory vnode v_id + the regular namei hash.

It's hard to tell how well keys are distributed. The cn_hash component seems like it can be improved, maybe by taking only 3 or 4 of the characters in the name and then doing some shifts and xor's on them.

        cnp->cn_hash = 0;
        for (cp = cnp->cn_nameptr; *cp != 0 && *cp != '/'; cp++)
                cnp->cn_hash += (unsigned char)*cp;
        cnp->cn_namelen = cp - cnp->cn_nameptr;
        if (cnp->cn_namelen > NAME_MAX) {
                error = ENAMETOOLONG;
                goto bad;
        }

Regarding the capability, have you done any analysis on its distribution?

Mike

From owner-freebsd-fs Thu Apr 24 05:07:22 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id FAA03014 for fs-outgoing; Thu, 24 Apr 1997 05:07:22 -0700 (PDT)
Received: from root.com (implode.root.com [198.145.90.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id FAA03009 for ; Thu, 24 Apr 1997 05:07:19 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1]) by root.com (8.8.5/8.6.5) with SMTP id FAA09111; Thu, 24 Apr 1997 05:08:55 -0700 (PDT)
Message-Id: <199704241208.FAA09111@root.com>
X-Authentication-Warning: implode.root.com: localhost [127.0.0.1] didn't use HELO protocol
To: Poul-Henning Kamp
cc: Michael Hancock , fs@freebsd.org
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 12:41:18 +0200." <801.861878478@critter>
From: David Greenman
Reply-To: dg@root.com
Date: Thu, 24 Apr 1997 05:08:55 -0700
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>>With hashing you can work on the hashing algorithm. Btw, what is the hash
>>key now? vp+name[20]?
>
>directory vnode v_id + the regular namei hash.

Calculating the hash is expensive since it involves doing a byte sum of all of the characters in the name, adding in the directory v_id, and dividing by a prime number. There's got to be a better way. I haven't carefully read what Poul-Henning is proposing, but the gist I got was that he intends to hang the file name cache entries off of the directory vnode? It seems to me that this works well for small directories and poorly for large directories unless you construct a binary tree and keep the entries sorted.
Then the problem becomes managing the tree (keeping it balanced, etc), which itself can be expensive...although lookups happen far more often than creates and deletes.

It's interesting that the single largest CPU consumer on wcarchive appears to be the namei cache lookups. I think the hash algorithm needs to be re-visited at the very least, perhaps changing the divide by a prime into some sort of xor of the filename characters (xored in pairs as int16's).

-DG

David Greenman
Core-team/Principal Architect, The FreeBSD Project

From owner-freebsd-fs Thu Apr 24 05:30:20 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id FAA03876 for fs-outgoing; Thu, 24 Apr 1997 05:30:20 -0700 (PDT)
Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id FAA03867 for ; Thu, 24 Apr 1997 05:30:18 -0700 (PDT)
Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.8.5/CET-v2.1) with SMTP id MAA13844; Thu, 24 Apr 1997 12:30:02 GMT
Date: Thu, 24 Apr 1997 21:30:02 +0900 (JST)
From: Michael Hancock
To: David Greenman
cc: Poul-Henning Kamp , fs@freebsd.org
Subject: Re: the namei cache...
In-Reply-To: <199704241208.FAA09111@root.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Thu, 24 Apr 1997, David Greenman wrote:

> It's interesting that the single largest CPU consumer on wcarchive appears
> to be the namei cache lookups. I think the hash algorithm needs to be
> re-visited at the very least, perhaps changing the divide by a prime into
> some sort of xor of the filename characters (xored in pairs as int16's).

I don't think you need to even look at all the characters. You can take the first and last and a couple in the middle. To get the middle ones, instead of dividing the length by 2 do a bit shift on the length. Then xor the 2 pairs. Any hash gurus on the list?

Mike Hancock

From owner-freebsd-fs Thu Apr 24 05:56:13 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id FAA04755 for fs-outgoing; Thu, 24 Apr 1997 05:56:13 -0700 (PDT)
Received: from gatekeeper.tsc.tdk.com (root@gatekeeper.tsc.tdk.com [207.113.159.21]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id FAA04750 for ; Thu, 24 Apr 1997 05:56:10 -0700 (PDT)
Received: from sunrise.gv.tsc.tdk.com (root@sunrise.gv.tsc.tdk.com [192.168.241.191]) by gatekeeper.tsc.tdk.com (8.8.4/8.8.4) with ESMTP id FAA20232; Thu, 24 Apr 1997 05:56:03 -0700 (PDT)
Received: from salsa.gv.tsc.tdk.com (salsa.gv.tsc.tdk.com [192.168.241.194]) by sunrise.gv.tsc.tdk.com (8.8.5/8.8.5) with ESMTP id FAA13671; Thu, 24 Apr 1997 05:56:03 -0700 (PDT)
Received: (from gdonl@localhost) by salsa.gv.tsc.tdk.com (8.8.5/8.8.5) id FAA11674; Thu, 24 Apr 1997 05:56:01 -0700 (PDT)
From: Don Lewis
Message-Id: <199704241256.FAA11674@salsa.gv.tsc.tdk.com>
Date: Thu, 24 Apr 1997 05:56:01 -0700
In-Reply-To: David Greenman "Re: the namei cache..." (Apr 24, 5:08am)
X-Mailer: Mail User's Shell (7.2.6 alpha(3) 7/19/95)
To: dg@root.com, Poul-Henning Kamp
Subject: Re: the namei cache...
Cc: Michael Hancock , fs@freebsd.org
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Apr 24, 5:08am, David Greenman wrote:
} Subject: Re: the namei cache...
} >>With hashing you can work on the hashing algorithm. Btw, what is the hash
} >>key now? vp+name[20]?
} >
} >directory vnode v_id + the regular namei hash.
}
} Calculating the hash is expensive since it involves doing a byte sum of
} all of the characters in the name, adding in the directory v_id, and dividing
} by a prime number.

It shouldn't be too bad because of the locality, so there aren't many cache misses.

} There's got to be a better way. I haven't carefully read
} what Poul-Henning is proposing, but the gist I got was that he intends to
} hang the file name cache entries off of the directory vnode? It seems to
} me that this works well for small directories and poorly for large
} directories unless you construct a binary tree and keep the entries sorted.

Doing lots of pointer chasing is slow ...

} Then the problem becomes managing the tree (keeping it balanced, etc), which
} itself can be expensive...although lookups happen far more often than creates
} and deletes.

At the filesystem level there are far more lookups, but as uncached files are looked up, other entries have to be kicked out of the cache, so you'll end up with a much higher rate of addition and deletion of entries in the cache than in the file system.

} It's interesting that the single largest CPU consumer on wcarchive appears
} to be the namei cache lookups.

It might be interesting to know how long the hash chains are and how well balanced they are. Is the table size appropriate for the number of vnodes that are in use?

} I think the hash algorithm needs to be
} re-visited at the very least, perhaps changing the divide by a prime into
} some sort of xor of the filename characters (xored in pairs as int16's).

Yes, most processors are really bad at integer divide. You might look at the Perl hash. You multiply the current value by 33 and add the next character, then mask off however many bits you want at the end (the table size must be a power of 2). If your compiler is at all smart, it will notice that the multiply is just a shift and an add. I suspect this might be pretty easy to just drop in to the existing code for a quick trial.

---
Truck

From owner-freebsd-fs Thu Apr 24 06:34:19 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id GAA06296 for fs-outgoing; Thu, 24 Apr 1997 06:34:19 -0700 (PDT)
Received: from critter.dk.tfs.com (phk.cybercity.dk [195.8.129.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id GAA06122; Thu, 24 Apr 1997 06:28:44 -0700 (PDT)
Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id PAA00322; Thu, 24 Apr 1997 15:27:45 +0200 (CEST)
To: Michael Hancock
cc: Poul-Henning Kamp , fs@freebsd.org
From: Poul-Henning Kamp
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 20:49:14 +0900."
Date: Thu, 24 Apr 1997 15:27:45 +0200
Message-ID: <320.861888465@critter>
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

In message , Michael Hancock writes:
>On Thu, 24 Apr 1997, Poul-Henning Kamp wrote:
>
>> >With hashing you can work on the hashing algorithm. Btw, what is the hash
>> >key now? vp+name[20]?
>>
>> directory vnode v_id + the regular namei hash.
>
>It's hard to tell how well keys are distributed. The cn_hash component
>seems like it can be improved, maybe by taking only 3 or 4 of the
>characters in the name and then doing some shifts and xor's on them.

not terribly well according to my study, but well enough for this purpose.

>Regarding the capability, have you done any analysis on its distribution?

It's a global counter, it's only added to avoid all the ".." or "CVS" entries ending up in the same hash-bucket.
You are on the wrong lead here. The time or the quality of the hash is not terribly important, considering that what we save is a disk access or two. What matters is using the limited (see below) number of cache entries all the time.

The problem is that with the current implementation, we have stale entries in the LRU chain, but they're not at the end where we happen to grab the next one (this happens when a vnode gets recycled). Therefore we will purge good entries out to add new data with some regularity. Linking to the vnodes will avoid this problem, since we can purge the entries right away when we recycle the vnode.

I've actually implemented it this morning; if you want to try it out, here is the patch. You need to "sysctl -w debug.vfscache" to start the caching; I've disabled it by default while I play.

Another thing: you can set the size of the name cache now with sysctl

        sysctl -w debug.vfscachelimit=3000

If the number is zero, the limit is "desiredvnodes", which as far as I can see isn't enough for sane operation. (think "." and ".." ...) Remember that they cost 64 bytes apiece, btw.

I intend to run a couple of benchmarks on this during the next week or so. Any data you can add will be most welcome.

Poul-Henning

PS: You need some of the additions to sys/queue.h which I'm working on, so that's included in the patch too.

Index: sys/namei.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/namei.h,v
retrieving revision 1.13
diff -u -r1.13 namei.h
--- namei.h     1997/02/22 09:45:38     1.13
+++ namei.h     1997/04/24 12:27:02
@@ -165,10 +165,9 @@
 #endif

 struct namecache {
-        LIST_ENTRY(namecache) nc_hash;  /* hash chain */
-        TAILQ_ENTRY(namecache) nc_lru;  /* LRU chain */
-        struct vnode *nc_dvp;           /* vnode of parent of name */
-        u_long nc_dvpid;                /* capability number of nc_dvp */
+        LIST_ENTRY(namecache) nc_hash_src;      /* hash chain */
+        LIST_ENTRY(namecache) nc_hash_dst;      /* hash chain */
+        TAILQ_ENTRY(namecache) nc_lru;  /* LRU chain */
         struct vnode *nc_vp;            /* vnode the name refers to */
         u_long nc_vpid;                 /* capability number of nc_vp */
 #ifdef NCH_STATISTICS
@@ -180,9 +179,6 @@
 };

 #ifdef KERNEL
-extern u_long nextvnodeid;
-extern u_long numcache;
-extern u_long numvnodes;

 int    namei __P((struct nameidata *ndp));
 int    lookup __P((struct nameidata *ndp));

Index: sys/queue.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/queue.h,v
retrieving revision 1.14
diff -u -r1.14 queue.h
--- queue.h     1997/04/14 18:22:02     1.14
+++ queue.h     1997/04/23 11:29:51
@@ -85,6 +85,25 @@
  * complex end of list detection.
  *
  * For details on the use of these macros, see the queue(3) manual page.
+ *
+ *
+ *                      SLIST   LIST    STAILQ  TAILQ   CIRCLEQ
+ * _HEAD                +       +       +       +       +
+ * _ENTRY               +       +       +       +       +
+ * _INIT                +       +       +       +       +
+ * _EMPTY               +       +       +       +       +
+ * _FIRST               +       +       -       +       +
+ * _NEXT                +       +       -       +       +
+ * _PREV                -       -       -       +       +
+ * _LAST                -       -       -       +       +
+ * _FOREACH             -       +       -       +       -
+ * _INSERT_HEAD         +       +       +       +       +
+ * _INSERT_BEFORE       -       +       -       +       +
+ * _INSERT_AFTER        +       +       +       +       +
+ * _INSERT_TAIL         -       -       +       +       +
+ * _REMOVE_HEAD         +       -       +       -       -
+ * _REMOVE              +       +       +       +       +
+ *
  */

 /*
@@ -157,6 +176,8 @@
 /*
  * Singly-linked Tail queue functions.
  */
+#define STAILQ_EMPTY(head) ((head)->stqh_first == NULL)
+
 #define STAILQ_INIT(head) {                                             \
         (head)->stqh_first = NULL;                                      \
         (head)->stqh_last = &(head)->stqh_first;                        \
@@ -217,6 +238,9 @@
 /*
  * List functions.
  */
+
+#define LIST_EMPTY(head) ((head)->lh_first == NULL)
+
 #define LIST_FIRST(head) ((head)->lh_first)

 #define LIST_FOREACH(var, head, field)                                  \
@@ -277,6 +301,9 @@
  */
 #define TAILQ_EMPTY(head) ((head)->tqh_first == NULL)

+#define TAILQ_FOREACH(var, head, field) \
+        for (var = TAILQ_FIRST(head); var; var = TAILQ_NEXT(var, field))
+
 #define TAILQ_FIRST(head) ((head)->tqh_first)

 #define TAILQ_LAST(head) ((head)->tqh_last)
@@ -351,6 +378,13 @@
 /*
  * Circular queue functions.
  */
+#define CIRCLEQ_EMPTY(head) ((head)->cqh_first == (head)->cqh_last)
+
+#define CIRCLEQ_FIRST(head) ((head)->cqh_first)
+
+#define CIRCLEQ_FOREACH(var, head, field) \
+        for((var) = (head)->cqh_first; (var); (var) = (var)->field.cqe_next)
+
 #define CIRCLEQ_INIT(head) {                                            \
         (head)->cqh_first = (void *)(head);                             \
         (head)->cqh_last = (void *)(head);                              \
@@ -395,6 +429,12 @@
         (head)->cqh_last->field.cqe_next = (elm);                       \
         (head)->cqh_last = (elm);                                       \
 }
+
+#define CIRCLEQ_LAST(head) ((head)->cqh_last)
+
+#define CIRCLEQ_NEXT(elm,field) ((elm)->field.cqe_next)
+
+#define CIRCLEQ_PREV(elm,field) ((elm)->field.cqe_prev)

 #define CIRCLEQ_REMOVE(head, elm, field) {                              \
         if ((elm)->field.cqe_next == (void *)(head))                    \

Index: sys/vnode.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/vnode.h,v
retrieving revision 1.43
diff -u -r1.43 vnode.h
--- vnode.h     1997/04/04 17:43:32     1.43
+++ vnode.h     1997/04/24 06:17:12
@@ -70,6 +70,7 @@
 typedef int vop_t __P((void *));

 struct vm_object;
+struct namecache;

 /*
  * Reading or writing any of these items requires holding the appropriate lock.
@@ -110,6 +111,9 @@
         struct  lock *v_vnlock;                 /* used for non-locking fs's */
         enum    vtagtype v_tag;                 /* type of underlying data */
         void    *v_data;                        /* private data for fs */
+        LIST_HEAD(, namecache) v_cache_src;     /* namecache entries */
+        LIST_HEAD(, namecache) v_cache_dst;     /* namecache entries */
+        int martian;
 };
 #define v_mountedhere   v_un.vu_mountedhere
 #define v_socket        v_un.vu_socket

Index: kern/vfs_cache.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_cache.c,v
retrieving revision 1.24
diff -u -r1.24 vfs_cache.c
--- vfs_cache.c 1997/03/08 15:22:14     1.24
+++ vfs_cache.c 1997/04/24 12:12:03
@@ -74,16 +74,15 @@
 /*
  * Structures associated with name cacheing.
  */
-#define NCHHASH(dvp, cnp) \
-        (&nchashtbl[((dvp)->v_id + (cnp)->cn_hash) % nchash])
-static LIST_HEAD(nchashhead, namecache) *nchashtbl;     /* Hash Table */
-static u_long nchash;                   /* size of hash table */
-static u_long numcache;                 /* number of cache entries allocated */
 static TAILQ_HEAD(, namecache) nclruhead;       /* LRU chain */
 struct nchstats nchstats;               /* cache effectiveness statistics */

-static int doingcache = 1;              /* 1 => enable the cache */
+static int doingcache = 0;              /* 1 => enable the cache */
 SYSCTL_INT(_debug, OID_AUTO, vfscache, CTLFLAG_RW, &doingcache, 0, "");
+static u_long numcache;                 /* number of cache entries allocated */
+SYSCTL_INT(_debug, OID_AUTO, vfscachesize, CTLFLAG_RD, &numcache, 0, "");
+static u_long cachelimit;               /* cache size limit */
+SYSCTL_INT(_debug, OID_AUTO, vfscachelimit, CTLFLAG_RW, &cachelimit, 0, "");

 #ifdef NCH_STATISTICS
 u_long nchnbr;
@@ -94,15 +93,41 @@
 #define NCHHIT(ncp)
 #endif

+static int
+sysctl_vfscachedump SYSCTL_HANDLER_ARGS
+{
+        struct namecache *ncp;
+        int error;
+
+        /* Estimate only ? */
+        if (!req->oldptr)
+                return (SYSCTL_OUT(req,0,numcache*sizeof(struct namecache)));
+
+
+        TAILQ_FOREACH(ncp, &nclruhead, nc_lru)
+                if (error = SYSCTL_OUT(req, ncp, sizeof *ncp))
+                        return (error);
+        return (0);
+}
+
+SYSCTL_PROC(_debug, OID_AUTO, vfscachedump, CTLTYPE_OPAQUE|CTLFLAG_RD,
+        0, 0, sysctl_vfscachedump, "S,namecache", "");
+
 /*
- * Delete an entry from its hash list and move it to the front
+ * Delete an entry from its vnode list and move it to the front
  * of the LRU list for immediate reuse.
  */
-#define PURGE(ncp) { \
-        LIST_REMOVE(ncp, nc_hash); \
-        ncp->nc_hash.le_prev = 0; \
-        TAILQ_REMOVE(&nclruhead, ncp, nc_lru); \
-        TAILQ_INSERT_HEAD(&nclruhead, ncp, nc_lru); \
+static void
+cache_zap(struct namecache *ncp)
+{
+        if (ncp->nc_hash_src.le_prev)
+                LIST_REMOVE(ncp, nc_hash_src);
+        ncp->nc_hash_src.le_prev = 0;
+        if (ncp->nc_hash_dst.le_prev)
+                LIST_REMOVE(ncp, nc_hash_dst);
+        ncp->nc_hash_dst.le_prev = 0;
+        TAILQ_REMOVE(&nclruhead, ncp, nc_lru);
+        TAILQ_INSERT_HEAD(&nclruhead, ncp, nc_lru);
 }

 /*
@@ -139,7 +164,6 @@
         struct componentname *cnp;
 {
         register struct namecache *ncp, *nnp;
-        register struct nchashhead *ncpp;

         if (!doingcache) {
                 cnp->cn_flags &= ~MAKEENTRY;
@@ -151,19 +175,10 @@
                 return (0);
         }

-        ncpp = NCHHASH(dvp, cnp);
-        for (ncp = ncpp->lh_first; ncp != 0; ncp = nnp) {
-                nnp = ncp->nc_hash.le_next;
-                /* If one of the vp's went stale, don't bother anymore. */
-                if ((ncp->nc_dvpid != ncp->nc_dvp->v_id) ||
-                    (ncp->nc_vp && ncp->nc_vpid != ncp->nc_vp->v_id)) {
-                        nchstats.ncs_falsehits++;
-                        PURGE(ncp);
-                        continue;
-                }
+        for (ncp = dvp->v_cache_src.lh_first; ncp != 0; ncp = nnp) {
+                nnp = ncp->nc_hash_src.le_next;
                 /* Now that we know the vp's to be valid, is it ours ? */
-                if (ncp->nc_dvp == dvp &&
-                    ncp->nc_nlen == cnp->cn_namelen &&
+                if (ncp->nc_nlen == cnp->cn_namelen &&
                     !bcmp(ncp->nc_name, cnp->cn_nameptr, (u_int)ncp->nc_nlen))
                         break;
         }
@@ -179,7 +194,7 @@
         /* We don't want to have an entry, so dump it */
         if ((cnp->cn_flags & MAKEENTRY) == 0) {
                 nchstats.ncs_badhits++;
-                PURGE(ncp);
+                cache_zap(ncp);
                 return (0);
         }
@@ -196,7 +211,7 @@
         /* We found a negative match, and want to create it, so purge */
         if (cnp->cn_nameiop == CREATE) {
                 nchstats.ncs_badhits++;
-                PURGE(ncp);
+                cache_zap(ncp);
                 return (0);
         }
@@ -220,7 +235,6 @@
         struct componentname *cnp;
 {
         register struct namecache *ncp;
-        register struct nchashhead *ncpp;

         if (!doingcache)
                 return;
@@ -235,21 +249,20 @@
          * allowed and the one at the front of the LRU list is in use.
          * Otherwise we use the one at the front of the LRU list.
          */
-        if (numcache < desiredvnodes &&
-            ((ncp = nclruhead.tqh_first) == NULL ||
-            ncp->nc_hash.le_prev != 0)) {
+        ncp = nclruhead.tqh_first;
+        if (ncp && !ncp->nc_hash_src.le_prev) {
+                /* reuse an old entry */
+                TAILQ_REMOVE(&nclruhead, ncp, nc_lru);
+        } else if (numcache < (cachelimit ? cachelimit : desiredvnodes)) {
                 /* Add one more entry */
                 ncp = (struct namecache *)
                         malloc((u_long)sizeof *ncp, M_CACHE, M_WAITOK);
                 bzero((char *)ncp, sizeof *ncp);
                 numcache++;
-        } else if (ncp = nclruhead.tqh_first) {
+        } else if (ncp) {
                 /* reuse an old entry */
+                cache_zap(ncp);
                 TAILQ_REMOVE(&nclruhead, ncp, nc_lru);
-                if (ncp->nc_hash.le_prev != 0) {
-                        LIST_REMOVE(ncp, nc_hash);
-                        ncp->nc_hash.le_prev = 0;
-                }
         } else {
                 /* give up */
                 return;
@@ -268,13 +281,12 @@
                 ++vp->v_usage;
         } else
                 ncp->nc_vpid = cnp->cn_flags & ISWHITEOUT;
-        ncp->nc_dvp = dvp;
-        ncp->nc_dvpid = dvp->v_id;
         ncp->nc_nlen = cnp->cn_namelen;
         bcopy(cnp->cn_nameptr, ncp->nc_name, (unsigned)ncp->nc_nlen);
         TAILQ_INSERT_TAIL(&nclruhead, ncp, nc_lru);
-        ncpp = NCHHASH(dvp, cnp);
-        LIST_INSERT_HEAD(ncpp, ncp, nc_hash);
+        LIST_INSERT_HEAD(&dvp->v_cache_src, ncp, nc_hash_src);
+        if (vp)
+                LIST_INSERT_HEAD(&vp->v_cache_dst, ncp, nc_hash_dst);
 }

 /*
@@ -285,7 +297,6 @@
 {

         TAILQ_INIT(&nclruhead);
-        nchashtbl = phashinit(desiredvnodes, M_CACHE, &nchash);
 }

 /*
@@ -301,15 +312,23 @@
         struct vnode *vp;
 {
         struct namecache *ncp;
-        struct nchashhead *ncpp;
         static u_long nextvnodeid;

+        if (vp->martian != 0x1007)
+                Debugger("Martian Vnode!");
+        while (!LIST_EMPTY(&vp->v_cache_src)) {
+                cache_zap(LIST_FIRST(&vp->v_cache_src));
+        }
+        while (!LIST_EMPTY(&vp->v_cache_dst)) {
+                cache_zap(LIST_FIRST(&vp->v_cache_dst));
+        }
         vp->v_id = ++nextvnodeid;
+        vp->martian = 0x1007;
         if (nextvnodeid != 0)
                 return;
-        for (ncpp = &nchashtbl[nchash - 1]; ncpp >= nchashtbl; ncpp--) {
-                while (ncp = ncpp->lh_first)
-                        PURGE(ncp);
+        TAILQ_FOREACH(ncp, &nclruhead, nc_lru) {
+                if (ncp->nc_hash_src.le_prev)
+                        cache_zap(ncp);
         }
         vp->v_id = ++nextvnodeid;
 }
@@ -324,18 +343,13 @@
 cache_purgevfs(mp)
         struct mount *mp;
 {
-        struct nchashhead *ncpp;
         struct namecache *ncp, *nnp;
+        struct vnode *vnp;

-        /* Scan hash tables for applicable entries */
-        for (ncpp = &nchashtbl[nchash - 1]; ncpp >= nchashtbl; ncpp--) {
-                for (ncp = ncpp->lh_first; ncp != 0; ncp = nnp) {
-                        nnp = ncp->nc_hash.le_next;
-                        if (ncp->nc_dvpid != ncp->nc_dvp->v_id ||
-                            (ncp->nc_vp && ncp->nc_vpid != ncp->nc_vp->v_id) ||
-                            ncp->nc_dvp->v_mount == mp) {
-                                PURGE(ncp);
-                        }
-                }
+loop:
+        LIST_FOREACH(vnp, &mp->mnt_vnodelist, v_mntvnodes) {
+                if (vnp->v_mount != mp)
+                        goto loop;
+                cache_purge(vnp);
         }
 }

Index: kern/vfs_subr.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.82
diff -u -r1.82 vfs_subr.c
--- vfs_subr.c  1997/04/04 17:46:16     1.82
+++ vfs_subr.c  1997/04/24 12:27:06
@@ -77,7 +77,7 @@
 #endif
 static void vclean __P((struct vnode *vp, int flags, struct proc *p));
 static void vgonel __P((struct vnode *vp, struct proc *p));
-unsigned long numvnodes;
+static unsigned long numvnodes;
 extern void vputrele __P((struct vnode *vp, int put));

 enum vtype iftovt_tab[16] = {
@@ -361,6 +361,9 @@
                 vp = (struct vnode *) malloc((u_long) sizeof *vp,
                     M_VNODE, M_WAITOK);
                 bzero((char *) vp, sizeof *vp);
+                LIST_INIT(&vp->v_cache_src);
+                LIST_INIT(&vp->v_cache_dst);
+                vp->martian = 0x1007;
                 numvnodes++;
         } else {
                 for (vp = vnode_free_list.tqh_first;

--
Poul-Henning Kamp             | phk@FreeBSD.ORG           FreeBSD Core-team.
http://www.freebsd.org/~phk   | phk@login.dknet.dk        Private mailbox.
whois: [PHK]                  | phk@tfs.com               TRW Financial Systems, Inc.
Power and ignorance is a disgusting cocktail.
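[For illustration, a minimal userland sketch of the lookup scheme the patch above implements: namecache entries hang off the directory vnode on a per-directory list and are found by a linear scan, with no hash at all. The types and field names below are simplified stand-ins for the patch's v_cache_src linkage, not the real kernel structures.]

#include <stddef.h>
#include <string.h>

#define NC_NAMEMAX 31                   /* stand-in for the kernel's limit */

struct vnode;                           /* forward declaration */

struct namecache {
        struct namecache *nc_next;      /* next entry under the same directory */
        struct vnode *nc_vp;            /* vnode the cached name resolves to */
        int nc_nlen;                    /* length of the cached component */
        char nc_name[NC_NAMEMAX];       /* the component, not NUL-terminated */
};

struct vnode {
        struct namecache *v_cache_src;  /* names cached under this directory */
};

/* Return the vnode cached for (dvp, name), or NULL on a miss. */
struct vnode *
cache_lookup_list(struct vnode *dvp, const char *name, int namelen)
{
        struct namecache *ncp;

        for (ncp = dvp->v_cache_src; ncp != NULL; ncp = ncp->nc_next) {
                /* Compare the length first; it is cheap and usually differs. */
                if (ncp->nc_nlen == namelen &&
                    memcmp(ncp->nc_name, name, (size_t)namelen) == 0)
                        return (ncp->nc_vp);
        }
        return (NULL);
}

[A hit costs a walk of only that directory's own entries; the tradeoff debated in the following messages is what happens when that per-directory list gets long.]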
From owner-freebsd-fs Thu Apr 24 06:42:53 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id GAA06614 for fs-outgoing; Thu, 24 Apr 1997 06:42:53 -0700 (PDT)
Received: from critter.dk.tfs.com (phk.cybercity.dk [195.8.129.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id GAA06605; Thu, 24 Apr 1997 06:42:11 -0700 (PDT)
Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id PAA00334; Thu, 24 Apr 1997 15:41:12 +0200 (CEST)
To: dg@root.com
cc: Poul-Henning Kamp , Michael Hancock , fs@freebsd.org
From: Poul-Henning Kamp
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 05:08:55 PDT." <199704241208.FAA09111@root.com>
Date: Thu, 24 Apr 1997 15:41:12 +0200
Message-ID: <332.861889272@critter>
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

In message <199704241208.FAA09111@root.com>, David Greenman writes:
>>>With hashing you can work on the hashing algorithm. Btw, what is the hash
>>>key now? vp+name[20]?
>>
>>directory vnode v_id + the regular namei hash.
>
> Calculating the hash is expensive since it involves doing a byte sum of
>all of the characters in the name, adding in the directory v_id, and dividing
>by a prime number. There's got to be a better way. I haven't carefully read
>what Poul-Henning is proposing, but the gist I got was that he intends to
>hang the file name cache entries off of the directory vnode? It seems to
>me that this works well for small directories and poorly for large
>directories unless you construct a binary tree and keep the entries sorted.

I don't think that the tradeoff to a binary tree will make sense, but it may be worth a try. It's true that my scheme may penalize large directories, but I'm pretty convinced that "large" in this context may be quite a lot larger than one would think. If "large" is anything above 100 I think that we can pretty safely ignore it.

It may also pay off to sort the lists by componentname, which will cut the number of comparisons in half. Making it a circleq and searching from whichever end is nearest can cut this further.

> It's interesting that the single largest CPU consumer on wcarchive appears
>to be the namei cache lookups. I think the hash algorithm needs to be
>re-visited at the very least, perhaps changing the divide by a prime into
>some sort of xor of the filename characters (xored in pairs as int16's).

It may actually be worth optimizing ".." out with special case code. If we added a "parent dir vnode" pointer to the vnodes, ".." can be taken out, and the namecache entries they occupy can be used for something more valuable.

Another thing, try setting the size of the namecache to 1.5 * desiredvnodes instead of the 1:1 factor we use now.

Ohh, and if you run the code I sent in the other email, please

        sysctl -bn debug.vfscachedump > somefile

and send me that file for analysis.

Poul-Henning

--
Poul-Henning Kamp             | phk@FreeBSD.ORG           FreeBSD Core-team.
http://www.freebsd.org/~phk   | phk@login.dknet.dk        Private mailbox.
whois: [PHK]                  | phk@tfs.com               TRW Financial Systems, Inc.
Power and ignorance is a disgusting cocktail.
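[A sketch of the sorted-list idea in the message above, reusing the stand-in struct namecache and struct vnode from the earlier sketch. If each per-directory list is kept ordered, a miss can be declared as soon as the scan passes the slot where the name would have to be; on average only half the list is examined, which is where the claimed factor-of-two saving on comparisons comes from. The comparator and function names are hypothetical.]

/* Order entries by (length, bytes); any total order would do. */
static int
nc_cmp(const struct namecache *ncp, const char *name, int namelen)
{
        if (ncp->nc_nlen != namelen)
                return (ncp->nc_nlen - namelen);
        return (memcmp(ncp->nc_name, name, (size_t)namelen));
}

struct vnode *
cache_lookup_sorted(struct vnode *dvp, const char *name, int namelen)
{
        struct namecache *ncp;
        int cmp;

        for (ncp = dvp->v_cache_src; ncp != NULL; ncp = ncp->nc_next) {
                cmp = nc_cmp(ncp, name, namelen);
                if (cmp == 0)
                        return (ncp->nc_vp);    /* hit */
                if (cmp > 0)
                        break;  /* passed the insertion point: a miss */
        }
        return (NULL);
}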
From owner-freebsd-fs Thu Apr 24 06:43:55 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id GAA06678 for fs-outgoing; Thu, 24 Apr 1997 06:43:55 -0700 (PDT)
Received: from critter.dk.tfs.com (phk.cybercity.dk [195.8.129.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id GAA06664; Thu, 24 Apr 1997 06:43:21 -0700 (PDT)
Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id PAA00359; Thu, 24 Apr 1997 15:42:24 +0200 (CEST)
To: Michael Hancock
cc: David Greenman , Poul-Henning Kamp , fs@freebsd.org
From: Poul-Henning Kamp
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 21:30:02 +0900."
Date: Thu, 24 Apr 1997 15:42:23 +0200
Message-ID: <357.861889343@critter>
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

In message , Michael Hancock writes:
>
>> It's interesting that the single largest CPU consumer on wcarchive appears
>> to be the namei cache lookups. I think the hash algorithm needs to be
>> re-visited at the very least, perhaps changing the divide by a prime into
>> some sort of xor of the filename characters (xored in pairs as int16's).
>
>I don't think you need to even look at all the characters. You can take
>the first and last and a couple in the middle. To get the middle ones,
>instead of dividing the length by 2 do a bit shift on the length. Then
>xor the 2 pairs. Any hash gurus on the list?

I'd rather get rid of it entirely...

--
Poul-Henning Kamp             | phk@FreeBSD.ORG           FreeBSD Core-team.
http://www.freebsd.org/~phk   | phk@login.dknet.dk        Private mailbox.
whois: [PHK]                  | phk@tfs.com               TRW Financial Systems, Inc.
Power and ignorance is a disgusting cocktail.

From owner-freebsd-fs Thu Apr 24 07:04:26 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id HAA07677 for fs-outgoing; Thu, 24 Apr 1997 07:04:26 -0700 (PDT)
Received: from Sisyphos.MI.Uni-Koeln.DE (Sisyphos.MI.Uni-Koeln.DE [134.95.212.10]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id HAA07667 for ; Thu, 24 Apr 1997 07:04:20 -0700 (PDT)
Received: from x14.mi.uni-koeln.de (annexr3-1.slip.Uni-Koeln.DE) by Sisyphos.MI.Uni-Koeln.DE with SMTP id AA01677 (5.67b/IDA-1.5 for ); Thu, 24 Apr 1997 16:03:47 +0200
Received: (from se@localhost) by x14.mi.uni-koeln.de (8.8.5/8.6.9) id QAA23373; Thu, 24 Apr 1997 16:03:39 +0200 (CEST)
Message-Id: <19970424160338.57282@x14.mi.uni-koeln.de>
Date: Thu, 24 Apr 1997 16:03:38 +0200
From: Stefan Esser
To: Don Lewis
Cc: dg@root.com, Poul-Henning Kamp , Michael Hancock , fs@freebsd.org
Subject: Re: the namei cache...
References: <199704241256.FAA11674@salsa.gv.tsc.tdk.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.68
In-Reply-To: <199704241256.FAA11674@salsa.gv.tsc.tdk.com>; from Don Lewis on Thu, Apr 24, 1997 at 05:56:01AM -0700
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Apr 24, Don Lewis wrote:
> Yes, most processors are really bad at integer divide. You might look at
> the Perl hash. You multiply the current value by 33 and add the next
> character, then mask off however many bits you want at the end (the table
> size must be a power of 2). If your compiler is at all smart, it will
> notice that the multiply is just a shift and an add. I suspect this might
> be pretty easy to just drop in to the existing code for a quick trial.

I used the following as a hash function, after looking up algorithms in Knuth's book and studying lots of examples in several software packages.
I needed a fast hash, which should work on power of 2 size tables, with keys mostly being ASCII text. I did quite some profiling and testing with my data sets, and the following gave good results:

#define SHVAL (8)
#define MULTIPLIER (0x5425daf1)

unsigned long
hashf (unsigned long hashmax, void *key, size_t keylen)
{
        unsigned long i = 0;
        unsigned char *p = key;

        while (keylen--) {
                unsigned char c = *p++;
                i -= (i << 4) ^ c;
                /* i -= (i << 4) + (i >> 23) ^ c; */
        }
        i *= MULTIPLIER;
        i >>= SHVAL;
        return i & hashmax;
}

This led to significantly fewer collisions than other, simpler hash functions. I tried other versions that fed back some of the high-order bits, but surprisingly found that this did not help reduce the number of collisions, and thus was not worth the additional shift and add. But this is possibly not true in the case of file names that tend to have a common suffix (.tar.gz, for example).

(The value of hashmax must obviously be a power of 2 minus 1 ...)

I used tables from a few thousand up to a few hundred thousand entries, BTW.

Regards, STefan

From owner-freebsd-fs Thu Apr 24 07:58:31 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id HAA10041 for fs-outgoing; Thu, 24 Apr 1997 07:58:31 -0700 (PDT)
Received: from nic.follonett.no (nic.follonett.no [194.198.43.10]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id HAA10035 for ; Thu, 24 Apr 1997 07:58:19 -0700 (PDT)
Received: (from uucp@localhost) by nic.follonett.no (8.8.5/8.8.3) with UUCP id QAA00870; Thu, 24 Apr 1997 16:55:19 +0200 (MET DST)
Received: from oo7 ([192.0.0.136]) by dimaga.com (8.7.5/8.7.2) with SMTP id QAA01966; Thu, 24 Apr 1997 16:07:41 +0200 (MET DST)
Message-Id: <3.0.32.19970424150956.01100be0@dimaga.com>
X-Sender: eivind@dimaga.com
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Thu, 24 Apr 1997 15:09:59 +0200
To: Michael Hancock
From: Eivind Eklund
Subject: Re: the namei cache...
Cc: fs@freebsd.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

At 09:30 PM 4/24/97 +0900, Michael Hancock wrote:
>On Thu, 24 Apr 1997, David Greenman wrote:
>
>> It's interesting that the single largest CPU consumer on wcarchive appears
>> to be the namei cache lookups. I think the hash algorithm needs to be
>> re-visited at the very least, perhaps changing the divide by a prime into
>> some sort of xor of the filename characters (xored in pairs as int16's).
>
>I don't think you need to even look at all the characters. You can take
>the first and last and a couple in the middle. To get the middle ones,
>instead of dividing the length by 2 do a bit shift on the length. Then
>xor the 2 pairs. Any hash gurus on the list?

Experimenting with this is easy if we pull the hash code out into a separate file - we can just run it against a find on a 'normal machine'. I don't know how much just adding the last and first characters will buy us, as finding the end of the string still includes parsing the entire string.

However, replacing the hash algorithm is certainly worthwhile - the present one is both slow and insensitive to interesting changes (e.g. swapping of characters).

A speedup could be to use an algorithm that works on words instead of bytes; however, this means that a read-overrun of up to three bytes _must be tolerable_. This can only work as long as we control the implementation.
Testing for zero-termination is fairly easy - the following expression will do it, assuming a two's-complement machine, a 32-bit word, and 8-bit chars.

        ((w-0x01010101)&~w&0x80808080)

Unfortunately, a test for both / and NUL is over twice as expensive, involving xors to change / to NUL and double tests :-(

I haven't got time to test right now (will see if I can do it later tonight), but that should give anybody interested in trying enough information to have a go. The original code is in sys/kern/vfs_lookup.c (for anybody that hasn't looked yet)

Eivind Eklund perhaps@yes.no http://maybe.yes.no/perhaps/ eivind@freebsd.org

From owner-freebsd-fs Thu Apr 24 10:39:10 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id KAA19851 for fs-outgoing; Thu, 24 Apr 1997 10:39:10 -0700 (PDT)
Received: from haldjas.folklore.ee (Haldjas.folklore.ee [193.40.6.121]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id KAA19644 for ; Thu, 24 Apr 1997 10:34:50 -0700 (PDT)
Received: from localhost (narvi@localhost) by haldjas.folklore.ee (8.8.4/8.8.4) with SMTP id UAA18267; Thu, 24 Apr 1997 20:24:13 +0300 (EEST)
Date: Thu, 24 Apr 1997 20:24:11 +0300 (EEST)
From: Narvi
To: David Greenman
cc: Poul-Henning Kamp , Michael Hancock , fs@freebsd.org
Subject: Re: the namei cache...
In-Reply-To: <199704241208.FAA09111@root.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Thu, 24 Apr 1997, David Greenman wrote:

> >>With hashing you can work on the hashing algorithm. Btw, what is the hash
> >>key now? vp+name[20]?
> >
> >directory vnode v_id + the regular namei hash.
>
> Calculating the hash is expensive since it involves doing a byte sum of
> all of the characters in the name, adding in the directory v_id, and dividing
> by a prime number. There's got to be a better way. I haven't carefully read

How about not adding byte values but say, long values? We may have to keep up to 3 bytes of additional space (avoiding overruns) but get it done in about 1/4th of the additions. Getting rid of the div also wouldn't be bad but might not be worth it.

Sander

> what Poul-Henning is proposing, but the gist I got was that he intends to
> hang the file name cache entries off of the directory vnode? It seems to
> me that this works well for small directories and poorly for large
> directories unless you construct a binary tree and keep the entries sorted.
> Then the problem becomes managing the tree (keeping it balanced, etc), which
> itself can be expensive...although lookups happen far more often than creates
> and deletes.
> It's interesting that the single largest CPU consumer on wcarchive appears
> to be the namei cache lookups. I think the hash algorithm needs to be
> re-visited at the very least, perhaps changing the divide by a prime into
> some sort of xor of the filename characters (xored in pairs as int16's).
>
> -DG
>
> David Greenman
> Core-team/Principal Architect, The FreeBSD Project
>

From owner-freebsd-fs Thu Apr 24 12:33:43 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id MAA25858 for fs-outgoing; Thu, 24 Apr 1997 12:33:43 -0700 (PDT)
Received: from root.com (implode.root.com [198.145.90.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id MAA25853 for ; Thu, 24 Apr 1997 12:33:41 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1]) by root.com (8.8.5/8.6.5) with SMTP id MAA10488; Thu, 24 Apr 1997 12:34:49 -0700 (PDT)
Message-Id: <199704241934.MAA10488@root.com>
X-Authentication-Warning: implode.root.com: localhost [127.0.0.1] didn't use HELO protocol
To: Narvi
cc: Poul-Henning Kamp , fs@freebsd.org
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 20:24:11 +0300."
From: David Greenman
Reply-To: dg@root.com
Date: Thu, 24 Apr 1997 12:34:49 -0700
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>about 1/4th of the additions. Getting rid of the div also wouldn't be bad
>but might not be worth it.

Not worth it? At greater than 50 cycles, the divide is more expensive than the entire add loop.

-DG

David Greenman
Core-team/Principal Architect, The FreeBSD Project

From owner-freebsd-fs Thu Apr 24 12:42:53 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id MAA26373 for fs-outgoing; Thu, 24 Apr 1997 12:42:53 -0700 (PDT)
Received: from critter.dk.tfs.com (phk.cybercity.dk [195.8.129.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id MAA26350; Thu, 24 Apr 1997 12:42:18 -0700 (PDT)
Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id VAA01305; Thu, 24 Apr 1997 21:40:49 +0200 (CEST)
To: dg@root.com
cc: Narvi , Poul-Henning Kamp , fs@freebsd.org
From: Poul-Henning Kamp
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 12:34:49 PDT." <199704241934.MAA10488@root.com>
Date: Thu, 24 Apr 1997 21:40:49 +0200
Message-ID: <1303.861910849@critter>
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

In message <199704241934.MAA10488@root.com>, David Greenman writes:
>>about 1/4th of the additions. Getting rid of the div also wouldn't be bad
>>but might not be worth it.
>
> Not worth it? At greater than 50 cycles, the divide is more expensive
>than the entire add loop.

But losing the hash entirely is >even< faster, right ? :-)

--
Poul-Henning Kamp             | phk@FreeBSD.ORG           FreeBSD Core-team.
http://www.freebsd.org/~phk   | phk@login.dknet.dk        Private mailbox.
whois: [PHK]                  | phk@tfs.com               TRW Financial Systems, Inc.
Power and ignorance is a disgusting cocktail.

From owner-freebsd-fs Thu Apr 24 13:17:00 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id NAA28354 for fs-outgoing; Thu, 24 Apr 1997 13:17:00 -0700 (PDT)
Received: from root.com (implode.root.com [198.145.90.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id NAA28349 for ; Thu, 24 Apr 1997 13:16:57 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1]) by root.com (8.8.5/8.6.5) with SMTP id NAA10732; Thu, 24 Apr 1997 13:18:33 -0700 (PDT)
Message-Id: <199704242018.NAA10732@root.com>
X-Authentication-Warning: implode.root.com: localhost [127.0.0.1] didn't use HELO protocol
To: Poul-Henning Kamp
cc: fs@freebsd.org
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 21:40:49 +0200."
        <1303.861910849@critter>
From: David Greenman
Reply-To: dg@root.com
Date: Thu, 24 Apr 1997 13:18:33 -0700
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>In message <199704241934.MAA10488@root.com>, David Greenman writes:
>>>about 1/4th of the additions. Getting rid of the div also wouldn't be bad
>>>but might not be worth it.
>>
>> Not worth it? At greater than 50 cycles, the divide is more expensive
>>than the entire add loop.
>
>But losing the hash entirely is >even< faster, right ? :-)

Maybe. Here's the problem with your approach as I see it: On wcarchive, we have about 44,000 files in the namei cache. The hash table size is approximately 44,000 entries large (rounded up or down to the nearest prime...I forget). This means that there is an average of about 1 file in each hash bucket...or very little to search through. Of course there are far fewer than 1 directory per file, so now if you change this so that all of the files for a directory are on one list, then you're going to have to do a lot more work to find it. I think the average directory length on wcarchive is about 50 files or so, so this would be about 50 times more entries in the list compared to the old hash table scheme.

-DG

David Greenman
Core-team/Principal Architect, The FreeBSD Project

From owner-freebsd-fs Thu Apr 24 13:40:10 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id NAA29788 for fs-outgoing; Thu, 24 Apr 1997 13:40:10 -0700 (PDT)
Received: from critter.dk.tfs.com (phk.cybercity.dk [195.8.129.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id NAA29717; Thu, 24 Apr 1997 13:39:53 -0700 (PDT)
Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id WAA01422; Thu, 24 Apr 1997 22:38:55 +0200 (CEST)
To: dg@root.com
cc: Poul-Henning Kamp , fs@freebsd.org
From: Poul-Henning Kamp
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 13:18:33 PDT." <199704242018.NAA10732@root.com>
Date: Thu, 24 Apr 1997 22:38:54 +0200
Message-ID: <1420.861914334@critter>
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

In message <199704242018.NAA10732@root.com>, David Greenman writes:
>>In message <199704241934.MAA10488@root.com>, David Greenman writes:
>>>>about 1/4th of the additions. Getting rid of the div also wouldn't be bad
>>>>but might not be worth it.
>>>
>>> Not worth it? At greater than 50 cycles, the divide is more expensive
>>>than the entire add loop.
>>
>>But losing the hash entirely is >even< faster, right ? :-)
>
> Maybe. Here's the problem with your approach as I see it: On wcarchive,
>we have about 44,000 files in the namei cache. The hash table size is
>approximately 44,000 entries large (rounded up or down to the nearest

Unlikely. You have a 44000 entry cache, but as far as I've been able to make out you only have about 40000 >valid< entries in it (if it's anything like the data I have here). I'm trying to get the last 10% back.

>prime...I forget). This means that there is an average of about 1 file in
>each hash bucket...or very little to search through.

Unfortunately not so, finding a hash that good would be really, really, hard. But I agree that we're in the O(3) land.

>each hash bucket...or very little to search through. Of course there are
>far fewer than 1 directory per file, so now if you change this so that all
>of the files for a directory are on one list, then you're going to have to do
>a lot more work to find it.
I think the average directory length on wcarchive >is about 50 files or so, so this would be about 50 times more entries in the >list compared to the old hash table scheme. This is what I thought, but appearantly not so. Part of the problem seems to be that there are multiple names pointing at the same directory ("foo" + N * "..") and that depletes your name-cache. With the current design it should probably be 1.5 .. 2 times bigger than desiredvnodes. I'm very reluctant to increase it, when entries cost 64 bytes each, and since data seems to indicate that 10% is stale (but we don't know how to find them), so we keep recycling valid entries instead. Another thing that bothers me is the size. The reason for the current size of 64 is the way malloc works. In reality we would get very close to the same efficiency from 48 bytes per entry. I may look into changing the way we allocate them. It would buy us 33% more entries in the same space. David: Can I get you to try one (mostly harmless) experiment on wcarchive ? In vfs_cache.c, where it checks the number of namecache entries against "desiredvnodes" could you try to use 2*desiredvnodes (or something similar) instead, and measure the difference in cache hit rate ? Poul-Henning PS: I guess that I'm doing "the bad thing" here, I'm changing two variables at the same time: 1. changing linkage of namecache entries to permit complete removal (This increases the efficiency of the cache) 2. dropping hash algorithm. (This may make the cache faster/slower depending on POM) But don't worry, until I have understood the impact, I'm not committing anything :-) -- Poul-Henning Kamp | phk@FreeBSD.ORG FreeBSD Core-team. http://www.freebsd.org/~phk | phk@login.dknet.dk Private mailbox. whois: [PHK] | phk@tfs.com TRW Financial Systems, Inc. Power and ignorance is a disgusting cocktail. From owner-freebsd-fs Thu Apr 24 13:57:55 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id NAA00821 for fs-outgoing; Thu, 24 Apr 1997 13:57:55 -0700 (PDT) Received: from root.com (implode.root.com [198.145.90.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id NAA00814 for ; Thu, 24 Apr 1997 13:57:53 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by root.com (8.8.5/8.6.5) with SMTP id NAA11021; Thu, 24 Apr 1997 13:59:30 -0700 (PDT) Message-Id: <199704242059.NAA11021@root.com> X-Authentication-Warning: implode.root.com: localhost [127.0.0.1] didn't use HELO protocol To: Poul-Henning Kamp cc: fs@freebsd.org Subject: Re: the namei cache... In-reply-to: Your message of "Thu, 24 Apr 1997 22:38:54 +0200." <1420.861914334@critter> From: David Greenman Reply-To: dg@root.com Date: Thu, 24 Apr 1997 13:59:29 -0700 Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk >This is what I thought, but appearantly not so. Part of the problem seems >to be that there are multiple names pointing at the same directory ("foo" > + N * "..") and that depletes your name-cache. With the current design >it should probably be 1.5 .. 2 times bigger than desiredvnodes. > >I'm very reluctant to increase it, when entries cost 64 bytes each, and >since data seems to indicate that 10% is stale (but we don't know how to >find them), so we keep recycling valid entries instead. > >Another thing that bothers me is the size. The reason for the current >size of 64 is the way malloc works. In reality we would get very close >to the same efficiency from 48 bytes per entry. I may look into changing >the way we allocate them. 
It would buy us 33% more entries in the same >space. > >David: > Can I get you to try one (mostly harmless) experiment on > wcarchive ? In vfs_cache.c, where it checks the number of > namecache entries against "desiredvnodes" could you try to use > 2*desiredvnodes (or something similar) instead, and measure > the difference in cache hit rate ? I've increased kern.maxvnodes, which I think should have the same effect. This actually made things worse - a few percent lower cache hit rate. Very odd. It might have just been a statistical anomaly. In any case, it definitely didn't improve the hit rate. -DG David Greenman Core-team/Principal Architect, The FreeBSD Project From owner-freebsd-fs Thu Apr 24 16:35:29 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id QAA09496 for fs-outgoing; Thu, 24 Apr 1997 16:35:29 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id QAA09489 for ; Thu, 24 Apr 1997 16:35:26 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.8.5/CET-v2.1) with SMTP id XAA15582; Thu, 24 Apr 1997 23:34:56 GMT Date: Fri, 25 Apr 1997 08:34:56 +0900 (JST) From: Michael Hancock To: Poul-Henning Kamp cc: dg@root.com, Narvi , fs@freebsd.org Subject: Re: the namei cache... In-Reply-To: <1303.861910849@critter> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Thu, 24 Apr 1997, Poul-Henning Kamp wrote: > > Not worth it? At greater than 50 cycles, the divide is more expensive > >than the entire add loop. > > But loosing the hash entirely is >even< faster, right ? :-) Faster than traversing a long linked list? From owner-freebsd-fs Thu Apr 24 16:57:56 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id QAA10460 for fs-outgoing; Thu, 24 Apr 1997 16:57:56 -0700 (PDT) Received: from gatekeeper.tsc.tdk.com (root@gatekeeper.tsc.tdk.com [207.113.159.21]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id QAA10454 for ; Thu, 24 Apr 1997 16:57:53 -0700 (PDT) Received: from sunrise.gv.tsc.tdk.com (root@sunrise.gv.tsc.tdk.com [192.168.241.191]) by gatekeeper.tsc.tdk.com (8.8.4/8.8.4) with ESMTP id QAA29384; Thu, 24 Apr 1997 16:56:21 -0700 (PDT) Received: from salsa.gv.tsc.tdk.com (salsa.gv.tsc.tdk.com [192.168.241.194]) by sunrise.gv.tsc.tdk.com (8.8.5/8.8.5) with ESMTP id QAA24329; Thu, 24 Apr 1997 16:56:20 -0700 (PDT) Received: (from gdonl@localhost) by salsa.gv.tsc.tdk.com (8.8.5/8.8.5) id QAA12777; Thu, 24 Apr 1997 16:56:19 -0700 (PDT) From: Don Lewis Message-Id: <199704242356.QAA12777@salsa.gv.tsc.tdk.com> Date: Thu, 24 Apr 1997 16:56:18 -0700 In-Reply-To: Narvi "Re: the namei cache..." (Apr 24, 8:24pm) X-Mailer: Mail User's Shell (7.2.6 alpha(3) 7/19/95) To: Narvi , David Greenman Subject: Re: the namei cache... Cc: Poul-Henning Kamp , Michael Hancock , fs@freebsd.org Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Apr 24, 8:24pm, Narvi wrote: } Subject: Re: the namei cache... } } How about not adding byte values but say, long values? We may have to keep } upto 3 bytes of additional space (avoiding overuns) but get it done in } about 1/4th of the additions. The additions are pretty cheap. The problem with adding bigger chunks is that the names aren't word aligned, so for each set of 4 bytes, you have to read two words, extract the desired bytes, and concatenate the results.
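A minimal sketch of the word-at-a-time add being kicked around (hypothetical code, not from any tree; the name sum32 is invented):

	#include <string.h>
	#include <sys/types.h>

	/*
	 * Sum a component name four bytes at a time.  memcpy() is the
	 * portable way to do the unaligned fetch Don describes; on a
	 * strict-alignment machine the compiler expands it into the
	 * two-loads-plus-shifts sequence he mentions.  Note that word
	 * sums are not equal to the byte-wise sum, so this is a
	 * different hash function -- which is the point of the
	 * proposal, not a bug.
	 */
	u_long
	sum32(const char *name, size_t len)
	{
		u_long sum = 0;
		u_int32_t w;

		while (len >= 4) {
			memcpy(&w, name, 4);	/* unaligned-safe fetch */
			sum += w;
			name += 4;
			len -= 4;
		}
		while (len-- > 0)		/* 0-3 trailing bytes */
			sum += (unsigned char)*name++;
		return (sum);
	}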
--- Truck From owner-freebsd-fs Thu Apr 24 17:01:42 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id RAA10796 for fs-outgoing; Thu, 24 Apr 1997 17:01:42 -0700 (PDT) Received: from root.com (implode.root.com [198.145.90.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id RAA10791 for ; Thu, 24 Apr 1997 17:01:40 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by root.com (8.8.5/8.6.5) with SMTP id RAA11911; Thu, 24 Apr 1997 17:03:00 -0700 (PDT) Message-Id: <199704250003.RAA11911@root.com> X-Authentication-Warning: implode.root.com: localhost [127.0.0.1] didn't use HELO protocol To: Don Lewis cc: fs@freebsd.org Subject: Re: the namei cache... In-reply-to: Your message of "Thu, 24 Apr 1997 16:56:18 PDT." <199704242356.QAA12777@salsa.gv.tsc.tdk.com> From: David Greenman Reply-To: dg@root.com Date: Thu, 24 Apr 1997 17:03:00 -0700 Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk >On Apr 24, 8:24pm, Narvi wrote: >} Subject: Re: the namei cache... >} >} How about not adding byte values but say, long values? We may have to keep >} upto 3 bytes of additional space (avoiding overuns) but get it done in >} about 1/4th of the additions. > >The additions are pretty cheap. The problem with adding bigger chunks is >that the names aren't word aligned, so for each set of 4 bytes, you have >to read two words, extract the desired bytes, and concatenate the results. Actually, you could read/add in 16-bit words since the component name is null terminated...or if the trailing space is zeroed up to a longword boundary, you could add in longwords. -DG David Greenman Core-team/Principal Architect, The FreeBSD Project From owner-freebsd-fs Thu Apr 24 17:12:24 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id RAA11277 for fs-outgoing; Thu, 24 Apr 1997 17:12:24 -0700 (PDT) Received: from gatekeeper.tsc.tdk.com (root@gatekeeper.tsc.tdk.com [207.113.159.21]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id RAA11271 for ; Thu, 24 Apr 1997 17:12:21 -0700 (PDT) Received: from sunrise.gv.tsc.tdk.com (root@sunrise.gv.tsc.tdk.com [192.168.241.191]) by gatekeeper.tsc.tdk.com (8.8.4/8.8.4) with ESMTP id RAA00156; Thu, 24 Apr 1997 17:12:00 -0700 (PDT) Received: from salsa.gv.tsc.tdk.com (salsa.gv.tsc.tdk.com [192.168.241.194]) by sunrise.gv.tsc.tdk.com (8.8.5/8.8.5) with ESMTP id RAA24488; Thu, 24 Apr 1997 17:11:58 -0700 (PDT) Received: (from gdonl@localhost) by salsa.gv.tsc.tdk.com (8.8.5/8.8.5) id RAA12811; Thu, 24 Apr 1997 17:11:57 -0700 (PDT) From: Don Lewis Message-Id: <199704250011.RAA12811@salsa.gv.tsc.tdk.com> Date: Thu, 24 Apr 1997 17:11:57 -0700 In-Reply-To: Eivind Eklund "Re: the namei cache..." (Apr 24, 3:09pm) X-Mailer: Mail User's Shell (7.2.6 alpha(3) 7/19/95) To: Eivind Eklund , Michael Hancock Subject: Re: the namei cache... Cc: fs@freebsd.org Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Apr 24, 3:09pm, Eivind Eklund wrote: } Subject: Re: the namei cache... } } However, replacing the hash algorithm is certainly worthwhile - the present } one is both slow and insensitive to interesting changes (e.g. swapping of } characters). I don't necessarily agree with slow (other than the divide at the end). It *might* be worthwhile escaping from the loop early on names that are too long to cache, but since these are rare, the extra tests might be too much of a penalty in the common case. I think the current hash function does a poor job of distributing the keys, especially if you have a large cache.
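A quick back-of-the-envelope check of how narrow the additive sums get (a toy user-land program, nothing more; the kernel additionally mixes in v_id and takes the table modulus afterwards):

	#include <stdio.h>

	/*
	 * The additive hash of a name is bounded by the sum of its
	 * character codes, so all-digit names crowd into a tiny range:
	 * seven digits sum to between 7*'0' and 7*'9', i.e. at most 64
	 * distinct values, all below bucket 400.
	 */
	int
	main(void)
	{
		printf("7-digit names: sums %d..%d, %d distinct values\n",
		    7 * '0', 7 * '9', 7 * '9' - 7 * '0' + 1);
		printf("14-char names (codes < 128): sums at most %d\n",
		    14 * 127);
		return (0);
	}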
A directory containing files with names up to 14 characters of all the characters < 128 can only distribute its entries over at most 1778 hash buckets. The misc.jobs.offered directory on our news server, which contains 28325 files with 7 digit names, will only distribute its entries over 399 buckets. It looks like we're very dependent on the v_id value to give a good global distribution. --- Truck From owner-freebsd-fs Thu Apr 24 17:25:35 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id RAA11712 for fs-outgoing; Thu, 24 Apr 1997 17:25:35 -0700 (PDT) Received: from gatekeeper.tsc.tdk.com (root@gatekeeper.tsc.tdk.com [207.113.159.21]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id RAA11706 for ; Thu, 24 Apr 1997 17:25:33 -0700 (PDT) Received: from sunrise.gv.tsc.tdk.com (root@sunrise.gv.tsc.tdk.com [192.168.241.191]) by gatekeeper.tsc.tdk.com (8.8.4/8.8.4) with ESMTP id RAA00575; Thu, 24 Apr 1997 17:25:30 -0700 (PDT) Received: from salsa.gv.tsc.tdk.com (salsa.gv.tsc.tdk.com [192.168.241.194]) by sunrise.gv.tsc.tdk.com (8.8.5/8.8.5) with ESMTP id RAA24790; Thu, 24 Apr 1997 17:25:29 -0700 (PDT) Received: (from gdonl@localhost) by salsa.gv.tsc.tdk.com (8.8.5/8.8.5) id RAA12844; Thu, 24 Apr 1997 17:25:28 -0700 (PDT) From: Don Lewis Message-Id: <199704250025.RAA12844@salsa.gv.tsc.tdk.com> Date: Thu, 24 Apr 1997 17:25:27 -0700 In-Reply-To: David Greenman "Re: the namei cache..." (Apr 24, 5:03pm) X-Mailer: Mail User's Shell (7.2.6 alpha(3) 7/19/95) To: dg@root.com, Don Lewis Subject: Re: the namei cache... Cc: fs@freebsd.org Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Apr 24, 5:03pm, David Greenman wrote: } Subject: Re: the namei cache... } Actually, you could read/add in 16-bit words since the component name is } null terminated...or if the trailing space is zeroed up to a longword } boundary, you could add in longwords. I was more worried about where the component name starts. It appears that we're extracting it right out of the pathname, so depending on what preceded this component name, it could be on an odd byte boundary. Also, the component name can be terminated with either a NUL or a /. --- Truck From owner-freebsd-fs Thu Apr 24 17:42:51 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id RAA12857 for fs-outgoing; Thu, 24 Apr 1997 17:42:51 -0700 (PDT) Received: from root.com (implode.root.com [198.145.90.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id RAA12852 for ; Thu, 24 Apr 1997 17:42:48 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by root.com (8.8.5/8.6.5) with SMTP id RAA12038; Thu, 24 Apr 1997 17:44:23 -0700 (PDT) Message-Id: <199704250044.RAA12038@root.com> X-Authentication-Warning: implode.root.com: localhost [127.0.0.1] didn't use HELO protocol To: Don Lewis cc: fs@freebsd.org Subject: Re: the namei cache... In-reply-to: Your message of "Thu, 24 Apr 1997 17:25:27 PDT." <199704250025.RAA12844@salsa.gv.tsc.tdk.com> From: David Greenman Reply-To: dg@root.com Date: Thu, 24 Apr 1997 17:44:23 -0700 Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk >Also, the component name can be terminated with either a NUL or a /. Really? That's a surprise if true.
-DG David Greenman Core-team/Principal Architect, The FreeBSD Project From owner-freebsd-fs Thu Apr 24 17:48:22 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id RAA13239 for fs-outgoing; Thu, 24 Apr 1997 17:48:22 -0700 (PDT) Received: from gatekeeper.tsc.tdk.com (root@gatekeeper.tsc.tdk.com [207.113.159.21]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id RAA13226 for ; Thu, 24 Apr 1997 17:48:18 -0700 (PDT) Received: from sunrise.gv.tsc.tdk.com (root@sunrise.gv.tsc.tdk.com [192.168.241.191]) by gatekeeper.tsc.tdk.com (8.8.4/8.8.4) with ESMTP id RAA01421; Thu, 24 Apr 1997 17:48:15 -0700 (PDT) Received: from salsa.gv.tsc.tdk.com (salsa.gv.tsc.tdk.com [192.168.241.194]) by sunrise.gv.tsc.tdk.com (8.8.5/8.8.5) with ESMTP id RAA25196; Thu, 24 Apr 1997 17:48:14 -0700 (PDT) Received: (from gdonl@localhost) by salsa.gv.tsc.tdk.com (8.8.5/8.8.5) id RAA12907; Thu, 24 Apr 1997 17:48:12 -0700 (PDT) From: Don Lewis Message-Id: <199704250048.RAA12907@salsa.gv.tsc.tdk.com> Date: Thu, 24 Apr 1997 17:48:12 -0700 In-Reply-To: David Greenman "Re: the namei cache..." (Apr 24, 5:44pm) X-Mailer: Mail User's Shell (7.2.6 alpha(3) 7/19/95) To: dg@root.com, Don Lewis Subject: Re: the namei cache... Cc: fs@freebsd.org Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Apr 24, 5:44pm, David Greenman wrote: } Subject: Re: the namei cache... } >Also, the component name can be terminated with either a NUL or a /. } } Really? That's a surprise if true. This is right from the source: cnp->cn_hash = 0; for (cp = cnp->cn_nameptr; *cp != 0 && *cp != '/'; cp++) cnp->cn_hash += (unsigned char)*cp; cnp->cn_namelen = cp - cnp->cn_nameptr; It sure looks to me like it stops on '/' ;-) Also, even if you totally ditch the hash, you still need the loop to find the end of the component. --- Truck From owner-freebsd-fs Thu Apr 24 17:51:13 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id RAA13567 for fs-outgoing; Thu, 24 Apr 1997 17:51:13 -0700 (PDT) Received: from root.com (implode.root.com [198.145.90.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id RAA13555 for ; Thu, 24 Apr 1997 17:51:10 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by root.com (8.8.5/8.6.5) with SMTP id RAA12150; Thu, 24 Apr 1997 17:52:41 -0700 (PDT) Message-Id: <199704250052.RAA12150@root.com> X-Authentication-Warning: implode.root.com: localhost [127.0.0.1] didn't use HELO protocol To: Don Lewis cc: fs@freebsd.org Subject: Re: the namei cache... In-reply-to: Your message of "Thu, 24 Apr 1997 17:48:12 PDT." <199704250048.RAA12907@salsa.gv.tsc.tdk.com> From: David Greenman Reply-To: dg@root.com Date: Thu, 24 Apr 1997 17:52:41 -0700 Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk >On Apr 24, 5:44pm, David Greenman wrote: >} Subject: Re: the namei cache... >} >Also, the component name can be terminated with either a NUL or a /. >} >} Really? That's a surprise if true. > >This is right from the source: > > cnp->cn_hash = 0; > for (cp = cnp->cn_nameptr; *cp != 0 && *cp != '/'; cp++) > cnp->cn_hash += (unsigned char)*cp; > cnp->cn_namelen = cp - cnp->cn_nameptr; > >It sure looks to me like it stops on '/' ;-) > >Also, even if you totally ditch the hash, you still need the loop >to find the end of the component. Good point, although I guess you could always waste a "null" longword at the end and trigger on that. Hmmm. 
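A hypothetical sketch of that trigger (assuming 32-bit longwords and a pathname buffer padded with a zero longword past the end -- the "waste" being proposed; ONES, HIGHS and complen are invented names):

	#include <stddef.h>
	#include <sys/types.h>

	#define ONES	0x01010101UL
	#define HIGHS	0x80808080UL
	/* nonzero iff some byte of w is zero -- the usual bit trick */
	#define HASZERO(w)	(((w) - ONES) & ~(w) & HIGHS)
	#define SLASHES	0x2f2f2f2fUL		/* '/' in every byte */

	/* Length of the component at p, up to a NUL or '/'. */
	size_t
	complen(const char *p)
	{
		const u_int32_t *wp;
		const char *cp = p;

		/* byte-wise until we reach a longword boundary */
		for (; (unsigned long)cp & 3; cp++)
			if (*cp == '\0' || *cp == '/')
				return (cp - p);
		/* then a longword per test; safe only because the
		   buffer is padded with a zero longword at the end */
		for (wp = (const u_int32_t *)cp;
		    !HASZERO(*wp) && !HASZERO(*wp ^ SLASHES); wp++)
			continue;
		/* pin down the exact byte inside the stopping word */
		for (cp = (const char *)wp; *cp != '\0' && *cp != '/'; cp++)
			continue;
		return (cp - p);
	}

The second HASZERO() test is the extra mask that '/'-termination costs over a plain NUL scan, and the first loop is what eats the odd starting alignment Don worried about.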
-DG David Greenman Core-team/Principal Architect, The FreeBSD Project From owner-freebsd-fs Thu Apr 24 20:09:06 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id UAA19924 for fs-outgoing; Thu, 24 Apr 1997 20:09:06 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id UAA19916 for ; Thu, 24 Apr 1997 20:09:04 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.8.5/CET-v2.1) with SMTP id DAA16880; Fri, 25 Apr 1997 03:08:45 GMT Date: Fri, 25 Apr 1997 12:08:45 +0900 (JST) From: Michael Hancock To: David Greenman cc: Don Lewis , fs@freebsd.org Subject: Re: the namei cache... In-Reply-To: <199704250052.RAA12150@root.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Thu, 24 Apr 1997, David Greenman wrote: > >This is right from the source: > > > > cnp->cn_hash = 0; > > for (cp = cnp->cn_nameptr; *cp != 0 && *cp != '/'; cp++) > > cnp->cn_hash += (unsigned char)*cp; > > cnp->cn_namelen = cp - cnp->cn_nameptr; > > > >It sure looks to me like it stops on '/' ;-) > > > >Also, even if you totally ditch the hash, you still need the loop > >to find the end of the component. > > Good point, although I guess you could always waste a "null" longword at > the end and trigger on that. Hmmm. An improved hash looks like low hanging fruit that's worth trying. Regarding the name list of the vnode approach, it might be interesting to try phk's patches and see how it works out with something like WorldStone. The approach to the name cache / vnode cache coherency problem he's trying to solve makes sense logically. However, addressing the performance needs of news servers or large /var/mail directories might lead to a lot of complications. It's like you need some kind of dynamic hash table creation capability for each directory vnode. I don't think this is practical. Regards, Mike From owner-freebsd-fs Thu Apr 24 22:37:10 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id WAA25664 for fs-outgoing; Thu, 24 Apr 1997 22:37:10 -0700 (PDT) Received: from critter.dk.tfs.com (phk.cybercity.dk [195.8.129.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id WAA25658; Thu, 24 Apr 1997 22:37:02 -0700 (PDT) Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id HAA01934; Fri, 25 Apr 1997 07:36:03 +0200 (CEST) To: dg@root.com cc: Poul-Henning Kamp , fs@freebsd.org From: Poul-Henning Kamp Subject: Re: the namei cache... In-reply-to: Your message of "Thu, 24 Apr 1997 13:59:29 PDT." <199704242059.NAA11021@root.com> Date: Fri, 25 Apr 1997 07:36:02 +0200 Message-ID: <1932.861946562@critter> Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk In message <199704242059.NAA11021@root.com>, David Greenman writes: > I've increased kern.maxvnodes, which I think should have the same effect. >This actually made things worse - a few percent lower cache hit rate. Very >odd. It might have just been a statistical anomaly. In any case, it definitely >didn't improve the hit rate. No, it doesn't have the same effect. You want to change the ratio between namecache entries and vnodes, not the absolute number. -- Poul-Henning Kamp | phk@FreeBSD.ORG FreeBSD Core-team. http://www.freebsd.org/~phk | phk@login.dknet.dk Private mailbox. whois: [PHK] | phk@tfs.com TRW Financial Systems, Inc. Power and ignorance is a disgusting cocktail.
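A toy model of that ratio point (user-land only; 44000 is just the wcarchive figure quoted earlier in the thread):

	#include <stdio.h>

	/*
	 * kern.maxvnodes drives desiredvnodes, and the cache cap in
	 * vfs_cache.c is desiredvnodes itself -- so raising maxvnodes
	 * grows both pools together and the 1:1 ratio survives.  The
	 * experiment keeps the vnode count fixed and doubles the cap.
	 */
	int
	main(void)
	{
		int desiredvnodes = 44000;		/* illustrative */
		int cap_today = desiredvnodes;		/* ratio 1:1 */
		int cap_experiment = 2 * desiredvnodes;	/* ratio 2:1 */

		printf("vnodes %d: cap today %d, experiment %d\n",
		    desiredvnodes, cap_today, cap_experiment);
		return (0);
	}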
From owner-freebsd-fs Fri Apr 25 01:07:59 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id BAA01214 for fs-outgoing; Fri, 25 Apr 1997 01:07:59 -0700 (PDT) Received: from haldjas.folklore.ee (Haldjas.folklore.ee [193.40.6.121]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id BAA01181 for ; Fri, 25 Apr 1997 01:06:55 -0700 (PDT) Received: from localhost (narvi@localhost) by haldjas.folklore.ee (8.8.4/8.8.4) with SMTP id LAA27386; Fri, 25 Apr 1997 11:08:56 +0300 (EEST) Date: Fri, 25 Apr 1997 11:08:55 +0300 (EEST) From: Narvi Reply-To: Narvi To: David Greenman cc: Poul-Henning Kamp , fs@freebsd.org Subject: Re: the namei cache... In-Reply-To: <199704241934.MAA10488@root.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Thu, 24 Apr 1997, David Greenman wrote: > >about 1/4th of the additions. Getting reed of the div wouldn't also be bad > >but might not be worth it. > > Not worth it? At greater than 50 cycles, the divide is more expensive > than the entire add loop. Well, it seems that division has become more expensive over time. :-( The only processor-related book I had around was about the 286 and there it cost only 28 cycles so I figured we might be hard pressed to come up with something as good while not taking up more than 3/4 of the cycles (= nearly not worth it). Exactly how many bits do we need? If we use 32-bit additions instead of 8-bit, we could perhaps take some bits from each byte. For, say, a 16-bit value we could take the lower 4 bits of each byte: hashvalue=((sum & 0x0f000000)>>24) | ((sum & 0x000f0000)>>12) | (sum & 0x00000f00) | ((sum & 0x0000000f)<<12) Yes, it probably isn't a good way of deriving a hash from a string. On the positive side of things - when compiled it will work on registers, using only logic operations, which used to be among the fastest. Taking every other bit, ((sum & 0x55550000)>>15) | (sum & 0x00005555), is another way. But I am not sure how they will fare in the real world - probably quite badly. Sander PS. There really are places where such hashes work. > > -DG > > David Greenman > Core-team/Principal Architect, The FreeBSD Project > From owner-freebsd-fs Fri Apr 25 01:35:33 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id BAA02318 for fs-outgoing; Fri, 25 Apr 1997 01:35:33 -0700 (PDT) Received: from root.com (implode.root.com [198.145.90.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id BAA02311 for ; Fri, 25 Apr 1997 01:35:30 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by root.com (8.8.5/8.6.5) with SMTP id BAA13794; Fri, 25 Apr 1997 01:35:51 -0700 (PDT) Message-Id: <199704250835.BAA13794@root.com> X-Authentication-Warning: implode.root.com: localhost [127.0.0.1] didn't use HELO protocol To: Narvi cc: fs@freebsd.org Subject: Re: the namei cache... In-reply-to: Your message of "Fri, 25 Apr 1997 11:08:55 +0300." From: David Greenman Reply-To: dg@root.com Date: Fri, 25 Apr 1997 01:35:51 -0700 Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk >On Thu, 24 Apr 1997, David Greenman wrote: > >> >about 1/4th of the additions. Getting reed of the div wouldn't also be bad >> >but might not be worth it. >> >> Not worth it? At greater than 50 cycles, the divide is more expensive >> than the entire add loop. > >Well, it seems that division has become more expensive over time.
:-( >The only processor-related book I had around was about the 286 and there >it cost only 28 cycles so I figured we might be hard pressed to come up with something According to my Intel Pentium databook, a 32-bit divide takes 41 cycles on a Pentium; the 386 databook says 41 cycles for a 386. -DG David Greenman Core-team/Principal Architect, The FreeBSD Project From owner-freebsd-fs Fri Apr 25 09:37:55 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id JAA25828 for fs-outgoing; Fri, 25 Apr 1997 09:37:55 -0700 (PDT) Received: from critter.dk.tfs.com ([140.145.230.252]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id JAA25810 for ; Fri, 25 Apr 1997 09:37:48 -0700 (PDT) Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id SAA00232 for ; Fri, 25 Apr 1997 18:36:20 +0200 (CEST) To: fs@freebsd.org Subject: patch to be benchmarked. From: Poul-Henning Kamp Date: Fri, 25 Apr 1997 18:36:20 +0200 Message-ID: <230.861986180@critter> Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk This patch removes all the ".." entries from the namei cache, and instead stores the information in the vnode. This is a "freebie" since we allocate 128 bytes for the vnode and currently use less than that. Benchmark results will be most appreciated ! Poul-Henning Index: kern/vfs_cache.c =================================================================== RCS file: /home/ncvs/src/sys/kern/vfs_cache.c,v retrieving revision 1.24 diff -u -r1.24 vfs_cache.c --- vfs_cache.c 1997/03/08 15:22:14 1.24 +++ vfs_cache.c 1997/04/25 16:23:28 @@ -78,7 +78,13 @@ (&nchashtbl[((dvp)->v_id + (cnp)->cn_hash) % nchash]) static LIST_HEAD(nchashhead, namecache) *nchashtbl; /* Hash Table */ static u_long nchash; /* size of hash table */ + static u_long numcache; /* number of cache entries allocated */ +SYSCTL_INT(_debug, OID_AUTO, vfscachesize, CTLFLAG_RD, &numcache, 0, ""); + +static u_long cachelimit; +SYSCTL_INT(_debug, OID_AUTO, vfscachelimit, CTLFLAG_RW, &cachelimit, 0, ""); + static TAILQ_HEAD(, namecache) nclruhead; /* LRU chain */ struct nchstats nchstats; /* cache effectiveness statistics */ @@ -151,6 +157,18 @@ return (0); } + if (cnp->cn_namelen == 2 && + cnp->cn_nameptr[0] == '.' && cnp->cn_nameptr[1] == '.') { + if (dvp->v_dd->v_id != dvp->v_ddid) + return (0); + if ((cnp->cn_flags & MAKEENTRY) == 0) { + dvp->v_dd = dvp; + return (0); + } + *vpp = dvp->v_dd; + return (-1); + } + ncpp = NCHHASH(dvp, cnp); for (ncp = ncpp->lh_first; ncp != 0; ncp = nnp) { nnp = ncp->nc_hash.le_next; @@ -230,12 +248,18 @@ return; } + if (cnp->cn_namelen == 2 && + cnp->cn_nameptr[0] == '.' && cnp->cn_nameptr[1] == '.') { + dvp->v_dd = vp; + dvp->v_ddid = vp->v_id; + } + /* * We allocate a new entry if we are less than the maximum * allowed and the one at the front of the LRU list is in use. * Otherwise we use the one at the front of the LRU list. */ - if (numcache < desiredvnodes && + if (numcache < (cachelimit ?
cachelimit : desiredvnodes) && ((ncp = nclruhead.tqh_first) == NULL || ncp->nc_hash.le_prev != 0)) { /* Add one more entry */ Index: kern/vfs_subr.c =================================================================== RCS file: /home/ncvs/src/sys/kern/vfs_subr.c,v retrieving revision 1.82 diff -u -r1.82 vfs_subr.c --- vfs_subr.c 1997/04/04 17:46:16 1.82 +++ vfs_subr.c 1997/04/25 16:04:31 @@ -361,6 +361,7 @@ vp = (struct vnode *) malloc((u_long) sizeof *vp, M_VNODE, M_WAITOK); bzero((char *) vp, sizeof *vp); + vp->v_dd = vp; /* Always point to some vnode */ numvnodes++; } else { for (vp = vnode_free_list.tqh_first; Index: sys/vnode.h =================================================================== RCS file: /home/ncvs/src/sys/sys/vnode.h,v retrieving revision 1.43 diff -u -r1.43 vnode.h --- vnode.h 1997/04/04 17:43:32 1.43 +++ vnode.h 1997/04/25 15:55:55 @@ -110,6 +110,8 @@ struct lock *v_vnlock; /* used for non-locking fs's */ enum vtagtype v_tag; /* type of underlying data */ void *v_data; /* private data for fs */ + struct vnode *v_dd; /* ".." vnode */ + u_long v_ddid; /* ".." vnode.v_id */ }; #define v_mountedhere v_un.vu_mountedhere #define v_socket v_un.vu_socket -- Poul-Henning Kamp | phk@FreeBSD.ORG FreeBSD Core-team. http://www.freebsd.org/~phk | phk@login.dknet.dk Private mailbox. whois: [PHK] | phk@tfs.com TRW Financial Systems, Inc. Future will arrive by its own means, progress not so. From owner-freebsd-fs Fri Apr 25 09:55:31 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id JAA26833 for fs-outgoing; Fri, 25 Apr 1997 09:55:31 -0700 (PDT) Received: from critter.dk.tfs.com ([140.145.230.252]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id JAA26828 for ; Fri, 25 Apr 1997 09:55:28 -0700 (PDT) Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id SAA00254 for ; Fri, 25 Apr 1997 18:54:01 +0200 (CEST) To: fs@freebsd.org Subject: oops... From: Poul-Henning Kamp Date: Fri, 25 Apr 1997 18:54:00 +0200 Message-ID: <252.861987240@critter> Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk I just found a .rej file :-( Here is the patch again, this time with all the bits. Sorry! Poul-Henning Index: kern/vfs_cache.c =================================================================== RCS file: /home/ncvs/src/sys/kern/vfs_cache.c,v retrieving revision 1.24 diff -u -r1.24 vfs_cache.c --- vfs_cache.c 1997/03/08 15:22:14 1.24 +++ vfs_cache.c 1997/04/25 16:44:16 @@ -78,7 +78,13 @@ (&nchashtbl[((dvp)->v_id + (cnp)->cn_hash) % nchash]) static LIST_HEAD(nchashhead, namecache) *nchashtbl; /* Hash Table */ static u_long nchash; /* size of hash table */ + static u_long numcache; /* number of cache entries allocated */ +SYSCTL_INT(_debug, OID_AUTO, vfscachesize, CTLFLAG_RD, &numcache, 0, ""); + +static u_long cachelimit; +SYSCTL_INT(_debug, OID_AUTO, vfscachelimit, CTLFLAG_RW, &cachelimit, 0, ""); + static TAILQ_HEAD(, namecache) nclruhead; /* LRU chain */ struct nchstats nchstats; /* cache effectiveness statistics */ @@ -151,6 +157,18 @@ return (0); } + if (cnp->cn_namelen == 2 && + cnp->cn_nameptr[0] == '.' && cnp->cn_nameptr[1] == '.') { + if (dvp->v_dd->v_id != dvp->v_ddid) + return (0); + if ((cnp->cn_flags & MAKEENTRY) == 0) { + dvp->v_dd = dvp; + return (0); + } + *vpp = dvp->v_dd; + return (-1); + } + ncpp = NCHHASH(dvp, cnp); for (ncp = ncpp->lh_first; ncp != 0; ncp = nnp) { nnp = ncp->nc_hash.le_next; @@ -230,12 +248,21 @@ return; } + if (cnp->cn_namelen == 2 && + cnp->cn_nameptr[0] == '.' 
&& cnp->cn_nameptr[1] == '.') { + if (!vp) + return; + dvp->v_dd = vp; + dvp->v_ddid = vp->v_id; + return; + } + /* * We allocate a new entry if we are less than the maximum * allowed and the one at the front of the LRU list is in use. * Otherwise we use the one at the front of the LRU list. */ - if (numcache < desiredvnodes && + if (numcache < (cachelimit ? cachelimit : desiredvnodes) && ((ncp = nclruhead.tqh_first) == NULL || ncp->nc_hash.le_prev != 0)) { /* Add one more entry */ @@ -304,6 +331,7 @@ struct nchashhead *ncpp; static u_long nextvnodeid; + vp->v_ddid = 0; vp->v_id = ++nextvnodeid; if (nextvnodeid != 0) return; Index: kern/vfs_subr.c =================================================================== RCS file: /home/ncvs/src/sys/kern/vfs_subr.c,v retrieving revision 1.82 diff -u -r1.82 vfs_subr.c --- vfs_subr.c 1997/04/04 17:46:16 1.82 +++ vfs_subr.c 1997/04/25 16:04:31 @@ -361,6 +361,7 @@ vp = (struct vnode *) malloc((u_long) sizeof *vp, M_VNODE, M_WAITOK); bzero((char *) vp, sizeof *vp); + vp->v_dd = vp; /* Always point to some vnode */ numvnodes++; } else { for (vp = vnode_free_list.tqh_first; Index: sys/vnode.h =================================================================== RCS file: /home/ncvs/src/sys/sys/vnode.h,v retrieving revision 1.43 diff -u -r1.43 vnode.h --- vnode.h 1997/04/04 17:43:32 1.43 +++ vnode.h 1997/04/25 15:55:55 @@ -110,6 +110,8 @@ struct lock *v_vnlock; /* used for non-locking fs's */ enum vtagtype v_tag; /* type of underlying data */ void *v_data; /* private data for fs */ + struct vnode *v_dd; /* ".." vnode */ + u_long v_ddid; /* ".." vnode.v_id */ }; #define v_mountedhere v_un.vu_mountedhere #define v_socket v_un.vu_socket -- Poul-Henning Kamp | phk@FreeBSD.ORG FreeBSD Core-team. http://www.freebsd.org/~phk | phk@login.dknet.dk Private mailbox. whois: [PHK] | phk@tfs.com TRW Financial Systems, Inc. Future will arrive by its own means, progress not so. From owner-freebsd-fs Sat Apr 26 04:02:04 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id EAA18648 for fs-outgoing; Sat, 26 Apr 1997 04:02:04 -0700 (PDT) Received: from wall.jhs.no_domain (vector.muc.ditec.de [194.120.126.35]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id EAA18601; Sat, 26 Apr 1997 04:01:33 -0700 (PDT) Received: from wall.jhs.no_domain (localhost [127.0.0.1]) by wall.jhs.no_domain (8.8.5/8.6.9) with ESMTP id BAA05626; Sat, 26 Apr 1997 01:43:22 +0200 (MET DST) Message-Id: <199704252343.BAA05626@wall.jhs.no_domain> To: "Reini (Reinhold Huber)" (Reinhold Huber) cc: freebsd-fs@freebsd.org Subject: Re: Recreating destroyed filesystem From: "Julian H. Stacey" Reply-To: "Julian H. Stacey" X-Email: jhs@freebsd.org, Fallback: jhs@gil.physik.rwth-aachen.de X-Web: http://www.freebsd.org/~jhs/ X-Company: Vector Systems Ltd, Unix & Internet Consultants. X-Address: Holz Strasse 27d, 80469 Munich, Germany X-Tel: Phone +49.89.268616, Fax +49.89.2608126, Data +49.89.26023276 X-Software: FreeBSD (Unix) + EXMH 1.6.9 (PGP key on web) In-reply-to: Your message of "Tue, 22 Apr 1997 18:29:35 +0200." Date: Sat, 26 Apr 1997 01:43:21 +0200 Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Hi, Reference: > From: "Reini (Reinhold Huber)" > Subject: Re: Recreating destroyed filesystem > Date: Tue, 22 Apr 1997 18:29:35 +0200 (MET) > > Hi! > > My problem: I've screwed up my disk by typing > 'dd if=/dev/zero of=/dev/rsd0c count=20', > which was not at all healthy to the contens of my disk. I should really > have typed 'sd1' instead of 'sd0' :(. 
> Now I've jumpered SCSI ID 0 to ID 1 and installed FreeBSD on another disk > (now SCSI ID 0 / sd0). > The manpages of dumpfs and fs tell me that I need to find a superblock. > The questions: > > * Does the info the superblock contains allow to find the beginning of > the partition? No, the disklabel does (within a slice). > * How do I find the superblock? > * What is where in the superblock? Read /usr/include/sys/disklabel.h and /usr/include/sys/diskslice.h. > Aim is to get my user files back, and then using this disk for > 2.2[.1]-RELEASE. > > I habe been using FreeBSD 2.1.0-RELEASE on the machine, installed by > the /stand/sysinstall utility, using the default partition layout > computed before I upgraded memory from 16 to 48 MB. > > Any info, including hints on manpages, webpages, FAQs, ..., is valuable > for me. I would think you could just do something like 'disklabel -r -w sd1 same_old_label_name_as_before-Or_cloned_fake' followed by 'fsck -b 32', but I suggest doing a whole-disk dd to tape first, to avoid errors, something along the lines of 'dd if=/dev/rsd1c of=/dev/rst0 bs=10k conv=sync' (but not rsd1c if the disk has fdisk-type slices ?) > Thanks in advance, > Reinhold Huber Phone me to discuss it if you want, 089 268616 - I'm also in Munich like you ! Julian -- Julian H. Stacey jhs@freebsd.org http://www.freebsd.org/~jhs/
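For the superblock hunt itself, a rough sketch of a scanner (the tool name sbscan is invented and error handling is minimal; it leans on <ufs/ffs/fs.h> for struct fs, FS_MAGIC, SBSIZE and SBOFF, and reads whole sectors because raw devices insist on that). Every cylinder group keeps a spare superblock, so even with the start of the disk zeroed there are plenty of copies left to find; the primary copy of each filesystem sits SBOFF (8k) bytes into its partition.

	#include <stdio.h>
	#include <stdlib.h>
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/param.h>		/* DEV_BSIZE */
	#include <sys/types.h>
	#include <ufs/ffs/fs.h>		/* struct fs, FS_MAGIC, SBSIZE, SBOFF */

	int
	main(int argc, char **argv)
	{
		static char buf[SBSIZE];	/* whole sectors, raw-device safe */
		struct fs *fs = (struct fs *)buf;
		off_t off;
		int fd;

		if (argc != 2) {
			fprintf(stderr, "usage: sbscan /dev/rsd1c\n");
			exit(1);
		}
		if ((fd = open(argv[1], O_RDONLY)) < 0) {
			perror(argv[1]);
			exit(1);
		}
		for (off = 0;; off += DEV_BSIZE) {	/* one sector at a time */
			if (lseek(fd, off, SEEK_SET) == (off_t)-1)
				break;
			if (read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
				break;			/* stops near end of disk */
			if (fs->fs_magic == FS_MAGIC)
				/* the SBOFF arithmetic is only meaningful for
				   the first (primary) copy in each partition */
				printf("superblock? sector %ld "
				    "(partition would start at sector %ld)\n",
				    (long)(off / DEV_BSIZE),
				    (long)((off - SBOFF) / DEV_BSIZE));
		}
		close(fd);
		return (0);
	}

Given the first candidate in each run of hits, the implied partition start plus the old default sysinstall layout should be enough to reconstruct a label for 'disklabel -r -w', after which 'fsck -b 32' can be pointed at the first spare superblock, as suggested above - and, as also suggested, dd the whole disk somewhere safe before writing anything.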