From owner-freebsd-fs Tue Apr 22 09:29:44 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id JAA23004 for fs-outgoing; Tue, 22 Apr 1997 09:29:44 -0700 (PDT)
Received: from fssrv.fachschaften.tu-muenchen.de (fssrv.fachschaften.tu-muenchen.de [129.187.176.20]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id JAA22998 for ; Tue, 22 Apr 1997 09:29:40 -0700 (PDT)
Received: from boogie.fachschaften.tu-muenchen.de by fssrv.fachschaften.tu-muenchen.de with SMTP id AA05199 (5.67a8/IDA-1.5 for ); Tue, 22 Apr 1997 17:29:49 +0200
Received: by boogie.fachschaften.tu-muenchen.de (5.67a8/Ultrix3.0-C) id AA06445; Tue, 22 Apr 1997 18:29:36 +0200
Date: Tue, 22 Apr 1997 18:29:35 +0200 (MET)
From: "Reini (Reinhold Huber)"
To: freebsd-fs@freebsd.org
Subject: Recreating destroyed filesystem
Message-Id:
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Hi!

My problem: I've screwed up my disk by typing 'dd if=/dev/zero of=/dev/rsd0c count=20', which was not at all healthy for the contents of my disk. I should really have typed 'sd1' instead of 'sd0' :(. Now I've jumpered SCSI ID 0 to ID 1 and installed FreeBSD on another disk (now SCSI ID 0 / sd0).

The manpages of dumpfs and fs tell me that I need to find a superblock. The questions:

* Does the info the superblock contains allow one to find the beginning of the partition?
* How do I find the superblock?
* What is where in the superblock?

The aim is to get my user files back, and then to use this disk for 2.2[.1]-RELEASE. I have been using FreeBSD 2.1.0-RELEASE on the machine, installed by the /stand/sysinstall utility, using the default partition layout computed before I upgraded memory from 16 to 48 MB.

Any info, including hints on manpages, webpages, FAQs, ..., is valuable to me.

Thanks in advance,
Reinhold Huber

From owner-freebsd-fs Wed Apr 23 08:57:07 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id IAA06489 for fs-outgoing; Wed, 23 Apr 1997 08:57:07 -0700 (PDT)
Received: from critter.dk.tfs.com ([140.145.230.252]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id IAA06483 for ; Wed, 23 Apr 1997 08:57:05 -0700 (PDT)
Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id RAA01286 for ; Wed, 23 Apr 1997 17:55:34 +0200 (CEST)
To: fs@freebsd.org
Subject: the namei cache...
From: Poul-Henning Kamp
Date: Wed, 23 Apr 1997 17:55:34 +0200
Message-ID: <1284.861810934@critter>
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

I've been sticking my nose in the namei cache again, and that got me wondering. Consider this:

We add a linked list to vnodes that represent directories and hook all the cache entries that are relative to that directory onto that list (sorted by name, or maybe LRU ?), rather than finding them through the hash list.

Benefit: We can release the cache entries at the time we recycle the vnode, instead of the lazy release when we stumble across an invalid cache entry, resulting in better cache utilization.

Benefit: We can do away with the v_id field, unless other parts of the kernel need it.

Benefit: We don't rely on the hash to level things out for us.

Benefit: We can lock with finer and more logical granularity (ie. per vnode) in the cache than we could otherwise do.

Neutral: We still keep the LRU list around to reclaim little-used cache entries.

Almost neutral: We save the storage for the hash-table but the vnode becomes 4 bytes bigger.
This is close to a break-even, since the size of the hash table is found from desiredvnodes...

Comments ? Have I overlooked something fundamental ?

--
Poul-Henning Kamp             | phk@FreeBSD.ORG           FreeBSD Core-team.
http://www.freebsd.org/~phk   | phk@login.dknet.dk        Private mailbox.
whois: [PHK]                  | phk@tfs.com               TRW Financial Systems, Inc.
Future will arrive by its own means, progress not so.

From owner-freebsd-fs Thu Apr 24 03:13:56 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id DAA28568 for fs-outgoing; Thu, 24 Apr 1997 03:13:56 -0700 (PDT)
Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id DAA28563 for ; Thu, 24 Apr 1997 03:13:53 -0700 (PDT)
Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.8.5/CET-v2.1) with SMTP id KAA13250; Thu, 24 Apr 1997 10:13:44 GMT
Date: Thu, 24 Apr 1997 19:13:44 +0900 (JST)
From: Michael Hancock
To: Poul-Henning Kamp
cc: fs@freebsd.org
Subject: Re: the namei cache...
In-Reply-To: <1284.861810934@critter>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Wed, 23 Apr 1997, Poul-Henning Kamp wrote:

>
> I've been sticking my nose in the namei cache again, and that got me
> wondering.
> Consider this:
>
> We add a linked list to vnodes that represent directories
> and hook all the cache entries that are relative to that
> directory onto that list (sorted by name, or maybe LRU ?),
> rather than finding them through the hash list.
> Benefit:
> We can release the cache entries at the time we recycle the
> vnode, instead of the lazy release when we stumble across
> an invalid cache entry, resulting in better cache utilization.

Why can't this be done now?

> Benefit:
> We can do away with the v_id field, unless other parts of
> the kernel need it.

That would be nice. Our vnodes are pretty big, 112 bytes?

> Benefit:
> We don't rely on the hash to level things out for us.

What's wrong with relying on the hash?

>
> Benefit:
> We can lock with finer and more logical granularity (ie. per vnode)
> in the cache than we could otherwise do.

It doesn't give you logical vnode granularity, but you could lock on buckets.

> Neutral:
> We still keep the LRU list around to reclaim little-used
> cache entries.
>
> Almost neutral:
> We save the storage for the hash-table but the vnode becomes 4
> bytes bigger. This is close to a break-even, since the size
> of the hash table is found from desiredvnodes...
>

I think hashing is still a better way of doing it.

Regards,

Mike Hancock

From owner-freebsd-fs Thu Apr 24 03:27:56 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id DAA29106 for fs-outgoing; Thu, 24 Apr 1997 03:27:56 -0700 (PDT)
Received: from critter.dk.tfs.com ([140.145.230.252]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id DAA29070; Thu, 24 Apr 1997 03:27:47 -0700 (PDT)
Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id MAA00738; Thu, 24 Apr 1997 12:26:21 +0200 (CEST)
To: Michael Hancock
cc: Poul-Henning Kamp , fs@freebsd.org
From: Poul-Henning Kamp
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 19:13:44 +0900."
Date: Thu, 24 Apr 1997 12:26:21 +0200
Message-ID: <736.861877581@critter>
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>> Benefit:
>> We can release the cache entries at the time we recycle the
>> vnode, instead of the lazy release when we stumble across
>> an invalid cache entry, resulting in better cache utilization.
>
>Why can't this be done now?

We would have to traverse either all the hash lists or the LRU list to locate the entries.

>> Benefit:
>> We can do away with the v_id field, unless other parts of
>> the kernel need it.
>
>That would be nice. Our vnodes are pretty big, 112 bytes?

+64 bytes in the namecache structure :-(

>> Benefit:
>> We don't rely on the hash to level things out for us.
>
>What's wrong with relying on the hash?

It's not needed.

>> Benefit:
>> We can lock with finer and more logical granularity (ie. per vnode)
>> in the cache than we could otherwise do.
>
>It doesn't give you logical vnode granularity, but you could lock on
>buckets.

Which will lock out some other random number of vnodes (ie. directories), whereas the new scheme will lock only the (already locked) directory we're looking in.

--
Poul-Henning Kamp             | phk@FreeBSD.ORG           FreeBSD Core-team.
http://www.freebsd.org/~phk   | phk@login.dknet.dk        Private mailbox.
whois: [PHK]                  | phk@tfs.com               TRW Financial Systems, Inc.
Power and ignorance is a disgusting cocktail.

From owner-freebsd-fs Thu Apr 24 03:39:22 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id DAA29626 for fs-outgoing; Thu, 24 Apr 1997 03:39:22 -0700 (PDT)
Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id DAA29621 for ; Thu, 24 Apr 1997 03:39:20 -0700 (PDT)
Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.8.5/CET-v2.1) with SMTP id KAA13310; Thu, 24 Apr 1997 10:39:12 GMT
Date: Thu, 24 Apr 1997 19:39:12 +0900 (JST)
From: Michael Hancock
To: Poul-Henning Kamp
cc: fs@freebsd.org
Subject: Re: the namei cache...
In-Reply-To: <736.861877581@critter>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Thu, 24 Apr 1997, Poul-Henning Kamp wrote:

> >What's wrong with relying on the hash?
>
> It's not needed.

How would the linked list version fare under degenerate cases like large directories? With hashing you can work on the hashing algorithm. Btw, what is the hash key now? vp+name[20]?

Mike

From owner-freebsd-fs Thu Apr 24 03:44:02 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id DAA29815 for fs-outgoing; Thu, 24 Apr 1997 03:44:02 -0700 (PDT)
Received: from critter.dk.tfs.com ([140.145.230.252]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id DAA29781; Thu, 24 Apr 1997 03:42:49 -0700 (PDT)
Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id MAA00803; Thu, 24 Apr 1997 12:41:18 +0200 (CEST)
To: Michael Hancock
cc: Poul-Henning Kamp , fs@freebsd.org
From: Poul-Henning Kamp
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 19:39:12 +0900."
Date: Thu, 24 Apr 1997 12:41:18 +0200
Message-ID: <801.861878478@critter>
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

In message , Michael Hancock writes:
>On Thu, 24 Apr 1997, Poul-Henning Kamp wrote:
>
>> >What's wrong with relying on the hash?
>>
>> It's not needed.
>
>How would the linked list version fare under degenerate cases like large
>directories?
yes, for those directories.

>With hashing you can work on the hashing algorithm. Btw, what is the hash
>key now? vp+name[20]?

directory vnode v_id + the regular namei hash.

--
Poul-Henning Kamp             | phk@FreeBSD.ORG           FreeBSD Core-team.
http://www.freebsd.org/~phk   | phk@login.dknet.dk        Private mailbox.
whois: [PHK]                  | phk@tfs.com               TRW Financial Systems, Inc.
Power and ignorance is a disgusting cocktail.

From owner-freebsd-fs Thu Apr 24 04:49:23 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id EAA02464 for fs-outgoing; Thu, 24 Apr 1997 04:49:23 -0700 (PDT)
Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id EAA02459 for ; Thu, 24 Apr 1997 04:49:21 -0700 (PDT)
Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.8.5/CET-v2.1) with SMTP id LAA13640; Thu, 24 Apr 1997 11:49:14 GMT
Date: Thu, 24 Apr 1997 20:49:14 +0900 (JST)
From: Michael Hancock
To: Poul-Henning Kamp
cc: fs@freebsd.org
Subject: Re: the namei cache...
In-Reply-To: <801.861878478@critter>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Thu, 24 Apr 1997, Poul-Henning Kamp wrote:

> >With hashing you can work on the hashing algorithm. Btw, what is the hash
> >key now? vp+name[20]?
>
> directory vnode v_id + the regular namei hash.

It's hard to tell how well keys are distributed. The cn_hash component seems like it can be improved, maybe by taking only 3 or 4 of the characters in the name and then doing some shifts and xor's on them.

        cnp->cn_hash = 0;
        for (cp = cnp->cn_nameptr; *cp != 0 && *cp != '/'; cp++)
                cnp->cn_hash += (unsigned char)*cp;
        cnp->cn_namelen = cp - cnp->cn_nameptr;
        if (cnp->cn_namelen > NAME_MAX) {
                error = ENAMETOOLONG;
                goto bad;
        }

Regarding the capability, have you done any analysis on its distribution?

Mike

From owner-freebsd-fs Thu Apr 24 05:07:22 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id FAA03014 for fs-outgoing; Thu, 24 Apr 1997 05:07:22 -0700 (PDT)
Received: from root.com (implode.root.com [198.145.90.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id FAA03009 for ; Thu, 24 Apr 1997 05:07:19 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1]) by root.com (8.8.5/8.6.5) with SMTP id FAA09111; Thu, 24 Apr 1997 05:08:55 -0700 (PDT)
Message-Id: <199704241208.FAA09111@root.com>
X-Authentication-Warning: implode.root.com: localhost [127.0.0.1] didn't use HELO protocol
To: Poul-Henning Kamp
cc: Michael Hancock , fs@freebsd.org
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 12:41:18 +0200." <801.861878478@critter>
From: David Greenman
Reply-To: dg@root.com
Date: Thu, 24 Apr 1997 05:08:55 -0700
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>>With hashing you can work on the hashing algorithm. Btw, what is the hash
>>key now? vp+name[20]?
>
>directory vnode v_id + the regular namei hash.

Calculating the hash is expensive since it involves doing a byte sum of all of the characters in the name, adding in the directory v_id, and dividing by a prime number. There's got to be a better way. I haven't carefully read what Poul-Henning is proposing, but the gist I got was that he intends to hang the file name cache entries off of the directory vnode? It seems to me that this works well for small directories and poorly for large directories unless you construct a binary tree and keep the entries sorted.
Then the problem becomes managing the tree (keeping it balanced, etc), which itself can be expensive...although lookups happen far more often than creates and deletes.

It's interesting that the single largest CPU consumer on wcarchive appears to be the namei cache lookups. I think the hash algorithm needs to be re-visited at the very least, perhaps changing the divide by a prime into some sort of xor of the filename characters (xored in pairs as int16's).

-DG

David Greenman
Core-team/Principal Architect, The FreeBSD Project

From owner-freebsd-fs Thu Apr 24 05:30:20 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id FAA03876 for fs-outgoing; Thu, 24 Apr 1997 05:30:20 -0700 (PDT)
Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id FAA03867 for ; Thu, 24 Apr 1997 05:30:18 -0700 (PDT)
Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.8.5/CET-v2.1) with SMTP id MAA13844; Thu, 24 Apr 1997 12:30:02 GMT
Date: Thu, 24 Apr 1997 21:30:02 +0900 (JST)
From: Michael Hancock
To: David Greenman
cc: Poul-Henning Kamp , fs@freebsd.org
Subject: Re: the namei cache...
In-Reply-To: <199704241208.FAA09111@root.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Thu, 24 Apr 1997, David Greenman wrote:

> It's interesting that the single largest CPU consumer on wcarchive appears
> to be the namei cache lookups. I think the hash algorithm needs to be
> re-visited at the very least, perhaps changing the divide by a prime into
> some sort of xor of the filename characters (xored in pairs as int16's).

I don't think you need to even look at all the characters. You can take the first and last and a couple in the middle. To get the middle ones, instead of dividing the length by 2 do a bit shift on the length. Then xor the 2 pairs. Any hash gurus on the list?

Mike Hancock

From owner-freebsd-fs Thu Apr 24 05:56:13 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id FAA04755 for fs-outgoing; Thu, 24 Apr 1997 05:56:13 -0700 (PDT)
Received: from gatekeeper.tsc.tdk.com (root@gatekeeper.tsc.tdk.com [207.113.159.21]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id FAA04750 for ; Thu, 24 Apr 1997 05:56:10 -0700 (PDT)
Received: from sunrise.gv.tsc.tdk.com (root@sunrise.gv.tsc.tdk.com [192.168.241.191]) by gatekeeper.tsc.tdk.com (8.8.4/8.8.4) with ESMTP id FAA20232; Thu, 24 Apr 1997 05:56:03 -0700 (PDT)
Received: from salsa.gv.tsc.tdk.com (salsa.gv.tsc.tdk.com [192.168.241.194]) by sunrise.gv.tsc.tdk.com (8.8.5/8.8.5) with ESMTP id FAA13671; Thu, 24 Apr 1997 05:56:03 -0700 (PDT)
Received: (from gdonl@localhost) by salsa.gv.tsc.tdk.com (8.8.5/8.8.5) id FAA11674; Thu, 24 Apr 1997 05:56:01 -0700 (PDT)
From: Don Lewis
Message-Id: <199704241256.FAA11674@salsa.gv.tsc.tdk.com>
Date: Thu, 24 Apr 1997 05:56:01 -0700
In-Reply-To: David Greenman "Re: the namei cache..." (Apr 24, 5:08am)
X-Mailer: Mail User's Shell (7.2.6 alpha(3) 7/19/95)
To: dg@root.com, Poul-Henning Kamp
Subject: Re: the namei cache...
Cc: Michael Hancock , fs@freebsd.org
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Apr 24, 5:08am, David Greenman wrote:
} Subject: Re: the namei cache...
} >>With hashing you can work on the hashing algorithm. Btw, what is the hash
} >>key now? vp+name[20]?
} >
} >directory vnode v_id + the regular namei hash.
}
} Calculating the hash is expensive since it involves doing a byte sum of
} all of the characters in the name, adding in the directory v_id, and dividing
} by a prime number.

It shouldn't be too bad because of the locality, so there aren't many cache misses.

} There's got to be a better way. I haven't carefully read
} what Poul-Henning is proposing, but the gist I got was that he intends to
} hang the file name cache entries off of the directory vnode? It seems to
} me that this works well for small directories and poorly for large
} directories unless you construct a binary tree and keep the entries sorted.

Doing lots of pointer chasing is slow ...

} Then the problem becomes managing the tree (keeping it balanced, etc), which
} itself can be expensive...although lookups happen far more often than creates
} and deletes.

At the filesystem level there are far more lookups, but as uncached files are looked up, other entries have to be kicked out of the cache, so you'll end up with a much higher rate of addition and deletion of entries in the cache than in the file system.

} It's interesting that the single largest CPU consumer on wcarchive appears
} to be the namei cache lookups.

It might be interesting to know how long the hash chains are and how well balanced they are. Is the table size appropriate for the number of vnodes that are in use?

} I think the hash algorithm needs to be
} re-visited at the very least, perhaps changing the divide by a prime into
} some sort of xor of the filename characters (xored in pairs as int16's).

Yes, most processors are really bad at integer divide. You might look at the Perl hash. You multiply the current value by 33 and add the next character, then mask off however many bits you want at the end (the table size must be a power of 2). If your compiler is at all smart, it will notice that the multiply is just a shift and an add. I suspect this might be pretty easy to just drop in to the existing code for a quick trial.

---
Truck

From owner-freebsd-fs Thu Apr 24 06:34:19 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id GAA06296 for fs-outgoing; Thu, 24 Apr 1997 06:34:19 -0700 (PDT)
Received: from critter.dk.tfs.com (phk.cybercity.dk [195.8.129.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id GAA06122; Thu, 24 Apr 1997 06:28:44 -0700 (PDT)
Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id PAA00322; Thu, 24 Apr 1997 15:27:45 +0200 (CEST)
To: Michael Hancock
cc: Poul-Henning Kamp , fs@freebsd.org
From: Poul-Henning Kamp
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 20:49:14 +0900."
Date: Thu, 24 Apr 1997 15:27:45 +0200
Message-ID: <320.861888465@critter>
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

In message , Michael Hancock writes:
>On Thu, 24 Apr 1997, Poul-Henning Kamp wrote:
>
>> >With hashing you can work on the hashing algorithm. Btw, what is the hash
>> >key now? vp+name[20]?
>>
>> directory vnode v_id + the regular namei hash.
>
>It's hard to tell how well keys are distributed. The cn_hash component
>seems like it can be improved, maybe by taking only 3 or 4 of the
>characters in the name and then doing some shifts and xor's on them.

not terribly well according to my study, but well enough for this purpose.

>Regarding the capability, have you done any analysis on its distribution?

It's a global counter, it's only added to avoid all the ".." or "CVS" entries ending up in the same hash-bucket.
You are on the wrong lead here. The time or the quality of the hash is not terribly important, considering that what we save is a disk access or two. What matters is using the limited (see below) number of cache entries all the time.

The problem is that with the current implementation, we have stale entries in the LRU chain, but they're not at the end where we happen to grab the next one (this happens when a vnode gets recycled). Therefore we will purge good entries out to add new data with some regularity. Linking to the vnodes will avoid this problem, since we can purge the entries right away when we recycle the vnode.

I've actually implemented it this morning; if you want to try it out, here is the patch. You need to "sysctl -w debug.vfscache" to start the caching; I've disabled it by default while I play.

Another thing: you can set the size of the name cache now with sysctl

        sysctl -w debug.vfscachelimit=3000

If the number is zero, the limit is "desiredvnodes", which as far as I can see isn't enough for sane operation. (think "." and ".." ...) Remember that they cost 64 bytes apiece, btw.

I intend to run a couple of benchmarks on this during the next week or so. Any data you can add will be most welcome.

Poul-Henning

PS: You need some of the additions to sys/queue.h which I'm working on, so that's included in the patch too.

Index: sys/namei.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/namei.h,v
retrieving revision 1.13
diff -u -r1.13 namei.h
--- namei.h     1997/02/22 09:45:38     1.13
+++ namei.h     1997/04/24 12:27:02
@@ -165,10 +165,9 @@
 #endif

 struct namecache {
-        LIST_ENTRY(namecache) nc_hash;  /* hash chain */
-        TAILQ_ENTRY(namecache) nc_lru;  /* LRU chain */
-        struct vnode *nc_dvp;           /* vnode of parent of name */
-        u_long nc_dvpid;                /* capability number of nc_dvp */
+        LIST_ENTRY(namecache) nc_hash_src;      /* hash chain */
+        LIST_ENTRY(namecache) nc_hash_dst;      /* hash chain */
+        TAILQ_ENTRY(namecache) nc_lru;  /* LRU chain */
         struct vnode *nc_vp;            /* vnode the name refers to */
         u_long nc_vpid;                 /* capability number of nc_vp */
 #ifdef NCH_STATISTICS
@@ -180,9 +179,6 @@
 };

 #ifdef KERNEL
-extern u_long nextvnodeid;
-extern u_long numcache;
-extern u_long numvnodes;

 int    namei __P((struct nameidata *ndp));
 int    lookup __P((struct nameidata *ndp));

Index: sys/queue.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/queue.h,v
retrieving revision 1.14
diff -u -r1.14 queue.h
--- queue.h     1997/04/14 18:22:02     1.14
+++ queue.h     1997/04/23 11:29:51
@@ -85,6 +85,25 @@
  * complex end of list detection.
  *
  * For details on the use of these macros, see the queue(3) manual page.
+ *
+ *
+ *                      SLIST   LIST    STAILQ  TAILQ   CIRCLEQ
+ * _HEAD                +       +       +       +       +
+ * _ENTRY               +       +       +       +       +
+ * _INIT                +       +       +       +       +
+ * _EMPTY               +       +       +       +       +
+ * _FIRST               +       +       -       +       +
+ * _NEXT                +       +       -       +       +
+ * _PREV                -       -       -       +       +
+ * _LAST                -       -       -       +       +
+ * _FOREACH             -       +       -       +       -
+ * _INSERT_HEAD         +       +       +       +       +
+ * _INSERT_BEFORE       -       +       -       +       +
+ * _INSERT_AFTER        +       +       +       +       +
+ * _INSERT_TAIL         -       -       +       +       +
+ * _REMOVE_HEAD         +       -       +       -       -
+ * _REMOVE              +       +       +       +       +
+ *
  */

 /*
@@ -157,6 +176,8 @@
 /*
  * Singly-linked Tail queue functions.
  */
+#define STAILQ_EMPTY(head) ((head)->stqh_first == NULL)
+
 #define STAILQ_INIT(head) {                                             \
         (head)->stqh_first = NULL;                                      \
         (head)->stqh_last = &(head)->stqh_first;                        \
@@ -217,6 +238,9 @@
 /*
  * List functions.
  */
+
+#define LIST_EMPTY(head) ((head)->lh_first == NULL)
+
 #define LIST_FIRST(head) ((head)->lh_first)

 #define LIST_FOREACH(var, head, field)                                  \
@@ -277,6 +301,9 @@
  */
 #define TAILQ_EMPTY(head) ((head)->tqh_first == NULL)

+#define TAILQ_FOREACH(var, head, field) \
+        for (var = TAILQ_FIRST(head); var; var = TAILQ_NEXT(var, field))
+
 #define TAILQ_FIRST(head) ((head)->tqh_first)

 #define TAILQ_LAST(head) ((head)->tqh_last)
@@ -351,6 +378,13 @@
 /*
  * Circular queue functions.
  */
+#define CIRCLEQ_EMPTY(head) ((head)->cqh_first == (head)->cqh_last)
+
+#define CIRCLEQ_FIRST(head) ((head)->cqh_first)
+
+#define CIRCLEQ_FOREACH(var, head, field) \
+        for((var) = (head)->cqh_first; (var); (var) = (var)->field.cqe_next)
+
 #define CIRCLEQ_INIT(head) {                                            \
         (head)->cqh_first = (void *)(head);                             \
         (head)->cqh_last = (void *)(head);                              \
@@ -395,6 +429,12 @@
         (head)->cqh_last->field.cqe_next = (elm);                       \
         (head)->cqh_last = (elm);                                       \
 }
+
+#define CIRCLEQ_LAST(head) ((head)->cqh_last)
+
+#define CIRCLEQ_NEXT(elm,field) ((elm)->field.cqe_next)
+
+#define CIRCLEQ_PREV(elm,field) ((elm)->field.cqe_prev)

 #define CIRCLEQ_REMOVE(head, elm, field) {                              \
         if ((elm)->field.cqe_next == (void *)(head))                    \

Index: sys/vnode.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/vnode.h,v
retrieving revision 1.43
diff -u -r1.43 vnode.h
--- vnode.h     1997/04/04 17:43:32     1.43
+++ vnode.h     1997/04/24 06:17:12
@@ -70,6 +70,7 @@
 typedef int vop_t __P((void *));

 struct vm_object;
+struct namecache;

 /*
  * Reading or writing any of these items requires holding the appropriate lock.
@@ -110,6 +111,9 @@
         struct  lock *v_vnlock;                 /* used for non-locking fs's */
         enum    vtagtype v_tag;                 /* type of underlying data */
         void    *v_data;                        /* private data for fs */
+        LIST_HEAD(, namecache) v_cache_src;     /* namecache entries */
+        LIST_HEAD(, namecache) v_cache_dst;     /* namecache entries */
+        int martian;
 };
 #define v_mountedhere   v_un.vu_mountedhere
 #define v_socket        v_un.vu_socket

Index: kern/vfs_cache.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_cache.c,v
retrieving revision 1.24
diff -u -r1.24 vfs_cache.c
--- vfs_cache.c 1997/03/08 15:22:14     1.24
+++ vfs_cache.c 1997/04/24 12:12:03
@@ -74,16 +74,15 @@
 /*
  * Structures associated with name cacheing.
  */
-#define NCHHASH(dvp, cnp) \
-        (&nchashtbl[((dvp)->v_id + (cnp)->cn_hash) % nchash])
-static LIST_HEAD(nchashhead, namecache) *nchashtbl;     /* Hash Table */
-static u_long nchash;                   /* size of hash table */
-static u_long numcache;                 /* number of cache entries allocated */
 static TAILQ_HEAD(, namecache) nclruhead;       /* LRU chain */
 struct nchstats nchstats;               /* cache effectiveness statistics */

-static int doingcache = 1;              /* 1 => enable the cache */
+static int doingcache = 0;              /* 1 => enable the cache */
 SYSCTL_INT(_debug, OID_AUTO, vfscache, CTLFLAG_RW, &doingcache, 0, "");
+static u_long numcache;                 /* number of cache entries allocated */
+SYSCTL_INT(_debug, OID_AUTO, vfscachesize, CTLFLAG_RD, &numcache, 0, "");
+static u_long cachelimit;               /* cache size limit */
+SYSCTL_INT(_debug, OID_AUTO, vfscachelimit, CTLFLAG_RW, &cachelimit, 0, "");

 #ifdef NCH_STATISTICS
 u_long nchnbr;
@@ -94,15 +93,41 @@
 #define NCHHIT(ncp)
 #endif

+static int
+sysctl_vfscachedump SYSCTL_HANDLER_ARGS
+{
+        struct namecache *ncp;
+        int error;
+
+        /* Estimate only ? */
+        if (!req->oldptr)
+                return (SYSCTL_OUT(req,0,numcache*sizeof(struct namecache)));
+
+
+        TAILQ_FOREACH(ncp, &nclruhead, nc_lru)
+                if (error = SYSCTL_OUT(req, ncp, sizeof *ncp))
+                        return (error);
+        return (0);
+}
+
+SYSCTL_PROC(_debug, OID_AUTO, vfscachedump, CTLTYPE_OPAQUE|CTLFLAG_RD,
+        0, 0, sysctl_vfscachedump, "S,namecache", "");
+
 /*
- * Delete an entry from its hash list and move it to the front
+ * Delete an entry from its vnode list and move it to the front
  * of the LRU list for immediate reuse.
  */
-#define PURGE(ncp) { \
-        LIST_REMOVE(ncp, nc_hash); \
-        ncp->nc_hash.le_prev = 0; \
-        TAILQ_REMOVE(&nclruhead, ncp, nc_lru); \
-        TAILQ_INSERT_HEAD(&nclruhead, ncp, nc_lru); \
+static void
+cache_zap(struct namecache *ncp)
+{
+        if (ncp->nc_hash_src.le_prev)
+                LIST_REMOVE(ncp, nc_hash_src);
+        ncp->nc_hash_src.le_prev = 0;
+        if (ncp->nc_hash_dst.le_prev)
+                LIST_REMOVE(ncp, nc_hash_dst);
+        ncp->nc_hash_dst.le_prev = 0;
+        TAILQ_REMOVE(&nclruhead, ncp, nc_lru);
+        TAILQ_INSERT_HEAD(&nclruhead, ncp, nc_lru);
 }

 /*
@@ -139,7 +164,6 @@
         struct componentname *cnp;
 {
         register struct namecache *ncp, *nnp;
-        register struct nchashhead *ncpp;

         if (!doingcache) {
                 cnp->cn_flags &= ~MAKEENTRY;
@@ -151,19 +175,10 @@
                 return (0);
         }

-        ncpp = NCHHASH(dvp, cnp);
-        for (ncp = ncpp->lh_first; ncp != 0; ncp = nnp) {
-                nnp = ncp->nc_hash.le_next;
-                /* If one of the vp's went stale, don't bother anymore. */
-                if ((ncp->nc_dvpid != ncp->nc_dvp->v_id) ||
-                    (ncp->nc_vp && ncp->nc_vpid != ncp->nc_vp->v_id)) {
-                        nchstats.ncs_falsehits++;
-                        PURGE(ncp);
-                        continue;
-                }
+        for (ncp = dvp->v_cache_src.lh_first; ncp != 0; ncp = nnp) {
+                nnp = ncp->nc_hash_src.le_next;
                 /* Now that we know the vp's to be valid, is it ours ? */
-                if (ncp->nc_dvp == dvp &&
-                    ncp->nc_nlen == cnp->cn_namelen &&
+                if (ncp->nc_nlen == cnp->cn_namelen &&
                     !bcmp(ncp->nc_name, cnp->cn_nameptr, (u_int)ncp->nc_nlen))
                         break;
         }
@@ -179,7 +194,7 @@
         /* We don't want to have an entry, so dump it */
         if ((cnp->cn_flags & MAKEENTRY) == 0) {
                 nchstats.ncs_badhits++;
-                PURGE(ncp);
+                cache_zap(ncp);
                 return (0);
         }
@@ -196,7 +211,7 @@
         /* We found a negative match, and want to create it, so purge */
         if (cnp->cn_nameiop == CREATE) {
                 nchstats.ncs_badhits++;
-                PURGE(ncp);
+                cache_zap(ncp);
                 return (0);
         }
@@ -220,7 +235,6 @@
         struct componentname *cnp;
 {
         register struct namecache *ncp;
-        register struct nchashhead *ncpp;

         if (!doingcache)
                 return;
@@ -235,21 +249,20 @@
          * allowed and the one at the front of the LRU list is in use.
          * Otherwise we use the one at the front of the LRU list.
          */
-        if (numcache < desiredvnodes &&
-            ((ncp = nclruhead.tqh_first) == NULL ||
-            ncp->nc_hash.le_prev != 0)) {
+        ncp = nclruhead.tqh_first;
+        if (ncp && !ncp->nc_hash_src.le_prev) {
+                /* reuse an old entry */
+                TAILQ_REMOVE(&nclruhead, ncp, nc_lru);
+        } else if (numcache < (cachelimit ? cachelimit : desiredvnodes)) {
                 /* Add one more entry */
                 ncp = (struct namecache *)
                         malloc((u_long)sizeof *ncp, M_CACHE, M_WAITOK);
                 bzero((char *)ncp, sizeof *ncp);
                 numcache++;
-        } else if (ncp = nclruhead.tqh_first) {
+        } else if (ncp) {
                 /* reuse an old entry */
+                cache_zap(ncp);
                 TAILQ_REMOVE(&nclruhead, ncp, nc_lru);
-                if (ncp->nc_hash.le_prev != 0) {
-                        LIST_REMOVE(ncp, nc_hash);
-                        ncp->nc_hash.le_prev = 0;
-                }
         } else {
                 /* give up */
                 return;
@@ -268,13 +281,12 @@
                 ++vp->v_usage;
         } else
                 ncp->nc_vpid = cnp->cn_flags & ISWHITEOUT;
-        ncp->nc_dvp = dvp;
-        ncp->nc_dvpid = dvp->v_id;
         ncp->nc_nlen = cnp->cn_namelen;
         bcopy(cnp->cn_nameptr, ncp->nc_name, (unsigned)ncp->nc_nlen);
         TAILQ_INSERT_TAIL(&nclruhead, ncp, nc_lru);
-        ncpp = NCHHASH(dvp, cnp);
-        LIST_INSERT_HEAD(ncpp, ncp, nc_hash);
+        LIST_INSERT_HEAD(&dvp->v_cache_src, ncp, nc_hash_src);
+        if (vp)
+                LIST_INSERT_HEAD(&vp->v_cache_dst, ncp, nc_hash_dst);
 }

 /*
@@ -285,7 +297,6 @@
 {

         TAILQ_INIT(&nclruhead);
-        nchashtbl = phashinit(desiredvnodes, M_CACHE, &nchash);
 }

 /*
@@ -301,15 +312,23 @@
         struct vnode *vp;
 {
         struct namecache *ncp;
-        struct nchashhead *ncpp;
         static u_long nextvnodeid;

+        if (vp->martian != 0x1007)
+                Debugger("Martian Vnode!");
+        while (!LIST_EMPTY(&vp->v_cache_src)) {
+                cache_zap(LIST_FIRST(&vp->v_cache_src));
+        }
+        while (!LIST_EMPTY(&vp->v_cache_dst)) {
+                cache_zap(LIST_FIRST(&vp->v_cache_dst));
+        }
         vp->v_id = ++nextvnodeid;
+        vp->martian = 0x1007;
         if (nextvnodeid != 0)
                 return;
-        for (ncpp = &nchashtbl[nchash - 1]; ncpp >= nchashtbl; ncpp--) {
-                while (ncp = ncpp->lh_first)
-                        PURGE(ncp);
+        TAILQ_FOREACH(ncp, &nclruhead, nc_lru) {
+                if (ncp->nc_hash_src.le_prev)
+                        cache_zap(ncp);
         }
         vp->v_id = ++nextvnodeid;
 }
@@ -324,18 +343,13 @@
 cache_purgevfs(mp)
         struct mount *mp;
 {
-        struct nchashhead *ncpp;
         struct namecache *ncp, *nnp;
+        struct vnode *vnp;

-        /* Scan hash tables for applicable entries */
-        for (ncpp = &nchashtbl[nchash - 1]; ncpp >= nchashtbl; ncpp--) {
-                for (ncp = ncpp->lh_first; ncp != 0; ncp = nnp) {
-                        nnp = ncp->nc_hash.le_next;
-                        if (ncp->nc_dvpid != ncp->nc_dvp->v_id ||
-                            (ncp->nc_vp && ncp->nc_vpid != ncp->nc_vp->v_id) ||
-                            ncp->nc_dvp->v_mount == mp) {
-                                PURGE(ncp);
-                        }
-                }
+loop:
+        LIST_FOREACH(vnp, &mp->mnt_vnodelist, v_mntvnodes) {
+                if (vnp->v_mount != mp)
+                        goto loop;
+                cache_purge(vnp);
         }
 }

Index: kern/vfs_subr.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.82
diff -u -r1.82 vfs_subr.c
--- vfs_subr.c  1997/04/04 17:46:16     1.82
+++ vfs_subr.c  1997/04/24 12:27:06
@@ -77,7 +77,7 @@
 #endif
 static void vclean __P((struct vnode *vp, int flags, struct proc *p));
 static void vgonel __P((struct vnode *vp, struct proc *p));
-unsigned long numvnodes;
+static unsigned long numvnodes;
 extern void vputrele __P((struct vnode *vp, int put));

 enum vtype iftovt_tab[16] = {
@@ -361,6 +361,9 @@
                 vp = (struct vnode *) malloc((u_long) sizeof *vp,
                     M_VNODE, M_WAITOK);
                 bzero((char *) vp, sizeof *vp);
+                LIST_INIT(&vp->v_cache_src);
+                LIST_INIT(&vp->v_cache_dst);
+                vp->martian = 0x1007;
                 numvnodes++;
         } else {
                 for (vp = vnode_free_list.tqh_first;

--
Poul-Henning Kamp             | phk@FreeBSD.ORG           FreeBSD Core-team.
http://www.freebsd.org/~phk   | phk@login.dknet.dk        Private mailbox.
whois: [PHK]                  | phk@tfs.com               TRW Financial Systems, Inc.
Power and ignorance is a disgusting cocktail.
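[For illustration, a minimal userland sketch of the lookup scheme the patch above implements: namecache entries hang off the directory vnode on a per-directory list and are found by a linear scan, with no hash at all. The types and field names below are simplified stand-ins for the patch's v_cache_src linkage, not the real kernel structures.]

#include <stddef.h>
#include <string.h>

#define NC_NAMEMAX 31                   /* stand-in for the kernel's limit */

struct vnode;                           /* forward declaration */

struct namecache {
        struct namecache *nc_next;      /* next entry under the same directory */
        struct vnode *nc_vp;            /* vnode the cached name resolves to */
        int nc_nlen;                    /* length of the cached component */
        char nc_name[NC_NAMEMAX];       /* the component, not NUL-terminated */
};

struct vnode {
        struct namecache *v_cache_src;  /* names cached under this directory */
};

/* Return the vnode cached for (dvp, name), or NULL on a miss. */
struct vnode *
cache_lookup_list(struct vnode *dvp, const char *name, int namelen)
{
        struct namecache *ncp;

        for (ncp = dvp->v_cache_src; ncp != NULL; ncp = ncp->nc_next) {
                /* Compare the length first; it is cheap and usually differs. */
                if (ncp->nc_nlen == namelen &&
                    memcmp(ncp->nc_name, name, (size_t)namelen) == 0)
                        return (ncp->nc_vp);
        }
        return (NULL);
}

[A hit costs a walk of only that directory's own entries; the tradeoff debated in the following messages is what happens when that per-directory list gets long.]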
From owner-freebsd-fs Thu Apr 24 06:42:53 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id GAA06614 for fs-outgoing; Thu, 24 Apr 1997 06:42:53 -0700 (PDT)
Received: from critter.dk.tfs.com (phk.cybercity.dk [195.8.129.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id GAA06605; Thu, 24 Apr 1997 06:42:11 -0700 (PDT)
Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id PAA00334; Thu, 24 Apr 1997 15:41:12 +0200 (CEST)
To: dg@root.com
cc: Poul-Henning Kamp , Michael Hancock , fs@freebsd.org
From: Poul-Henning Kamp
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 05:08:55 PDT." <199704241208.FAA09111@root.com>
Date: Thu, 24 Apr 1997 15:41:12 +0200
Message-ID: <332.861889272@critter>
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

In message <199704241208.FAA09111@root.com>, David Greenman writes:
>>>With hashing you can work on the hashing algorithm. Btw, what is the hash
>>>key now? vp+name[20]?
>>
>>directory vnode v_id + the regular namei hash.
>
> Calculating the hash is expensive since it involves doing a byte sum of
>all of the characters in the name, adding in the directory v_id, and dividing
>by a prime number. There's got to be a better way. I haven't carefully read
>what Poul-Henning is proposing, but the gist I got was that he intends to
>hang the file name cache entries off of the directory vnode? It seems to
>me that this works well for small directories and poorly for large
>directories unless you construct a binary tree and keep the entries sorted.

I don't think that the tradeoff to a binary tree will make sense, but it may be worth a try. It's true that my scheme may penalize large directories, but I'm pretty convinced that "large" in this context may be quite a lot larger than one would think. If "large" is anything above 100 I think that we can pretty safely ignore it.

It may also pay off to sort the lists by componentname, which will cut the number of comparisons in half. Making it a circleq and searching from whichever end is nearest can cut this further.

> It's interesting that the single largest CPU consumer on wcarchive appears
>to be the namei cache lookups. I think the hash algorithm needs to be
>re-visited at the very least, perhaps changing the divide by a prime into
>some sort of xor of the filename characters (xored in pairs as int16's).

It may actually be worth optimizing ".." out with special case code. If we added a "parent dir vnode" pointer to the vnodes, ".." can be taken out, and the namecache entries they occupy can be used for something more valuable.

Another thing, try setting the size of the namecache to 1.5 * desiredvnodes instead of the 1:1 factor we use now.

Ohh, and if you run the code I sent in the other email, please

        sysctl -bn debug.vfscachedump > somefile

and send me that file for analysis.

Poul-Henning

--
Poul-Henning Kamp             | phk@FreeBSD.ORG           FreeBSD Core-team.
http://www.freebsd.org/~phk   | phk@login.dknet.dk        Private mailbox.
whois: [PHK]                  | phk@tfs.com               TRW Financial Systems, Inc.
Power and ignorance is a disgusting cocktail.
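[A sketch of the sorted-list idea in the message above, reusing the stand-in struct namecache and struct vnode from the earlier sketch. If each per-directory list is kept ordered, a miss can be declared as soon as the scan passes the slot where the name would have to be; on average only half the list is examined, which is where the claimed factor-of-two saving on comparisons comes from. The comparator and function names are hypothetical.]

/* Order entries by (length, bytes); any total order would do. */
static int
nc_cmp(const struct namecache *ncp, const char *name, int namelen)
{
        if (ncp->nc_nlen != namelen)
                return (ncp->nc_nlen - namelen);
        return (memcmp(ncp->nc_name, name, (size_t)namelen));
}

struct vnode *
cache_lookup_sorted(struct vnode *dvp, const char *name, int namelen)
{
        struct namecache *ncp;
        int cmp;

        for (ncp = dvp->v_cache_src; ncp != NULL; ncp = ncp->nc_next) {
                cmp = nc_cmp(ncp, name, namelen);
                if (cmp == 0)
                        return (ncp->nc_vp);    /* hit */
                if (cmp > 0)
                        break;  /* passed the insertion point: a miss */
        }
        return (NULL);
}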
From owner-freebsd-fs Thu Apr 24 06:43:55 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id GAA06678 for fs-outgoing; Thu, 24 Apr 1997 06:43:55 -0700 (PDT)
Received: from critter.dk.tfs.com (phk.cybercity.dk [195.8.129.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id GAA06664; Thu, 24 Apr 1997 06:43:21 -0700 (PDT)
Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id PAA00359; Thu, 24 Apr 1997 15:42:24 +0200 (CEST)
To: Michael Hancock
cc: David Greenman , Poul-Henning Kamp , fs@freebsd.org
From: Poul-Henning Kamp
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 21:30:02 +0900."
Date: Thu, 24 Apr 1997 15:42:23 +0200
Message-ID: <357.861889343@critter>
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

In message , Michael Hancock writes:
>
>> It's interesting that the single largest CPU consumer on wcarchive appears
>> to be the namei cache lookups. I think the hash algorithm needs to be
>> re-visited at the very least, perhaps changing the divide by a prime into
>> some sort of xor of the filename characters (xored in pairs as int16's).
>
>I don't think you need to even look at all the characters. You can take
>the first and last and a couple in the middle. To get the middle ones,
>instead of dividing the length by 2 do a bit shift on the length. Then
>xor the 2 pairs. Any hash gurus on the list?

I'd rather get rid of it entirely...

--
Poul-Henning Kamp             | phk@FreeBSD.ORG           FreeBSD Core-team.
http://www.freebsd.org/~phk   | phk@login.dknet.dk        Private mailbox.
whois: [PHK]                  | phk@tfs.com               TRW Financial Systems, Inc.
Power and ignorance is a disgusting cocktail.

From owner-freebsd-fs Thu Apr 24 07:04:26 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id HAA07677 for fs-outgoing; Thu, 24 Apr 1997 07:04:26 -0700 (PDT)
Received: from Sisyphos.MI.Uni-Koeln.DE (Sisyphos.MI.Uni-Koeln.DE [134.95.212.10]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id HAA07667 for ; Thu, 24 Apr 1997 07:04:20 -0700 (PDT)
Received: from x14.mi.uni-koeln.de (annexr3-1.slip.Uni-Koeln.DE) by Sisyphos.MI.Uni-Koeln.DE with SMTP id AA01677 (5.67b/IDA-1.5 for ); Thu, 24 Apr 1997 16:03:47 +0200
Received: (from se@localhost) by x14.mi.uni-koeln.de (8.8.5/8.6.9) id QAA23373; Thu, 24 Apr 1997 16:03:39 +0200 (CEST)
Message-Id: <19970424160338.57282@x14.mi.uni-koeln.de>
Date: Thu, 24 Apr 1997 16:03:38 +0200
From: Stefan Esser
To: Don Lewis
Cc: dg@root.com, Poul-Henning Kamp , Michael Hancock , fs@freebsd.org
Subject: Re: the namei cache...
References: <199704241256.FAA11674@salsa.gv.tsc.tdk.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.68
In-Reply-To: <199704241256.FAA11674@salsa.gv.tsc.tdk.com>; from Don Lewis on Thu, Apr 24, 1997 at 05:56:01AM -0700
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Apr 24, Don Lewis wrote:
> Yes, most processors are really bad at integer divide. You might look at
> the Perl hash. You multiply the current value by 33 and add the next
> character, then mask off however many bits you want at the end (the table
> size must be a power of 2). If your compiler is at all smart, it will
> notice that the multiply is just a shift and an add. I suspect this might
> be pretty easy to just drop in to the existing code for a quick trial.

I used the following as a hash function, after looking up algorithms in Knuth's book and studying lots of examples in several software packages.
I needed a fast hash, which should work on power of 2 size tables, with keys mostly being ASCII text. I did quite some profiling and testing with my data sets, and the following gave good results:

#define SHVAL (8)
#define MULTIPLIER (0x5425daf1)

unsigned long
hashf (unsigned long hashmax, void *key, size_t keylen)
{
        unsigned long i = 0;
        unsigned char *p = key;

        while (keylen--) {
                unsigned char c = *p++;
                i -= (i << 4) ^ c;
                /* i -= (i << 4) + (i >> 23) ^ c; */
        }
        i *= MULTIPLIER;
        i >>= SHVAL;
        return i & hashmax;
}

This led to significantly fewer collisions than other, simpler hash functions. I tried other versions that fed back some of the high-order bits, but surprisingly found that this did not help reduce the number of collisions, and thus was not worth the additional shift and add. But this is possibly not true in the case of file names that tend to have a common suffix (.tar.gz, for example).

(The value of hashmax must obviously be a power of 2 minus 1 ...)

I used tables from a few thousand up to a few hundred thousand entries, BTW.

Regards, STefan

From owner-freebsd-fs Thu Apr 24 07:58:31 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id HAA10041 for fs-outgoing; Thu, 24 Apr 1997 07:58:31 -0700 (PDT)
Received: from nic.follonett.no (nic.follonett.no [194.198.43.10]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id HAA10035 for ; Thu, 24 Apr 1997 07:58:19 -0700 (PDT)
Received: (from uucp@localhost) by nic.follonett.no (8.8.5/8.8.3) with UUCP id QAA00870; Thu, 24 Apr 1997 16:55:19 +0200 (MET DST)
Received: from oo7 ([192.0.0.136]) by dimaga.com (8.7.5/8.7.2) with SMTP id QAA01966; Thu, 24 Apr 1997 16:07:41 +0200 (MET DST)
Message-Id: <3.0.32.19970424150956.01100be0@dimaga.com>
X-Sender: eivind@dimaga.com
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Thu, 24 Apr 1997 15:09:59 +0200
To: Michael Hancock
From: Eivind Eklund
Subject: Re: the namei cache...
Cc: fs@freebsd.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

At 09:30 PM 4/24/97 +0900, Michael Hancock wrote:
>On Thu, 24 Apr 1997, David Greenman wrote:
>
>> It's interesting that the single largest CPU consumer on wcarchive appears
>> to be the namei cache lookups. I think the hash algorithm needs to be
>> re-visited at the very least, perhaps changing the divide by a prime into
>> some sort of xor of the filename characters (xored in pairs as int16's).
>
>I don't think you need to even look at all the characters. You can take
>the first and last and a couple in the middle. To get the middle ones,
>instead of dividing the length by 2 do a bit shift on the length. Then
>xor the 2 pairs. Any hash gurus on the list?

Experimenting with this is easy if we pull the hash code out into a separate file - we can just run it against a find on a 'normal machine'. I don't know how much just adding the last and first characters will buy us, as finding the end of the string still includes parsing the entire string.

However, replacing the hash algorithm is certainly worthwhile - the present one is both slow and insensitive to interesting changes (e.g. swapping of characters).

A speedup could be to use an algorithm that works on words instead of bytes; however, this means that a read-overrun of up to three bytes _must be tolerable_. This can only work as long as we control the implementation.
Testing for zero-termination is fairly easy - the following expression will do it, assuming a two's-complement machine, a 32-bit word, and 8-bit chars.

        ((w-0x01010101)&~w&0x80808080)

Unfortunately, a test for both / and NUL is over twice as expensive, involving xors to change / to NUL and double tests :-(

I haven't got time to test right now (will see if I can do it later tonight), but that should give anybody interested in trying enough information to have a go. The original code is in sys/kern/vfs_lookup.c (for anybody that hasn't looked yet)

Eivind Eklund perhaps@yes.no http://maybe.yes.no/perhaps/ eivind@freebsd.org

From owner-freebsd-fs Thu Apr 24 10:39:10 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id KAA19851 for fs-outgoing; Thu, 24 Apr 1997 10:39:10 -0700 (PDT)
Received: from haldjas.folklore.ee (Haldjas.folklore.ee [193.40.6.121]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id KAA19644 for ; Thu, 24 Apr 1997 10:34:50 -0700 (PDT)
Received: from localhost (narvi@localhost) by haldjas.folklore.ee (8.8.4/8.8.4) with SMTP id UAA18267; Thu, 24 Apr 1997 20:24:13 +0300 (EEST)
Date: Thu, 24 Apr 1997 20:24:11 +0300 (EEST)
From: Narvi
To: David Greenman
cc: Poul-Henning Kamp , Michael Hancock , fs@freebsd.org
Subject: Re: the namei cache...
In-Reply-To: <199704241208.FAA09111@root.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Thu, 24 Apr 1997, David Greenman wrote:

> >>With hashing you can work on the hashing algorithm. Btw, what is the hash
> >>key now? vp+name[20]?
> >
> >directory vnode v_id + the regular namei hash.
>
> Calculating the hash is expensive since it involves doing a byte sum of
> all of the characters in the name, adding in the directory v_id, and dividing
> by a prime number. There's got to be a better way. I haven't carefully read

How about not adding byte values but say, long values? We may have to keep up to 3 bytes of additional space (avoiding overruns) but get it done in about 1/4th of the additions. Getting rid of the div also wouldn't be bad but might not be worth it.

Sander

> what Poul-Henning is proposing, but the gist I got was that he intends to
> hang the file name cache entries off of the directory vnode? It seems to
> me that this works well for small directories and poorly for large
> directories unless you construct a binary tree and keep the entries sorted.
> Then the problem becomes managing the tree (keeping it balanced, etc), which
> itself can be expensive...although lookups happen far more often than creates
> and deletes.
> It's interesting that the single largest CPU consumer on wcarchive appears
> to be the namei cache lookups. I think the hash algorithm needs to be
> re-visited at the very least, perhaps changing the divide by a prime into
> some sort of xor of the filename characters (xored in pairs as int16's).
>
> -DG
>
> David Greenman
> Core-team/Principal Architect, The FreeBSD Project
>

From owner-freebsd-fs Thu Apr 24 12:33:43 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id MAA25858 for fs-outgoing; Thu, 24 Apr 1997 12:33:43 -0700 (PDT)
Received: from root.com (implode.root.com [198.145.90.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id MAA25853 for ; Thu, 24 Apr 1997 12:33:41 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1]) by root.com (8.8.5/8.6.5) with SMTP id MAA10488; Thu, 24 Apr 1997 12:34:49 -0700 (PDT)
Message-Id: <199704241934.MAA10488@root.com>
X-Authentication-Warning: implode.root.com: localhost [127.0.0.1] didn't use HELO protocol
To: Narvi
cc: Poul-Henning Kamp , fs@freebsd.org
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 20:24:11 +0300."
From: David Greenman
Reply-To: dg@root.com
Date: Thu, 24 Apr 1997 12:34:49 -0700
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>about 1/4th of the additions. Getting rid of the div also wouldn't be bad
>but might not be worth it.

Not worth it? At greater than 50 cycles, the divide is more expensive than the entire add loop.

-DG

David Greenman
Core-team/Principal Architect, The FreeBSD Project

From owner-freebsd-fs Thu Apr 24 12:42:53 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id MAA26373 for fs-outgoing; Thu, 24 Apr 1997 12:42:53 -0700 (PDT)
Received: from critter.dk.tfs.com (phk.cybercity.dk [195.8.129.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id MAA26350; Thu, 24 Apr 1997 12:42:18 -0700 (PDT)
Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id VAA01305; Thu, 24 Apr 1997 21:40:49 +0200 (CEST)
To: dg@root.com
cc: Narvi , Poul-Henning Kamp , fs@freebsd.org
From: Poul-Henning Kamp
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 12:34:49 PDT." <199704241934.MAA10488@root.com>
Date: Thu, 24 Apr 1997 21:40:49 +0200
Message-ID: <1303.861910849@critter>
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

In message <199704241934.MAA10488@root.com>, David Greenman writes:
>>about 1/4th of the additions. Getting rid of the div also wouldn't be bad
>>but might not be worth it.
>
> Not worth it? At greater than 50 cycles, the divide is more expensive
>than the entire add loop.

But losing the hash entirely is >even< faster, right ? :-)

--
Poul-Henning Kamp             | phk@FreeBSD.ORG           FreeBSD Core-team.
http://www.freebsd.org/~phk   | phk@login.dknet.dk        Private mailbox.
whois: [PHK]                  | phk@tfs.com               TRW Financial Systems, Inc.
Power and ignorance is a disgusting cocktail.

From owner-freebsd-fs Thu Apr 24 13:17:00 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id NAA28354 for fs-outgoing; Thu, 24 Apr 1997 13:17:00 -0700 (PDT)
Received: from root.com (implode.root.com [198.145.90.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id NAA28349 for ; Thu, 24 Apr 1997 13:16:57 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1]) by root.com (8.8.5/8.6.5) with SMTP id NAA10732; Thu, 24 Apr 1997 13:18:33 -0700 (PDT)
Message-Id: <199704242018.NAA10732@root.com>
X-Authentication-Warning: implode.root.com: localhost [127.0.0.1] didn't use HELO protocol
To: Poul-Henning Kamp
cc: fs@freebsd.org
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 21:40:49 +0200."
        <1303.861910849@critter>
From: David Greenman
Reply-To: dg@root.com
Date: Thu, 24 Apr 1997 13:18:33 -0700
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>In message <199704241934.MAA10488@root.com>, David Greenman writes:
>>>about 1/4th of the additions. Getting rid of the div also wouldn't be bad
>>>but might not be worth it.
>>
>> Not worth it? At greater than 50 cycles, the divide is more expensive
>>than the entire add loop.
>
>But losing the hash entirely is >even< faster, right ? :-)

Maybe. Here's the problem with your approach as I see it: On wcarchive, we have about 44,000 files in the namei cache. The hash table size is approximately 44,000 entries large (rounded up or down to the nearest prime...I forget). This means that there is an average of about 1 file in each hash bucket...or very little to search through. Of course there are far fewer than 1 directory per file, so now if you change this so that all of the files for a directory are on one list, then you're going to have to do a lot more work to find it. I think the average directory length on wcarchive is about 50 files or so, so this would be about 50 times more entries in the list compared to the old hash table scheme.

-DG

David Greenman
Core-team/Principal Architect, The FreeBSD Project

From owner-freebsd-fs Thu Apr 24 13:40:10 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id NAA29788 for fs-outgoing; Thu, 24 Apr 1997 13:40:10 -0700 (PDT)
Received: from critter.dk.tfs.com (phk.cybercity.dk [195.8.129.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id NAA29717; Thu, 24 Apr 1997 13:39:53 -0700 (PDT)
Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id WAA01422; Thu, 24 Apr 1997 22:38:55 +0200 (CEST)
To: dg@root.com
cc: Poul-Henning Kamp , fs@freebsd.org
From: Poul-Henning Kamp
Subject: Re: the namei cache...
In-reply-to: Your message of "Thu, 24 Apr 1997 13:18:33 PDT." <199704242018.NAA10732@root.com>
Date: Thu, 24 Apr 1997 22:38:54 +0200
Message-ID: <1420.861914334@critter>
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

In message <199704242018.NAA10732@root.com>, David Greenman writes:
>>In message <199704241934.MAA10488@root.com>, David Greenman writes:
>>>>about 1/4th of the additions. Getting rid of the div also wouldn't be bad
>>>>but might not be worth it.
>>>
>>> Not worth it? At greater than 50 cycles, the divide is more expensive
>>>than the entire add loop.
>>
>>But losing the hash entirely is >even< faster, right ? :-)
>
> Maybe. Here's the problem with your approach as I see it: On wcarchive,
>we have about 44,000 files in the namei cache. The hash table size is
>approximately 44,000 entries large (rounded up or down to the nearest

Unlikely. You have a 44000 entry cache, but as far as I've been able to make out you only have about 40000 >valid< entries in it (if it's anything like the data I have here). I'm trying to get the last 10% back.

>prime...I forget). This means that there is an average of about 1 file in
>each hash bucket...or very little to search through.

Unfortunately not so, finding a hash that good would be really, really, hard. But I agree that we're in the O(3) land.

>each hash bucket...or very little to search through. Of course there are
>far fewer than 1 directory per file, so now if you change this so that all
>of the files for a directory are on one list, then you're going to have to do
>a lot more work to find it.
I think the average directory length on wcarchive >is about 50 files or so, so this would be about 50 times more entries in the >list compared to the old hash table scheme. This is what I thought, but appearantly not so. Part of the problem seems to be that there are multiple names pointing at the same directory ("foo" + N * "..") and that depletes your name-cache. With the current design it should probably be 1.5 .. 2 times bigger than desiredvnodes. I'm very reluctant to increase it, when entries cost 64 bytes each, and since data seems to indicate that 10% is stale (but we don't know how to find them), so we keep recycling valid entries instead. Another thing that bothers me is the size. The reason for the current size of 64 is the way malloc works. In reality we would get very close to the same efficiency from 48 bytes per entry. I may look into changing the way we allocate them. It would buy us 33% more entries in the same space. David: Can I get you to try one (mostly harmless) experiment on wcarchive ? In vfs_cache.c, where it checks the number of namecache entries against "desiredvnodes" could you try to use 2*desiredvnodes (or something similar) instead, and measure the difference in cache hit rate ? Poul-Henning PS: I guess that I'm doing "the bad thing" here, I'm changing two variables at the same time: 1. changing linkage of namecache entries to permit complete removal (This increases the efficiency of the cache) 2. dropping hash algorithm. (This may make the cache faster/slower depending on POM) But don't worry, until I have understood the impact, I'm not committing anything :-) -- Poul-Henning Kamp | phk@FreeBSD.ORG FreeBSD Core-team. http://www.freebsd.org/~phk | phk@login.dknet.dk Private mailbox. whois: [PHK] | phk@tfs.com TRW Financial Systems, Inc. Power and ignorance is a disgusting cocktail. From owner-freebsd-fs Thu Apr 24 13:57:55 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id NAA00821 for fs-outgoing; Thu, 24 Apr 1997 13:57:55 -0700 (PDT) Received: from root.com (implode.root.com [198.145.90.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id NAA00814 for ; Thu, 24 Apr 1997 13:57:53 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by root.com (8.8.5/8.6.5) with SMTP id NAA11021; Thu, 24 Apr 1997 13:59:30 -0700 (PDT) Message-Id: <199704242059.NAA11021@root.com> X-Authentication-Warning: implode.root.com: localhost [127.0.0.1] didn't use HELO protocol To: Poul-Henning Kamp cc: fs@freebsd.org Subject: Re: the namei cache... In-reply-to: Your message of "Thu, 24 Apr 1997 22:38:54 +0200." <1420.861914334@critter> From: David Greenman Reply-To: dg@root.com Date: Thu, 24 Apr 1997 13:59:29 -0700 Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk >This is what I thought, but appearantly not so. Part of the problem seems >to be that there are multiple names pointing at the same directory ("foo" > + N * "..") and that depletes your name-cache. With the current design >it should probably be 1.5 .. 2 times bigger than desiredvnodes. > >I'm very reluctant to increase it, when entries cost 64 bytes each, and >since data seems to indicate that 10% is stale (but we don't know how to >find them), so we keep recycling valid entries instead. > >Another thing that bothers me is the size. The reason for the current >size of 64 is the way malloc works. In reality we would get very close >to the same efficiency from 48 bytes per entry. I may look into changing >the way we allocate them. 
It would buy us 33% more entries in the same >space. > >David: > Can I get you to try one (mostly harmless) experiment on > wcarchive ? In vfs_cache.c, where it checks the number of > namecache entries against "desiredvnodes" could you try to use > 2*desiredvnodes (or something similar) instead, and measure > the difference in cache hit rate ? I've increased kern.maxvnodes, which I think should have the same effect. This actually made things worse - a few percent lower cache hit rate. Very odd. It might have just been a statistical anomaly. In any case, it definitely didn't improve the hit rate. -DG David Greenman Core-team/Principal Architect, The FreeBSD Project From owner-freebsd-fs Thu Apr 24 16:35:29 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id QAA09496 for fs-outgoing; Thu, 24 Apr 1997 16:35:29 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id QAA09489 for ; Thu, 24 Apr 1997 16:35:26 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.8.5/CET-v2.1) with SMTP id XAA15582; Thu, 24 Apr 1997 23:34:56 GMT Date: Fri, 25 Apr 1997 08:34:56 +0900 (JST) From: Michael Hancock To: Poul-Henning Kamp cc: dg@root.com, Narvi , fs@freebsd.org Subject: Re: the namei cache... In-Reply-To: <1303.861910849@critter> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Thu, 24 Apr 1997, Poul-Henning Kamp wrote: > > Not worth it? At greater than 50 cycles, the divide is more expensive > >than the entire add loop. > > But loosing the hash entirely is >even< faster, right ? :-) Faster than traversing a long linked list? From owner-freebsd-fs Thu Apr 24 16:57:56 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id QAA10460 for fs-outgoing; Thu, 24 Apr 1997 16:57:56 -0700 (PDT) Received: from gatekeeper.tsc.tdk.com (root@gatekeeper.tsc.tdk.com [207.113.159.21]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id QAA10454 for ; Thu, 24 Apr 1997 16:57:53 -0700 (PDT) Received: from sunrise.gv.tsc.tdk.com (root@sunrise.gv.tsc.tdk.com [192.168.241.191]) by gatekeeper.tsc.tdk.com (8.8.4/8.8.4) with ESMTP id QAA29384; Thu, 24 Apr 1997 16:56:21 -0700 (PDT) Received: from salsa.gv.tsc.tdk.com (salsa.gv.tsc.tdk.com [192.168.241.194]) by sunrise.gv.tsc.tdk.com (8.8.5/8.8.5) with ESMTP id QAA24329; Thu, 24 Apr 1997 16:56:20 -0700 (PDT) Received: (from gdonl@localhost) by salsa.gv.tsc.tdk.com (8.8.5/8.8.5) id QAA12777; Thu, 24 Apr 1997 16:56:19 -0700 (PDT) From: Don Lewis Message-Id: <199704242356.QAA12777@salsa.gv.tsc.tdk.com> Date: Thu, 24 Apr 1997 16:56:18 -0700 In-Reply-To: Narvi "Re: the namei cache..." (Apr 24, 8:24pm) X-Mailer: Mail User's Shell (7.2.6 alpha(3) 7/19/95) To: Narvi , David Greenman Subject: Re: the namei cache... Cc: Poul-Henning Kamp , Michael Hancock , fs@freebsd.org Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Apr 24, 8:24pm, Narvi wrote: } Subject: Re: the namei cache... } } How about not adding byte values but say, long values? We may have to keep } upto 3 bytes of additional space (avoiding overuns) but get it done in } about 1/4th of the additions. The additions are pretty cheap. The problem with adding bigger chunks is that the names aren't word aligned, so for each set of 4 bytes, you have to read two words, extract the desired bytes, and concatenate the results.
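A minimal sketch of the word-at-a-time add being kicked around (hypothetical code, not from any tree; the name sum32 is invented):

	#include <string.h>
	#include <sys/types.h>

	/*
	 * Sum a component name four bytes at a time.  memcpy() is the
	 * portable way to do the unaligned fetch Don describes; on a
	 * strict-alignment machine the compiler expands it into the
	 * two-loads-plus-shifts sequence he mentions.  Note that word
	 * sums are not equal to the byte-wise sum, so this is a
	 * different hash function -- which is the point of the
	 * proposal, not a bug.
	 */
	u_long
	sum32(const char *name, size_t len)
	{
		u_long sum = 0;
		u_int32_t w;

		while (len >= 4) {
			memcpy(&w, name, 4);	/* unaligned-safe fetch */
			sum += w;
			name += 4;
			len -= 4;
		}
		while (len-- > 0)		/* 0-3 trailing bytes */
			sum += (unsigned char)*name++;
		return (sum);
	}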
--- Truck From owner-freebsd-fs Thu Apr 24 17:01:42 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id RAA10796 for fs-outgoing; Thu, 24 Apr 1997 17:01:42 -0700 (PDT) Received: from root.com (implode.root.com [198.145.90.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id RAA10791 for ; Thu, 24 Apr 1997 17:01:40 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by root.com (8.8.5/8.6.5) with SMTP id RAA11911; Thu, 24 Apr 1997 17:03:00 -0700 (PDT) Message-Id: <199704250003.RAA11911@root.com> X-Authentication-Warning: implode.root.com: localhost [127.0.0.1] didn't use HELO protocol To: Don Lewis cc: fs@freebsd.org Subject: Re: the namei cache... In-reply-to: Your message of "Thu, 24 Apr 1997 16:56:18 PDT." <199704242356.QAA12777@salsa.gv.tsc.tdk.com> From: David Greenman Reply-To: dg@root.com Date: Thu, 24 Apr 1997 17:03:00 -0700 Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk >On Apr 24, 8:24pm, Narvi wrote: >} Subject: Re: the namei cache... >} >} How about not adding byte values but say, long values? We may have to keep >} upto 3 bytes of additional space (avoiding overuns) but get it done in >} about 1/4th of the additions. > >The additions are pretty cheap. The problem with adding bigger chunks is >that the names aren't word aligned, so for each set of 4 bytes, you have >to read two words, extract the desired bytes, and concatenate the results. Actually, you could read/add in 16-bit words since the component name is null terminated...or if the trailing space is zeroed up to a longword boundary, you could add in longwords. -DG David Greenman Core-team/Principal Architect, The FreeBSD Project From owner-freebsd-fs Thu Apr 24 17:12:24 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id RAA11277 for fs-outgoing; Thu, 24 Apr 1997 17:12:24 -0700 (PDT) Received: from gatekeeper.tsc.tdk.com (root@gatekeeper.tsc.tdk.com [207.113.159.21]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id RAA11271 for ; Thu, 24 Apr 1997 17:12:21 -0700 (PDT) Received: from sunrise.gv.tsc.tdk.com (root@sunrise.gv.tsc.tdk.com [192.168.241.191]) by gatekeeper.tsc.tdk.com (8.8.4/8.8.4) with ESMTP id RAA00156; Thu, 24 Apr 1997 17:12:00 -0700 (PDT) Received: from salsa.gv.tsc.tdk.com (salsa.gv.tsc.tdk.com [192.168.241.194]) by sunrise.gv.tsc.tdk.com (8.8.5/8.8.5) with ESMTP id RAA24488; Thu, 24 Apr 1997 17:11:58 -0700 (PDT) Received: (from gdonl@localhost) by salsa.gv.tsc.tdk.com (8.8.5/8.8.5) id RAA12811; Thu, 24 Apr 1997 17:11:57 -0700 (PDT) From: Don Lewis Message-Id: <199704250011.RAA12811@salsa.gv.tsc.tdk.com> Date: Thu, 24 Apr 1997 17:11:57 -0700 In-Reply-To: Eivind Eklund "Re: the namei cache..." (Apr 24, 3:09pm) X-Mailer: Mail User's Shell (7.2.6 alpha(3) 7/19/95) To: Eivind Eklund , Michael Hancock Subject: Re: the namei cache... Cc: fs@freebsd.org Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Apr 24, 3:09pm, Eivind Eklund wrote: } Subject: Re: the namei cache... } } However, replacing the hash algorithm is certainly worthwhile - the present } one is both slow and insensitive to interesting changes (e.g. swapping of } characters). I don't necessarily agree with slow (other than the divide at the end). It *might* be worthwhile escaping from the loop early on names that are too long to cache, but since these are rare, the extra tests might be too much of a penalty in the common case. I think the current hash function does a poor job of distributing the keys, especially if you have a large cache.
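A quick back-of-the-envelope check of how narrow the additive sums get (a toy user-land program, nothing more; the kernel additionally mixes in v_id and takes the table modulus afterwards):

	#include <stdio.h>

	/*
	 * The additive hash of a name is bounded by the sum of its
	 * character codes, so all-digit names crowd into a tiny range:
	 * seven digits sum to between 7*'0' and 7*'9', i.e. at most 64
	 * distinct values, all below bucket 400.
	 */
	int
	main(void)
	{
		printf("7-digit names: sums %d..%d, %d distinct values\n",
		    7 * '0', 7 * '9', 7 * '9' - 7 * '0' + 1);
		printf("14-char names (codes < 128): sums at most %d\n",
		    14 * 127);
		return (0);
	}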
A directory containing files with names up to 14 characters of all the characters < 128 can only distribute its entries over at most 1778 hash buckets. The misc.jobs.offered directory on our news server, which contains 28325 files with 7 digit names, will only distribute its entries over 399 buckets. It looks like we're very dependent on the v_id value to give a good global distribution. --- Truck From owner-freebsd-fs Thu Apr 24 17:25:35 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id RAA11712 for fs-outgoing; Thu, 24 Apr 1997 17:25:35 -0700 (PDT) Received: from gatekeeper.tsc.tdk.com (root@gatekeeper.tsc.tdk.com [207.113.159.21]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id RAA11706 for ; Thu, 24 Apr 1997 17:25:33 -0700 (PDT) Received: from sunrise.gv.tsc.tdk.com (root@sunrise.gv.tsc.tdk.com [192.168.241.191]) by gatekeeper.tsc.tdk.com (8.8.4/8.8.4) with ESMTP id RAA00575; Thu, 24 Apr 1997 17:25:30 -0700 (PDT) Received: from salsa.gv.tsc.tdk.com (salsa.gv.tsc.tdk.com [192.168.241.194]) by sunrise.gv.tsc.tdk.com (8.8.5/8.8.5) with ESMTP id RAA24790; Thu, 24 Apr 1997 17:25:29 -0700 (PDT) Received: (from gdonl@localhost) by salsa.gv.tsc.tdk.com (8.8.5/8.8.5) id RAA12844; Thu, 24 Apr 1997 17:25:28 -0700 (PDT) From: Don Lewis Message-Id: <199704250025.RAA12844@salsa.gv.tsc.tdk.com> Date: Thu, 24 Apr 1997 17:25:27 -0700 In-Reply-To: David Greenman "Re: the namei cache..." (Apr 24, 5:03pm) X-Mailer: Mail User's Shell (7.2.6 alpha(3) 7/19/95) To: dg@root.com, Don Lewis Subject: Re: the namei cache... Cc: fs@freebsd.org Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Apr 24, 5:03pm, David Greenman wrote: } Subject: Re: the namei cache... } Actually, you could read/add in 16-bit words since the component name is } null terminated...or if the trailing space is zeroed up to a longword } boundary, you could add in longwords. I was more worried about where the component name starts. It appears that we're extracting it right out of the pathname, so depending on what preceded this component name, it could be on an odd byte boundary. Also, the component name can be terminated with either a NUL or a /. --- Truck From owner-freebsd-fs Thu Apr 24 17:42:51 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id RAA12857 for fs-outgoing; Thu, 24 Apr 1997 17:42:51 -0700 (PDT) Received: from root.com (implode.root.com [198.145.90.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id RAA12852 for ; Thu, 24 Apr 1997 17:42:48 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by root.com (8.8.5/8.6.5) with SMTP id RAA12038; Thu, 24 Apr 1997 17:44:23 -0700 (PDT) Message-Id: <199704250044.RAA12038@root.com> X-Authentication-Warning: implode.root.com: localhost [127.0.0.1] didn't use HELO protocol To: Don Lewis cc: fs@freebsd.org Subject: Re: the namei cache... In-reply-to: Your message of "Thu, 24 Apr 1997 17:25:27 PDT." <199704250025.RAA12844@salsa.gv.tsc.tdk.com> From: David Greenman Reply-To: dg@root.com Date: Thu, 24 Apr 1997 17:44:23 -0700 Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk >Also, the component name can be terminated with either a NUL or a /. Really? That's a surprise if true.
-DG David Greenman Core-team/Principal Architect, The FreeBSD Project From owner-freebsd-fs Thu Apr 24 17:48:22 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id RAA13239 for fs-outgoing; Thu, 24 Apr 1997 17:48:22 -0700 (PDT) Received: from gatekeeper.tsc.tdk.com (root@gatekeeper.tsc.tdk.com [207.113.159.21]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id RAA13226 for ; Thu, 24 Apr 1997 17:48:18 -0700 (PDT) Received: from sunrise.gv.tsc.tdk.com (root@sunrise.gv.tsc.tdk.com [192.168.241.191]) by gatekeeper.tsc.tdk.com (8.8.4/8.8.4) with ESMTP id RAA01421; Thu, 24 Apr 1997 17:48:15 -0700 (PDT) Received: from salsa.gv.tsc.tdk.com (salsa.gv.tsc.tdk.com [192.168.241.194]) by sunrise.gv.tsc.tdk.com (8.8.5/8.8.5) with ESMTP id RAA25196; Thu, 24 Apr 1997 17:48:14 -0700 (PDT) Received: (from gdonl@localhost) by salsa.gv.tsc.tdk.com (8.8.5/8.8.5) id RAA12907; Thu, 24 Apr 1997 17:48:12 -0700 (PDT) From: Don Lewis Message-Id: <199704250048.RAA12907@salsa.gv.tsc.tdk.com> Date: Thu, 24 Apr 1997 17:48:12 -0700 In-Reply-To: David Greenman "Re: the namei cache..." (Apr 24, 5:44pm) X-Mailer: Mail User's Shell (7.2.6 alpha(3) 7/19/95) To: dg@root.com, Don Lewis Subject: Re: the namei cache... Cc: fs@freebsd.org Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Apr 24, 5:44pm, David Greenman wrote: } Subject: Re: the namei cache... } >Also, the component name can be terminated with either a NUL or a /. } } Really? That's a surprise if true. This is right from the source: cnp->cn_hash = 0; for (cp = cnp->cn_nameptr; *cp != 0 && *cp != '/'; cp++) cnp->cn_hash += (unsigned char)*cp; cnp->cn_namelen = cp - cnp->cn_nameptr; It sure looks to me like it stops on '/' ;-) Also, even if you totally ditch the hash, you still need the loop to find the end of the component. --- Truck From owner-freebsd-fs Thu Apr 24 17:51:13 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id RAA13567 for fs-outgoing; Thu, 24 Apr 1997 17:51:13 -0700 (PDT) Received: from root.com (implode.root.com [198.145.90.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id RAA13555 for ; Thu, 24 Apr 1997 17:51:10 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by root.com (8.8.5/8.6.5) with SMTP id RAA12150; Thu, 24 Apr 1997 17:52:41 -0700 (PDT) Message-Id: <199704250052.RAA12150@root.com> X-Authentication-Warning: implode.root.com: localhost [127.0.0.1] didn't use HELO protocol To: Don Lewis cc: fs@freebsd.org Subject: Re: the namei cache... In-reply-to: Your message of "Thu, 24 Apr 1997 17:48:12 PDT." <199704250048.RAA12907@salsa.gv.tsc.tdk.com> From: David Greenman Reply-To: dg@root.com Date: Thu, 24 Apr 1997 17:52:41 -0700 Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk >On Apr 24, 5:44pm, David Greenman wrote: >} Subject: Re: the namei cache... >} >Also, the component name can be terminated with either a NUL or a /. >} >} Really? That's a surprise if true. > >This is right from the source: > > cnp->cn_hash = 0; > for (cp = cnp->cn_nameptr; *cp != 0 && *cp != '/'; cp++) > cnp->cn_hash += (unsigned char)*cp; > cnp->cn_namelen = cp - cnp->cn_nameptr; > >It sure looks to me like it stops on '/' ;-) > >Also, even if you totally ditch the hash, you still need the loop >to find the end of the component. Good point, although I guess you could always waste a "null" longword at the end and trigger on that. Hmmm. 
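A hypothetical sketch of that trigger (assuming 32-bit longwords and a pathname buffer padded with a zero longword past the end -- the "waste" being proposed; ONES, HIGHS and complen are invented names):

	#include <stddef.h>
	#include <sys/types.h>

	#define ONES	0x01010101UL
	#define HIGHS	0x80808080UL
	/* nonzero iff some byte of w is zero -- the usual bit trick */
	#define HASZERO(w)	(((w) - ONES) & ~(w) & HIGHS)
	#define SLASHES	0x2f2f2f2fUL		/* '/' in every byte */

	/* Length of the component at p, up to a NUL or '/'. */
	size_t
	complen(const char *p)
	{
		const u_int32_t *wp;
		const char *cp = p;

		/* byte-wise until we reach a longword boundary */
		for (; (unsigned long)cp & 3; cp++)
			if (*cp == '\0' || *cp == '/')
				return (cp - p);
		/* then a longword per test; safe only because the
		   buffer is padded with a zero longword at the end */
		for (wp = (const u_int32_t *)cp;
		    !HASZERO(*wp) && !HASZERO(*wp ^ SLASHES); wp++)
			continue;
		/* pin down the exact byte inside the stopping word */
		for (cp = (const char *)wp; *cp != '\0' && *cp != '/'; cp++)
			continue;
		return (cp - p);
	}

The second HASZERO() test is the extra mask that '/'-termination costs over a plain NUL scan, and the first loop is what eats the odd starting alignment Don worried about.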
-DG David Greenman Core-team/Principal Architect, The FreeBSD Project From owner-freebsd-fs Thu Apr 24 20:09:06 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id UAA19924 for fs-outgoing; Thu, 24 Apr 1997 20:09:06 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id UAA19916 for ; Thu, 24 Apr 1997 20:09:04 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.8.5/CET-v2.1) with SMTP id DAA16880; Fri, 25 Apr 1997 03:08:45 GMT Date: Fri, 25 Apr 1997 12:08:45 +0900 (JST) From: Michael Hancock To: David Greenman cc: Don Lewis , fs@freebsd.org Subject: Re: the namei cache... In-Reply-To: <199704250052.RAA12150@root.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Thu, 24 Apr 1997, David Greenman wrote: > >This is right from the source: > > > > cnp->cn_hash = 0; > > for (cp = cnp->cn_nameptr; *cp != 0 && *cp != '/'; cp++) > > cnp->cn_hash += (unsigned char)*cp; > > cnp->cn_namelen = cp - cnp->cn_nameptr; > > > >It sure looks to me like it stops on '/' ;-) > > > >Also, even if you totally ditch the hash, you still need the loop > >to find the end of the component. > > Good point, although I guess you could always waste a "null" longword at > the end and trigger on that. Hmmm. An improved hash looks like low hanging fruit that's worth trying. Regarding the name list of the vnode approach, it might be interesting to try phk's patches and see how it works out with something like WorldStone. The approach to the name cache / vnode cache coherency problem he's trying to solve makes sense logically. However, addressing the performance needs of news servers or large /var/mail directories might lead to a lot of complications. It's like you need some kind of dynamic hash table creation capability for each directory vnode. I don't think this is practical. Regards, Mike From owner-freebsd-fs Thu Apr 24 22:37:10 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id WAA25664 for fs-outgoing; Thu, 24 Apr 1997 22:37:10 -0700 (PDT) Received: from critter.dk.tfs.com (phk.cybercity.dk [195.8.129.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id WAA25658; Thu, 24 Apr 1997 22:37:02 -0700 (PDT) Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id HAA01934; Fri, 25 Apr 1997 07:36:03 +0200 (CEST) To: dg@root.com cc: Poul-Henning Kamp , fs@freebsd.org From: Poul-Henning Kamp Subject: Re: the namei cache... In-reply-to: Your message of "Thu, 24 Apr 1997 13:59:29 PDT." <199704242059.NAA11021@root.com> Date: Fri, 25 Apr 1997 07:36:02 +0200 Message-ID: <1932.861946562@critter> Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk In message <199704242059.NAA11021@root.com>, David Greenman writes: > I've increased kern.maxvnodes, which I think should have the same effect. >This actually made things worse - a few percent lower cache hit rate. Very >odd. It might have just been a statistical anomaly. In any case, it definitely >didn't improve the hit rate. No, it doesn't have the same effect. You want to change the ratio between namecache entries and vnodes, not the absolute number. -- Poul-Henning Kamp | phk@FreeBSD.ORG FreeBSD Core-team. http://www.freebsd.org/~phk | phk@login.dknet.dk Private mailbox. whois: [PHK] | phk@tfs.com TRW Financial Systems, Inc. Power and ignorance is a disgusting cocktail.
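A toy model of that ratio point (user-land only; 44000 is just the wcarchive figure quoted earlier in the thread):

	#include <stdio.h>

	/*
	 * kern.maxvnodes drives desiredvnodes, and the cache cap in
	 * vfs_cache.c is desiredvnodes itself -- so raising maxvnodes
	 * grows both pools together and the 1:1 ratio survives.  The
	 * experiment keeps the vnode count fixed and doubles the cap.
	 */
	int
	main(void)
	{
		int desiredvnodes = 44000;		/* illustrative */
		int cap_today = desiredvnodes;		/* ratio 1:1 */
		int cap_experiment = 2 * desiredvnodes;	/* ratio 2:1 */

		printf("vnodes %d: cap today %d, experiment %d\n",
		    desiredvnodes, cap_today, cap_experiment);
		return (0);
	}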
From owner-freebsd-fs Fri Apr 25 01:07:59 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id BAA01214 for fs-outgoing; Fri, 25 Apr 1997 01:07:59 -0700 (PDT) Received: from haldjas.folklore.ee (Haldjas.folklore.ee [193.40.6.121]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id BAA01181 for ; Fri, 25 Apr 1997 01:06:55 -0700 (PDT) Received: from localhost (narvi@localhost) by haldjas.folklore.ee (8.8.4/8.8.4) with SMTP id LAA27386; Fri, 25 Apr 1997 11:08:56 +0300 (EEST) Date: Fri, 25 Apr 1997 11:08:55 +0300 (EEST) From: Narvi Reply-To: Narvi To: David Greenman cc: Poul-Henning Kamp , fs@freebsd.org Subject: Re: the namei cache... In-Reply-To: <199704241934.MAA10488@root.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Thu, 24 Apr 1997, David Greenman wrote: > >about 1/4th of the additions. Getting reed of the div wouldn't also be bad > >but might not be worth it. > > Not worth it? At greater than 50 cycles, the divide is more expensive > than the entire add loop. Well, it seems that division has become more expensive over time. :-( The only processor-related book I had around was about the 286 and there it cost only 28 cycles so I figured we might be hard pressed to come up with something as good while not taking up more than 3/4 of the cycles (= nearly not worth it). Exactly how many bits do we need? If we use 32-bit additions instead of 8-bit, we could perhaps take some bits from each byte. For, say, a 16-bit value we could take the lower 4 bits of each byte: hashvalue=((sum & 0x0f000000)>>24) | ((sum & 0x000f0000)>>12) | (sum & 0x00000f00) | ((sum & 0x0000000f)<<12) Yes, it probably isn't a good way of deriving a hash from a string. On the positive side of things - when compiled it will work on registers, using only logic operations, which used to be among the fastest. Taking every other bit, ((sum & 0x55550000)>>15) | (sum & 0x00005555), is another way. But I am not sure how they will fare in the real world - probably quite badly. Sander PS. There really are places where such hashes work. > > -DG > > David Greenman > Core-team/Principal Architect, The FreeBSD Project > From owner-freebsd-fs Fri Apr 25 01:35:33 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id BAA02318 for fs-outgoing; Fri, 25 Apr 1997 01:35:33 -0700 (PDT) Received: from root.com (implode.root.com [198.145.90.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id BAA02311 for ; Fri, 25 Apr 1997 01:35:30 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by root.com (8.8.5/8.6.5) with SMTP id BAA13794; Fri, 25 Apr 1997 01:35:51 -0700 (PDT) Message-Id: <199704250835.BAA13794@root.com> X-Authentication-Warning: implode.root.com: localhost [127.0.0.1] didn't use HELO protocol To: Narvi cc: fs@freebsd.org Subject: Re: the namei cache... In-reply-to: Your message of "Fri, 25 Apr 1997 11:08:55 +0300." From: David Greenman Reply-To: dg@root.com Date: Fri, 25 Apr 1997 01:35:51 -0700 Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk >On Thu, 24 Apr 1997, David Greenman wrote: > >> >about 1/4th of the additions. Getting reed of the div wouldn't also be bad >> >but might not be worth it. >> >> Not worth it? At greater than 50 cycles, the divide is more expensive >> than the entire add loop. > >Well, it seems that division has become more expensive over time.
:-( >The only processor-related book I had around was about the 286 and there >it cost only 28 cycles so I figured we might be hard pressed to come up with something According to my Intel Pentium databook, a 32-bit divide takes 41 cycles on a Pentium; the 386 databook says 41 cycles for a 386. -DG David Greenman Core-team/Principal Architect, The FreeBSD Project From owner-freebsd-fs Fri Apr 25 09:37:55 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id JAA25828 for fs-outgoing; Fri, 25 Apr 1997 09:37:55 -0700 (PDT) Received: from critter.dk.tfs.com ([140.145.230.252]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id JAA25810 for ; Fri, 25 Apr 1997 09:37:48 -0700 (PDT) Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id SAA00232 for ; Fri, 25 Apr 1997 18:36:20 +0200 (CEST) To: fs@freebsd.org Subject: patch to be benchmarked. From: Poul-Henning Kamp Date: Fri, 25 Apr 1997 18:36:20 +0200 Message-ID: <230.861986180@critter> Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk This patch removes all the ".." entries from the namei cache, and instead stores the information in the vnode. This is a "freebie" since we allocate 128 bytes for the vnode and currently use less than that. Benchmark results will be most appreciated ! Poul-Henning Index: kern/vfs_cache.c =================================================================== RCS file: /home/ncvs/src/sys/kern/vfs_cache.c,v retrieving revision 1.24 diff -u -r1.24 vfs_cache.c --- vfs_cache.c 1997/03/08 15:22:14 1.24 +++ vfs_cache.c 1997/04/25 16:23:28 @@ -78,7 +78,13 @@ (&nchashtbl[((dvp)->v_id + (cnp)->cn_hash) % nchash]) static LIST_HEAD(nchashhead, namecache) *nchashtbl; /* Hash Table */ static u_long nchash; /* size of hash table */ + static u_long numcache; /* number of cache entries allocated */ +SYSCTL_INT(_debug, OID_AUTO, vfscachesize, CTLFLAG_RD, &numcache, 0, ""); + +static u_long cachelimit; +SYSCTL_INT(_debug, OID_AUTO, vfscachelimit, CTLFLAG_RW, &cachelimit, 0, ""); + static TAILQ_HEAD(, namecache) nclruhead; /* LRU chain */ struct nchstats nchstats; /* cache effectiveness statistics */ @@ -151,6 +157,18 @@ return (0); } + if (cnp->cn_namelen == 2 && + cnp->cn_nameptr[0] == '.' && cnp->cn_nameptr[1] == '.') { + if (dvp->v_dd->v_id != dvp->v_ddid) + return (0); + if ((cnp->cn_flags & MAKEENTRY) == 0) { + dvp->v_dd = dvp; + return (0); + } + *vpp = dvp->v_dd; + return (-1); + } + ncpp = NCHHASH(dvp, cnp); for (ncp = ncpp->lh_first; ncp != 0; ncp = nnp) { nnp = ncp->nc_hash.le_next; @@ -230,12 +248,18 @@ return; } + if (cnp->cn_namelen == 2 && + cnp->cn_nameptr[0] == '.' && cnp->cn_nameptr[1] == '.') { + dvp->v_dd = vp; + dvp->v_ddid = vp->v_id; + } + /* * We allocate a new entry if we are less than the maximum * allowed and the one at the front of the LRU list is in use. * Otherwise we use the one at the front of the LRU list. */ - if (numcache < desiredvnodes && + if (numcache < (cachelimit ?
cachelimit : desiredvnodes) && ((ncp = nclruhead.tqh_first) == NULL || ncp->nc_hash.le_prev != 0)) { /* Add one more entry */ Index: kern/vfs_subr.c =================================================================== RCS file: /home/ncvs/src/sys/kern/vfs_subr.c,v retrieving revision 1.82 diff -u -r1.82 vfs_subr.c --- vfs_subr.c 1997/04/04 17:46:16 1.82 +++ vfs_subr.c 1997/04/25 16:04:31 @@ -361,6 +361,7 @@ vp = (struct vnode *) malloc((u_long) sizeof *vp, M_VNODE, M_WAITOK); bzero((char *) vp, sizeof *vp); + vp->v_dd = vp; /* Always point to some vnode */ numvnodes++; } else { for (vp = vnode_free_list.tqh_first; Index: sys/vnode.h =================================================================== RCS file: /home/ncvs/src/sys/sys/vnode.h,v retrieving revision 1.43 diff -u -r1.43 vnode.h --- vnode.h 1997/04/04 17:43:32 1.43 +++ vnode.h 1997/04/25 15:55:55 @@ -110,6 +110,8 @@ struct lock *v_vnlock; /* used for non-locking fs's */ enum vtagtype v_tag; /* type of underlying data */ void *v_data; /* private data for fs */ + struct vnode *v_dd; /* ".." vnode */ + u_long v_ddid; /* ".." vnode.v_id */ }; #define v_mountedhere v_un.vu_mountedhere #define v_socket v_un.vu_socket -- Poul-Henning Kamp | phk@FreeBSD.ORG FreeBSD Core-team. http://www.freebsd.org/~phk | phk@login.dknet.dk Private mailbox. whois: [PHK] | phk@tfs.com TRW Financial Systems, Inc. Future will arrive by its own means, progress not so. From owner-freebsd-fs Fri Apr 25 09:55:31 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id JAA26833 for fs-outgoing; Fri, 25 Apr 1997 09:55:31 -0700 (PDT) Received: from critter.dk.tfs.com ([140.145.230.252]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id JAA26828 for ; Fri, 25 Apr 1997 09:55:28 -0700 (PDT) Received: from critter (localhost [127.0.0.1]) by critter.dk.tfs.com (8.8.5/8.8.5) with ESMTP id SAA00254 for ; Fri, 25 Apr 1997 18:54:01 +0200 (CEST) To: fs@freebsd.org Subject: oops... From: Poul-Henning Kamp Date: Fri, 25 Apr 1997 18:54:00 +0200 Message-ID: <252.861987240@critter> Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk I just found a .rej file :-( Here is the patch again, this time with all the bits. Sorry! Poul-Henning Index: kern/vfs_cache.c =================================================================== RCS file: /home/ncvs/src/sys/kern/vfs_cache.c,v retrieving revision 1.24 diff -u -r1.24 vfs_cache.c --- vfs_cache.c 1997/03/08 15:22:14 1.24 +++ vfs_cache.c 1997/04/25 16:44:16 @@ -78,7 +78,13 @@ (&nchashtbl[((dvp)->v_id + (cnp)->cn_hash) % nchash]) static LIST_HEAD(nchashhead, namecache) *nchashtbl; /* Hash Table */ static u_long nchash; /* size of hash table */ + static u_long numcache; /* number of cache entries allocated */ +SYSCTL_INT(_debug, OID_AUTO, vfscachesize, CTLFLAG_RD, &numcache, 0, ""); + +static u_long cachelimit; +SYSCTL_INT(_debug, OID_AUTO, vfscachelimit, CTLFLAG_RW, &cachelimit, 0, ""); + static TAILQ_HEAD(, namecache) nclruhead; /* LRU chain */ struct nchstats nchstats; /* cache effectiveness statistics */ @@ -151,6 +157,18 @@ return (0); } + if (cnp->cn_namelen == 2 && + cnp->cn_nameptr[0] == '.' && cnp->cn_nameptr[1] == '.') { + if (dvp->v_dd->v_id != dvp->v_ddid) + return (0); + if ((cnp->cn_flags & MAKEENTRY) == 0) { + dvp->v_dd = dvp; + return (0); + } + *vpp = dvp->v_dd; + return (-1); + } + ncpp = NCHHASH(dvp, cnp); for (ncp = ncpp->lh_first; ncp != 0; ncp = nnp) { nnp = ncp->nc_hash.le_next; @@ -230,12 +248,21 @@ return; } + if (cnp->cn_namelen == 2 && + cnp->cn_nameptr[0] == '.' 
&& cnp->cn_nameptr[1] == '.') { + if (!vp) + return; + dvp->v_dd = vp; + dvp->v_ddid = vp->v_id; + return; + } + /* * We allocate a new entry if we are less than the maximum * allowed and the one at the front of the LRU list is in use. * Otherwise we use the one at the front of the LRU list. */ - if (numcache < desiredvnodes && + if (numcache < (cachelimit ? cachelimit : desiredvnodes) && ((ncp = nclruhead.tqh_first) == NULL || ncp->nc_hash.le_prev != 0)) { /* Add one more entry */ @@ -304,6 +331,7 @@ struct nchashhead *ncpp; static u_long nextvnodeid; + vp->v_ddid = 0; vp->v_id = ++nextvnodeid; if (nextvnodeid != 0) return; Index: kern/vfs_subr.c =================================================================== RCS file: /home/ncvs/src/sys/kern/vfs_subr.c,v retrieving revision 1.82 diff -u -r1.82 vfs_subr.c --- vfs_subr.c 1997/04/04 17:46:16 1.82 +++ vfs_subr.c 1997/04/25 16:04:31 @@ -361,6 +361,7 @@ vp = (struct vnode *) malloc((u_long) sizeof *vp, M_VNODE, M_WAITOK); bzero((char *) vp, sizeof *vp); + vp->v_dd = vp; /* Always point to some vnode */ numvnodes++; } else { for (vp = vnode_free_list.tqh_first; Index: sys/vnode.h =================================================================== RCS file: /home/ncvs/src/sys/sys/vnode.h,v retrieving revision 1.43 diff -u -r1.43 vnode.h --- vnode.h 1997/04/04 17:43:32 1.43 +++ vnode.h 1997/04/25 15:55:55 @@ -110,6 +110,8 @@ struct lock *v_vnlock; /* used for non-locking fs's */ enum vtagtype v_tag; /* type of underlying data */ void *v_data; /* private data for fs */ + struct vnode *v_dd; /* ".." vnode */ + u_long v_ddid; /* ".." vnode.v_id */ }; #define v_mountedhere v_un.vu_mountedhere #define v_socket v_un.vu_socket -- Poul-Henning Kamp | phk@FreeBSD.ORG FreeBSD Core-team. http://www.freebsd.org/~phk | phk@login.dknet.dk Private mailbox. whois: [PHK] | phk@tfs.com TRW Financial Systems, Inc. Future will arrive by its own means, progress not so. From owner-freebsd-fs Sat Apr 26 04:02:04 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id EAA18648 for fs-outgoing; Sat, 26 Apr 1997 04:02:04 -0700 (PDT) Received: from wall.jhs.no_domain (vector.muc.ditec.de [194.120.126.35]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id EAA18601; Sat, 26 Apr 1997 04:01:33 -0700 (PDT) Received: from wall.jhs.no_domain (localhost [127.0.0.1]) by wall.jhs.no_domain (8.8.5/8.6.9) with ESMTP id BAA05626; Sat, 26 Apr 1997 01:43:22 +0200 (MET DST) Message-Id: <199704252343.BAA05626@wall.jhs.no_domain> To: "Reini (Reinhold Huber)" (Reinhold Huber) cc: freebsd-fs@freebsd.org Subject: Re: Recreating destroyed filesystem From: "Julian H. Stacey" Reply-To: "Julian H. Stacey" X-Email: jhs@freebsd.org, Fallback: jhs@gil.physik.rwth-aachen.de X-Web: http://www.freebsd.org/~jhs/ X-Company: Vector Systems Ltd, Unix & Internet Consultants. X-Address: Holz Strasse 27d, 80469 Munich, Germany X-Tel: Phone +49.89.268616, Fax +49.89.2608126, Data +49.89.26023276 X-Software: FreeBSD (Unix) + EXMH 1.6.9 (PGP key on web) In-reply-to: Your message of "Tue, 22 Apr 1997 18:29:35 +0200." Date: Sat, 26 Apr 1997 01:43:21 +0200 Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Hi, Reference: > From: "Reini (Reinhold Huber)" > Subject: Re: Recreating destroyed filesystem > Date: Tue, 22 Apr 1997 18:29:35 +0200 (MET) > > Hi! > > My problem: I've screwed up my disk by typing > 'dd if=/dev/zero of=/dev/rsd0c count=20', > which was not at all healthy to the contens of my disk. I should really > have typed 'sd1' instead of 'sd0' :(. 
> Now I've jumpered SCSI ID 0 to ID 1 and installed FreeBSD on another disk > (now SCSI ID 0 / sd0). > The manpages of dumpfs and fs tell me that I need to find a superblock. > The questions: > > * Does the info the superblock contains allow to find the beginning of > the partition? No, the disklabel does (within a slice). > * How do I find the superblock? > * What is where in the superblock? Read /usr/include/sys/disklabel.h and /usr/include/sys/diskslice.h. > Aim is to get my user files back, and then using this disk for > 2.2[.1]-RELEASE. > > I habe been using FreeBSD 2.1.0-RELEASE on the machine, installed by > the /stand/sysinstall utility, using the default partition layout > computed before I upgraded memory from 16 to 48 MB. > > Any info, including hints on manpages, webpages, FAQs, ..., is valuable > for me. I would think you could just do something like 'disklabel -r -w sd1 same_old_label_name_as_before-Or_cloned_fake' followed by 'fsck -b 32', but I suggest doing a whole-disk dd to tape first, to avoid errors, something along the lines of 'dd if=/dev/rsd1c of=/dev/rst0 bs=10k conv=sync' (but not rsd1c if the disk has fdisk-type slices ?) > Thanks in advance, > Reinhold Huber Phone me to discuss it if you want, 089 268616 - I'm also in Munich like you ! Julian -- Julian H. Stacey jhs@freebsd.org http://www.freebsd.org/~jhs/
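For the superblock hunt itself, a rough sketch of a scanner (the tool name sbscan is invented and error handling is minimal; it leans on <ufs/ffs/fs.h> for struct fs, FS_MAGIC, SBSIZE and SBOFF, and reads whole sectors because raw devices insist on that). Every cylinder group keeps a spare superblock, so even with the start of the disk zeroed there are plenty of copies left to find; the primary copy of each filesystem sits SBOFF (8k) bytes into its partition.

	#include <stdio.h>
	#include <stdlib.h>
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/param.h>		/* DEV_BSIZE */
	#include <sys/types.h>
	#include <ufs/ffs/fs.h>		/* struct fs, FS_MAGIC, SBSIZE, SBOFF */

	int
	main(int argc, char **argv)
	{
		static char buf[SBSIZE];	/* whole sectors, raw-device safe */
		struct fs *fs = (struct fs *)buf;
		off_t off;
		int fd;

		if (argc != 2) {
			fprintf(stderr, "usage: sbscan /dev/rsd1c\n");
			exit(1);
		}
		if ((fd = open(argv[1], O_RDONLY)) < 0) {
			perror(argv[1]);
			exit(1);
		}
		for (off = 0;; off += DEV_BSIZE) {	/* one sector at a time */
			if (lseek(fd, off, SEEK_SET) == (off_t)-1)
				break;
			if (read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
				break;			/* stops near end of disk */
			if (fs->fs_magic == FS_MAGIC)
				/* the SBOFF arithmetic is only meaningful for
				   the first (primary) copy in each partition */
				printf("superblock? sector %ld "
				    "(partition would start at sector %ld)\n",
				    (long)(off / DEV_BSIZE),
				    (long)((off - SBOFF) / DEV_BSIZE));
		}
		close(fd);
		return (0);
	}

Given the first candidate in each run of hits, the implied partition start plus the old default sysinstall layout should be enough to reconstruct a label for 'disklabel -r -w', after which 'fsck -b 32' can be pointed at the first spare superblock, as suggested above - and, as also suggested, dd the whole disk somewhere safe before writing anything.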