From owner-freebsd-fs@FreeBSD.ORG Mon Nov 24 01:10:03 2003
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F28E016A4CF for ; Mon, 24 Nov 2003 01:10:02 -0800 (PST)
Received: from vbook.fbsd.ru (asplinux.ru [195.133.213.194]) by mx1.FreeBSD.org (Postfix) with ESMTP id 912C243FCB for ; Mon, 24 Nov 2003 01:10:00 -0800 (PST) (envelope-from vova@vbook.fbsd.ru)
Received: from vova by vbook.fbsd.ru with local (Exim 4.24; FreeBSD) id 1AOCkn-0000E6-LO; Mon, 24 Nov 2003 12:11:33 +0300
From: "Vladimir B. Grebenschikov"
To: Erez Zadok
In-Reply-To: <200311211559.hALFxOLr015232@agora.fsl.cs.sunysb.edu>
References: <200311211559.hALFxOLr015232@agora.fsl.cs.sunysb.edu>
Content-Type: text/plain; charset=koi8-r
Content-Transfer-Encoding: quoted-printable
Organization: SWsoft Inc.
Message-Id: <1069665091.806.2.camel@localhost>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.5
Date: Mon, 24 Nov 2003 12:11:32 +0300
Sender: Vladimir Grebenschikov
cc: fs@freebsd.org
Subject: Re: "Reverse union" mount possible?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 24 Nov 2003 09:10:03 -0000

On Fri, 21.11.2003, at 18:59, Erez Zadok wrote:
> > BTW, I've heard that nullfs/unionfs doesn't allow code sharing. Does wrapfs do it?
>
> What do you mean by "code sharing"? Licensing? All of the freebsd fist
> templates use the BSD license.

I guess he meant that the same binary loaded from different unionfs/nullfs
mountpoints is treated by the kernel as different binaries from the
paging/mmap point of view (they have different vnodes).

> Erez.
-- 
Vladimir B. Grebenschikov
SWsoft Inc.
From owner-freebsd-fs@FreeBSD.ORG Mon Nov 24 14:07:24 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9DF1216A4CE; Mon, 24 Nov 2003 14:07:24 -0800 (PST) Received: from sploot.vicor-nb.com (sploot.vicor-nb.com [208.206.78.81]) by mx1.FreeBSD.org (Postfix) with ESMTP id 18E0843FE1; Mon, 24 Nov 2003 14:07:23 -0800 (PST) (envelope-from kmarx@vicor.com) Received: from vicor.com (localhost [127.0.0.1]) by sploot.vicor-nb.com (8.12.8/8.12.8) with ESMTP id hAOM0u3g074364; Mon, 24 Nov 2003 14:00:56 -0800 (PST) (envelope-from kmarx@vicor.com) Message-ID: <3FC27F98.8090801@vicor.com> Date: Mon, 24 Nov 2003 14:00:56 -0800 From: Ken Marx User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.6a) Gecko/20031105 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Don Lewis References: <200311180347.hAI3lmeF089505@gw.catspoiler.org> In-Reply-To: <200311180347.hAI3lmeF089505@gw.catspoiler.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-fs@FreeBSD.org cc: mckusick@beastie.mckusick.com Subject: Re: 4.8 ffs_dirpref problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Nov 2003 22:07:24 -0000 Don Lewis wrote: > On 17 Nov, Ken Marx wrote: > >> >>Don Lewis wrote: > > >>>Ok, I'll do the commit as soon as I can do some testing on my -STABLE >>>box. >>> >> >>Great. Please let us know when this happens. In fact, >>I kind of got lost which you were planning to commit. >>Can you point me to it, and I'll do one last overnight run. > > > I just committed version which sets minbfree to: > max(1, avgbfree - avgbfree / 4) > > You may want to continue to use the version that you are already running > which sets minbfree to avgbfree. 
I'm not committing my more complex
> version because it benchmarked worse for me than the version I
> committed.
>
> I'm pretty sure that we can do better than this, but it will require a
> fair amount of tweaking and benchmarking, but for now this version
> should work a lot better than the previous version of the code.
>
>
>>>>I was able to run a couple more tests here, and *believe* that the
>>>>fix to the hash table in vfs_bio.c will provide some relief
>>>>for cg block searches when things do fall into the linear search case.
>>>
>>>
>>>I'll see about cranking out a patch to use a Fibonacci hash. It'll
>>>probably be a little while before I can find sufficient time, though.
>>>
>>
>>Ditto the above: thanks/keep us posted. Our clients are
>>anxious to have a 'final' kernel to run with. I think we'll
>>just give them what you commit, and sneak the hash fix in with
>>the security patch or some such. So, no rush, but do let me
>>know if you think it might happen sooner than, say, 2 weeks
>>so I can try and get it all in one release to them.
>
>
> I had some time to crank out a patch. Give this a try and compare it to
> your hash patch. It hasn't blown up my system, but I don't have any
> benchmark data on it. You can just do the test where you fill the
> remaining space in the filesystem. You won't need to do a newfs and
> start from scratch. It would be great if you could compare the hash
> bucket sizes for the different versions of the hash.
>
>
> Index: sys/kern/vfs_bio.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
> retrieving revision 1.242.2.21
> diff -u -r1.242.2.21 vfs_bio.c
> --- sys/kern/vfs_bio.c 9 Aug 2003 16:21:19 -0000 1.242.2.21
> +++ sys/kern/vfs_bio.c 18 Nov 2003 02:10:55 -0000
> @@ -140,6 +140,7 @@
>          &bufreusecnt, 0, "");
>
>  static int bufhashmask;
> +static int bufhashshift;
>  static LIST_HEAD(bufhashhdr, buf) *bufhashtbl, invalhash;
>  struct bqueues bufqueues[BUFFER_QUEUES] = { { 0 } };
>  char *buf_wmesg = BUF_WMESG;
> @@ -160,7 +161,20 @@
>  struct bufhashhdr *
>  bufhash(struct vnode *vnp, daddr_t bn)
>  {
> -        return(&bufhashtbl[(((uintptr_t)(vnp) >> 7) + (int)bn) & bufhashmask]);
> +        u_int64_t hashkey64;
> +        int hashkey;
> +
> +        /*
> +         * Fibonacci hash, see Knuth's
> +         * _Art of Computer Programming, Volume 3 / Sorting and Searching_
> +         *
> +         * We reduce the argument to 32 bits before doing the hash to
> +         * avoid the need for a slow 64x64 multiply on 32 bit platforms.
> +         */
> +        hashkey64 = (u_int64_t)(uintptr_t)vnp + (u_int64_t)bn;
> +        hashkey = (((u_int32_t)(hashkey64 + (hashkey64 >> 32)) * 2654435769u) >>
> +            bufhashshift) & bufhashmask;
> +        return(&bufhashtbl[hashkey]);
>  }
>
>  /*
> @@ -319,8 +333,9 @@
>  bufhashinit(caddr_t vaddr)
>  {
>          /* first, make a null hash table */
> +        bufhashshift = 29;
>          for (bufhashmask = 8; bufhashmask < nbuf / 4; bufhashmask <<= 1)
> -                ;
> +                bufhashshift--;
>          bufhashtbl = (void *)vaddr;
>          vaddr = vaddr + sizeof(*bufhashtbl) * bufhashmask;
>          --bufhashmask;
>
>

Well, I'm mildly beflummoxed - I tried to compare hashtable performance
between all three known versions of the hashing - legacy power of 2,
the Vicor ^= hash, and Don's fibonacci hash.

Running with

    minifree = max( 1, avgifree / 4 );
    minbfree = max( 1, avgbfree );

all perform about the same, with no performance problems all the way
up to 100% disk capacity (didn't test into reserved space).
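
For anyone who wants to compare bucket distributions outside the kernel, the multiplicative step from the patch above can be sketched in user space roughly as follows. This is only an illustration, not the committed code: the names `fib_shift` and `fib_hash` are made up here, and the table size is assumed to be a power of two of at least 2 so that the shift stays below 32.

```c
#include <stdint.h>

/* 2^32 divided by the golden ratio, truncated to an integer (Knuth, TAOCP vol. 3). */
#define FIB_MULT 2654435769u

/*
 * Hypothetical helper: shift amount for a table of `nbuckets` slots,
 * i.e. 32 - log2(nbuckets).  Assumes nbuckets is a power of two >= 2.
 * For 8 buckets this gives 29, the starting value used in the patch.
 */
static unsigned
fib_shift(uint32_t nbuckets)
{
        unsigned shift = 32;

        while (nbuckets > 1) {
                nbuckets >>= 1;
                shift--;
        }
        return (shift);
}

/*
 * Hypothetical helper: fold a 64-bit key to 32 bits, multiply by the
 * Fibonacci constant modulo 2^32, and keep the top (32 - shift) bits.
 */
static uint32_t
fib_hash(uint64_t key, unsigned shift)
{
        uint32_t folded = (uint32_t)(key + (key >> 32));

        return ((uint32_t)(folded * FIB_MULT) >> shift);
}
```

Because the result already has exactly log2(nbuckets) bits after the shift, the extra `& bufhashmask` in the patch should never change the value; it reads as a safety net rather than part of the hash.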
Looking at instrumentation to show freq and avg depth of the hash
buckets, everything seems very calm (mainly because we're not hitting
the linear searching very often, I'd presume).

I can't explain why I seemingly got performance problems with
similar (identical) minbfree code previously.

So, out of spite, I went back to

    minbfree = max( 1, avgbfree/4 );

This does hit the hashtable harder for the legacy version and
not so much for either new flavor.

Here are a few samplings of calling my dump routine from the debugger.
"avgdepth" really means 'search depth' since we use the depth
reached after finding a bp in gbincore.

The line below such as,

    0: avgdepth[1] cnt=801

means that 801 of the hashtable buckets had an avg search depth of 1
at the time the debug routine was called. The 'N:' prefix means the
N-th unique non-zero such value. So large cnt's for small []'d depth
values means an efficient hash.

I've edited out the details as much as possible.

LEGACY:
--------
Nov 24 13:34:54 oos0b /kernel: bh[442/0x1ba]: freq=2706110, avgdepth = 154
...
Nov 24 13:34:54 oos0b /kernel: 0: avgdepth[1] cnt=1015
Nov 24 13:34:54 oos0b /kernel: 1: avgdepth[2] cnt=7
Nov 24 13:34:54 oos0b /kernel: 2: avgdepth[154] cnt=1 <- !!
Nov 24 13:34:54 oos0b /kernel: 3: avgdepth[3] cnt=1
-----------
Nov 24 13:36:49 oos0b /kernel: bh[442/0x1ba]: freq=3416953, avgdepth = 141
...
Nov 24 13:36:49 oos0b /kernel: 0: avgdepth[1] cnt=1017
Nov 24 13:36:49 oos0b /kernel: 1: avgdepth[141] cnt=1
Nov 24 13:36:49 oos0b /kernel: 2: avgdepth[2] cnt=6

VICOR x-or hashtable:
---------------------
Nov 24 13:07:24 oos0b /kernel: 0: avgdepth[1] cnt=762
Nov 24 13:07:24 oos0b /kernel: 1: avgdepth[2] cnt=259
Nov 24 13:07:24 oos0b /kernel: 2: avgdepth[3] cnt=3
-----------
Nov 24 13:08:07 oos0b /kernel: 0: avgdepth[1] cnt=744
Nov 24 13:08:07 oos0b /kernel: 1: avgdepth[2] cnt=275
Nov 24 13:08:07 oos0b /kernel: 2: avgdepth[3] cnt=5

FIBONACCI:
----------
Nov 24 11:56:50 oos0b /kernel: 0: avgdepth[1] cnt=811
Nov 24 11:56:50 oos0b /kernel: 1: avgdepth[3] cnt=88
Nov 24 11:56:50 oos0b /kernel: 2: avgdepth[2] cnt=124
Nov 24 11:56:50 oos0b /kernel: 3: avgdepth[0] cnt=1
-----------
Nov 24 11:57:48 oos0b /kernel: 0: avgdepth[1] cnt=801
Nov 24 11:57:48 oos0b /kernel: 1: avgdepth[3] cnt=93
Nov 24 11:57:48 oos0b /kernel: 2: avgdepth[2] cnt=130

So, while this is far from analytically exhaustive, it almost appears
that the fibonacci hash has more entries of depth 3, while the Vicor
one has more at depth 2.

I'm happy to run more tests if you have ideas. I'm also fine to cut
bait and go with whatever you decide.

It *seems* like putting in the fibonacci hash is prudent since the
current hash has been observed to be expensive. I had trouble proving
this unequivocally though. So, perhaps Don's minbfree fix is sufficient
after all. I'm tempted at this point to go with the 100% flavor.

Apologies for the delays and any confusion,
k
-- 
Ken Marx, kmarx@vicor-nb.com
If we form a subcommittee we will reach agreement and stop beating around the
bush on the bandwidth issues.
- http://www.bigshed.com/cgi-bin/speak.cgi

From owner-freebsd-fs@FreeBSD.ORG Tue Nov 25 12:03:14 2003
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id AD22216A4CE for ; Tue, 25 Nov 2003 12:03:14 -0800 (PST)
Received: from filer.fsl.cs.sunysb.edu (filer.fsl.cs.sunysb.edu [130.245.126.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1AC6443FE1 for ; Tue, 25 Nov 2003 12:03:13 -0800 (PST) (envelope-from ezk@fsl.cs.sunysb.edu)
Received: from agora.fsl.cs.sunysb.edu (IDENT:kKAxUarwB3KkbM9iDWzCUB32hhq31iUR@agora.fsl.cs.sunysb.edu [130.245.126.12])hAPK32Hn028366 for ; Tue, 25 Nov 2003 15:03:02 -0500
Received: from agora.fsl.cs.sunysb.edu (IDENT:vDCGSRndak151TKTLFKR08pIYeYTm+l0@localhost.localdomain [127.0.0.1]) hAPK3Bg9017040; Tue, 25 Nov 2003 15:03:11 -0500
Received: (from ezk@localhost) by agora.fsl.cs.sunysb.edu (8.12.8/8.12.8/Submit) id hAPK3Bb9017036; Tue, 25 Nov 2003 15:03:11 -0500
Date: Tue, 25 Nov 2003 15:03:11 -0500
Message-Id: <200311252003.hAPK3Bb9017036@agora.fsl.cs.sunysb.edu>
From: Erez Zadok
To: fs@freebsd.org
X-MailKey: Erez_Zadok
Subject: vnode refcnt bug?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 25 Nov 2003 20:03:14 -0000

Please see this short thread of discussion on amd-dev. I've included two
messages from this thread. It suggests that fbsd5 may have a vnode refcount
bug (a vnode isn't held where it should be).

I've not personally investigated this bug. Has anyone on fs@ come
across such a possible bug?

Thanks
Erez.
------- Forwarded Message Date: Tue, 25 Nov 2003 18:41:40 +0100 From: scholler@fnb.tu-darmstadt.de (Ulrich Scholler) To: amd-dev@cs.columbia.edu Subject: Re: amd unmounting out from under a cwd Hi, On Tue Nov 25, 2003 at 12:29:27 -0500, Andrew Siegel wrote: > What's happening is: I cd into an automounted directory, > the mount occurs normally, and I leave my shell there. > 5 minutes later, amd unmounts that mount point out from > under me. The shell no longer has a working directory: > > zayin abs % pwd > pwd: .: No such file or directory > > > I have several hundred machines here (Redhat 7.3, IRIX 6.5.12, > FreeBSD 4) running amd (mostly 6.0.9), all with the same > configuration file and maps, and this is the only one that > shows this problem, making me think it's something about > FreeBSD 5.1. > > I'm attaching a debugging log. The directory that is being > mounted and then incorrectly unmounted is /u/abs. Are you sure that your shell's cwd is actually /u/abs? Some programs seem to dereference the symlink and set the cwd to the actual mount point. amd is perfectly right to unmount it, since it is not accessed via the amd-provided symlink. Regards, uLI _______________________________________________ amd-dev mailing list: amd-dev@cs.columbia.edu Am-utils: http://www.am-utils.org ------- End of Forwarded Message ------- Forwarded Message Date: Tue, 25 Nov 2003 11:24:42 -0700 From: John E Hein To: Andrew Siegel Cc: amd-dev@cs.columbia.edu Subject: amd unmounting out from under a cwd Andrew Siegel wrote at 12:29 -0500 on Nov 25: > I've got a problem that I've never seen before with amd > under FreeBSD 5.1. Versions 6.0.7 (as delivered with the > FreeBSD 5.1 distribution) and 6.1b4 (compiled by me) share > this problem. Definitely a FreeBSD 5.* problem. I've noticed it since using early versions of 5. I haven't tracked it down yet since it's been more of an inconvenience than anything (for instance, 'pushd /tmp ; popd' "fixes" it). 
_______________________________________________ amd-dev mailing list: amd-dev@cs.columbia.edu Am-utils: http://www.am-utils.org ------- End of Forwarded Message From owner-freebsd-fs@FreeBSD.ORG Tue Nov 25 13:07:34 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1362316A4CE for ; Tue, 25 Nov 2003 13:07:34 -0800 (PST) Received: from salmon.maths.tcd.ie (salmon.maths.tcd.ie [134.226.81.11]) by mx1.FreeBSD.org (Postfix) with SMTP id 9DF7043F93 for ; Tue, 25 Nov 2003 13:07:30 -0800 (PST) (envelope-from iedowse@maths.tcd.ie) Received: from walton.maths.tcd.ie by salmon.maths.tcd.ie with SMTP id ; 25 Nov 2003 21:07:29 +0000 (GMT) To: Erez Zadok In-Reply-To: Your message of "Tue, 25 Nov 2003 15:03:11 EST." <200311252003.hAPK3Bb9017036@agora.fsl.cs.sunysb.edu> Date: Tue, 25 Nov 2003 21:07:29 +0000 From: Ian Dowse Message-ID: <200311252107.aa96370@salmon.maths.tcd.ie> cc: fs@freebsd.org Subject: Re: vnode refcnt bug? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Nov 2003 21:07:34 -0000 In message <200311252003.hAPK3Bb9017036@agora.fsl.cs.sunysb.edu>, Erez Zadok wr ites: >Please see this short thread of discussion on amd-dev. I've included two >messages from this thread. It suggests that fbsd5 may have a vnode refcount >bug (a vnode isn't held where it should). > >I've not personally investigated this bug. Does anyone on fs@ has come >across such a possible bug? Hmm, I guess it is caused by checkdirs() in vfs_mount.c moving the process cwd to the underlying vnode before attempting the unmount. Does this only happen if the cwd is at the mount point itself? When a file system is first mounted, checkdirs() looks for processes that had a cwd or chroot set to the vnode that is about to be covered. 
It moves these processes to the new mountpoint vnode. This behaviour goes back a long time (I'm not sure what the reasons were), but it had the problem that you would get a "Device busy" error if you attempted to unmount the file system later, and a forced unmount would leave the process with a stale cwd or chroot vnode (i.e. "mount /mnt; umount /mnt" would fail if any processes previously had a cwd of /mnt, and "mount /mnt; umount -f /mnt" would cause such processes to lose their reference to the /mnt directory). More recently (Feb 2001), I changed unmount to undo the checkdirs() step so that processes with a cwd or chroot at the mount point get moved back to the covered vnode before the unmount is attempted. This fixes the two issues, but it has the side-effect that if the only vnode references to a file system are processes whose cwd or chroot directory is on the mountpoint, then the unmount will succeed, and those processes will be moved to the underlying directory. The reference count checks could be moved to before checkdirs(), but I think there are cases where the current behaviour is preferable, so maybe it needs to be an unmount() flag... BTW, does amd delete the mountpoint directory after the unmount? That would explain why the directory goes away entirely. 
Ian From owner-freebsd-fs@FreeBSD.ORG Tue Nov 25 13:24:21 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 02F0516A4CE for ; Tue, 25 Nov 2003 13:24:21 -0800 (PST) Received: from filer.fsl.cs.sunysb.edu (filer.fsl.cs.sunysb.edu [130.245.126.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9E44243FE9 for ; Tue, 25 Nov 2003 13:24:16 -0800 (PST) (envelope-from ezk@fsl.cs.sunysb.edu) Received: from agora.fsl.cs.sunysb.edu (IDENT:uZmDaDLDMapOqQ++SOtAtXyBgQpmMENs@agora.fsl.cs.sunysb.edu [130.245.126.12])hAPLMIHn029059; Tue, 25 Nov 2003 16:22:18 -0500 Received: from agora.fsl.cs.sunysb.edu (IDENT:MbDxfAsrbfpIJzB2czikWlNTDFcDQ1hw@localhost.localdomain [127.0.0.1]) hAPLMRg9018538; Tue, 25 Nov 2003 16:22:27 -0500 Received: (from ezk@localhost) by agora.fsl.cs.sunysb.edu (8.12.8/8.12.8/Submit) id hAPLMRfE018534; Tue, 25 Nov 2003 16:22:27 -0500 Date: Tue, 25 Nov 2003 16:22:27 -0500 Message-Id: <200311252122.hAPLMRfE018534@agora.fsl.cs.sunysb.edu> From: Erez Zadok To: Ian Dowse In-reply-to: Your message of "Tue, 25 Nov 2003 21:07:29 GMT." <200311252107.aa96370@salmon.maths.tcd.ie> X-MailKey: Erez_Zadok cc: amd-dev@cs.columbia.edu cc: Erez Zadok cc: fs@freebsd.org Subject: Re: vnode refcnt bug? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Nov 2003 21:24:21 -0000 Ian, I'm CC-ing my reply to the am-utils developers mailing list, amd-dev. Let's keep this thread on both fs@ and amd-dev for a bit. Can the people on amd-dev who noticed this problem please answer Ian's questions? In message <200311252107.aa96370@salmon.maths.tcd.ie>, Ian Dowse writes: > In message <200311252003.hAPK3Bb9017036@agora.fsl.cs.sunysb.edu>, Erez Zadok wr > ites: > >Please see this short thread of discussion on amd-dev. 
I've included two > >messages from this thread. It suggests that fbsd5 may have a vnode refcount > >bug (a vnode isn't held where it should). > > > >I've not personally investigated this bug. Does anyone on fs@ has come > >across such a possible bug? > > Hmm, I guess it is caused by checkdirs() in vfs_mount.c moving the > process cwd to the underlying vnode before attempting the unmount. > Does this only happen if the cwd is at the mount point itself? > > When a file system is first mounted, checkdirs() looks for processes > that had a cwd or chroot set to the vnode that is about to be > covered. It moves these processes to the new mountpoint vnode. > This behaviour goes back a long time (I'm not sure what the reasons > were), but it had the problem that you would get a "Device busy" > error if you attempted to unmount the file system later, and a > forced unmount would leave the process with a stale cwd or chroot > vnode (i.e. "mount /mnt; umount /mnt" would fail if any processes > previously had a cwd of /mnt, and "mount /mnt; umount -f /mnt" would > cause such processes to lose their reference to the /mnt directory). > > More recently (Feb 2001), I changed unmount to undo the checkdirs() > step so that processes with a cwd or chroot at the mount point get > moved back to the covered vnode before the unmount is attempted. > This fixes the two issues, but it has the side-effect that if the > only vnode references to a file system are processes whose cwd or > chroot directory is on the mountpoint, then the unmount will succeed, > and those processes will be moved to the underlying directory. Hmmm, yes I think that could be a serious problem (esp. since fbsd doesn't have autofs yet). And I think it deviates from "norms" where a cwd is essentially occupying a vnode within the mounted f/s and therefore the f/s shouldn't be unmounted! 
This is rather bad for users who sit on an nfs mnt point, ls'ing files happily, and then the kernel unmounts the mnt pt, moves their cwd down to the covered (typically empty) vnode, and the poor user's next /bin/ls shows nothing. Personally, having dealt w/ stackable f/s for a while, I found that when the kernel tries to do all sorts from "under the feet" of the application (or any other upper-layer kernel component), it opens up avenues for trouble. Yes, maybe an un/mount() flag will solve this issue. But I'd like to see the more normal EBUSY-on-cwd behavior restored, and an un/mount flag for those who really want the new behavior. I'm a big proponent of backwards compatibility, and new features gradually introduced through flags/options. And if I want to force an unmount of an mnt pt and I get EBUSY, I do lsof and then /bin/kill any process sitting on the mnt pt; that's expected behavior (what does POSIX say?) > The reference count checks could be moved to before checkdirs(), > but I think there are cases where the current behaviour is preferable, > so maybe it needs to be an unmount() flag... BTW, does amd delete > the mountpoint directory after the unmount? That would explain why > the directory goes away entirely. If Amd created the mount point when it started (say, the mnt pt didn't exist), then Amd will also try to rmdir it upon unmount. > Ian Cheers, Erez. 
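
The two behaviours discussed in this thread can be contrasted with a toy user-space model. This is only a sketch under stated assumptions, not FreeBSD kernel code: `struct model_mount`, `model_unmount()` and the flag `MODEL_MNT_MOVECWD` are hypothetical names invented for illustration, and the model only counts references instead of touching real vnodes.

```c
#include <errno.h>

/* Hypothetical flag for this model only; not a real unmount(2) flag. */
#define MODEL_MNT_MOVECWD 0x1

/* Toy stand-in for a mounted file system's reference state. */
struct model_mount {
        int cwd_refs;   /* processes whose cwd/chroot is the mount's root vnode */
        int other_refs; /* any other active vnode references in the file system */
};

/*
 * Returns 0 if the modelled unmount would succeed, or EBUSY otherwise.
 * Without the flag, a cwd at the mount point keeps the file system busy
 * (the traditional behaviour asked for in the thread).  With the flag,
 * such processes are "moved back" to the covered vnode first, which is
 * the behaviour described for FreeBSD 5 since the Feb 2001 change.
 */
static int
model_unmount(struct model_mount *mp, int flags)
{
        if (mp->other_refs > 0)
                return (EBUSY);
        if (mp->cwd_refs > 0) {
                if ((flags & MODEL_MNT_MOVECWD) == 0)
                        return (EBUSY);
                mp->cwd_refs = 0;       /* cwd now points at the covered vnode */
        }
        return (0);
}
```

In this model the amd case corresponds to a shell that accounts for one `cwd_refs` entry: with the flag set the timed unmount goes through and the shell ends up on the covered directory, while without it the unmount fails with EBUSY.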
From owner-freebsd-fs@FreeBSD.ORG Tue Nov 25 15:38:46 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E4C7016A4CE for ; Tue, 25 Nov 2003 15:38:46 -0800 (PST) Received: from salmon.maths.tcd.ie (salmon.maths.tcd.ie [134.226.81.11]) by mx1.FreeBSD.org (Postfix) with SMTP id 999FA43F93 for ; Tue, 25 Nov 2003 15:38:45 -0800 (PST) (envelope-from iedowse@maths.tcd.ie) Received: from walton.maths.tcd.ie by salmon.maths.tcd.ie with SMTP id ; 25 Nov 2003 23:38:45 +0000 (GMT) To: Erez Zadok In-Reply-To: Your message of "Tue, 25 Nov 2003 16:22:27 EST." <200311252122.hAPLMRfE018534@agora.fsl.cs.sunysb.edu> Date: Tue, 25 Nov 2003 23:38:44 +0000 From: Ian Dowse Message-ID: <200311252338.aa05451@salmon.maths.tcd.ie> cc: amd-dev@cs.columbia.edu cc: fs@freebsd.org Subject: Re: vnode refcnt bug? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Nov 2003 23:38:47 -0000 In message <200311252122.hAPLMRfE018534@agora.fsl.cs.sunysb.edu>, Erez Zadok wr ites: >Hmmm, yes I think that could be a serious problem (esp. since fbsd doesn't >have autofs yet). And I think it deviates from "norms" where a cwd is >essentially occupying a vnode within the mounted f/s and therefore the f/s >shouldn't be unmounted! This is rather bad for users who sit on an nfs mnt >point, ls'ing files happily, and then the kernel unmounts the mnt pt, moves >their cwd down to the covered (typically empty) vnode, and the poor user's >next /bin/ls shows nothing. Yes, I agree completely - however the question of what to do with references to about-to-be-covered vnodes at mount time still remains. I'll have to look in more detail at why the checkdirs() approach was needed in the first place to see if simply removing it is an option. 
Any other approaches I can think of right now for solving this issue appear to either extend the original checkdirs() hack, or else just replace one kind of undesirable behaviour with another. Ian From owner-freebsd-fs@FreeBSD.ORG Tue Nov 25 15:58:00 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0B4DB16A4CE for ; Tue, 25 Nov 2003 15:58:00 -0800 (PST) Received: from filer.fsl.cs.sunysb.edu (filer.fsl.cs.sunysb.edu [130.245.126.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id CB8A743F75 for ; Tue, 25 Nov 2003 15:57:58 -0800 (PST) (envelope-from ezk@fsl.cs.sunysb.edu) Received: from agora.fsl.cs.sunysb.edu (IDENT:YREcBNQLJcxYLWN5xNGmmOApwO5syPLM@agora.fsl.cs.sunysb.edu [130.245.126.12])hAPNvbHn032392; Tue, 25 Nov 2003 18:57:37 -0500 Received: from agora.fsl.cs.sunysb.edu (IDENT:Tzbi/6YTV7V/aCD1Ai3EmmDzjzsaIhqo@localhost.localdomain [127.0.0.1]) hAPNvlg9021313; Tue, 25 Nov 2003 18:57:47 -0500 Received: (from ezk@localhost) by agora.fsl.cs.sunysb.edu (8.12.8/8.12.8/Submit) id hAPNvlGs021309; Tue, 25 Nov 2003 18:57:47 -0500 Date: Tue, 25 Nov 2003 18:57:47 -0500 Message-Id: <200311252357.hAPNvlGs021309@agora.fsl.cs.sunysb.edu> From: Erez Zadok To: Ian Dowse In-reply-to: Your message of "Tue, 25 Nov 2003 23:38:44 GMT." <200311252338.aa05451@salmon.maths.tcd.ie> X-MailKey: Erez_Zadok cc: amd-dev@cs.columbia.edu cc: Erez Zadok cc: fs@freebsd.org Subject: Re: vnode refcnt bug? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Nov 2003 23:58:00 -0000 In message <200311252338.aa05451@salmon.maths.tcd.ie>, Ian Dowse writes: > In message <200311252122.hAPLMRfE018534@agora.fsl.cs.sunysb.edu>, Erez Zadok wr > ites: > >Hmmm, yes I think that could be a serious problem (esp. since fbsd doesn't > >have autofs yet). 
And I think it deviates from "norms" where a cwd is
> >essentially occupying a vnode within the mounted f/s and therefore the f/s
> >shouldn't be unmounted! This is rather bad for users who sit on an nfs mnt
> >point, ls'ing files happily, and then the kernel unmounts the mnt pt, moves
> >their cwd down to the covered (typically empty) vnode, and the poor user's
> >next /bin/ls shows nothing.
>
> Yes, I agree completely - however the question of what to do with
> references to about-to-be-covered vnodes at mount time still remains.
> I'll have to look in more detail at why the checkdirs() approach
> was needed in the first place to see if simply removing it is an
> option.

If you have a cwd on a lower mnt pt before the mount, I'd say it makes
_some_ sense to move it "up" to the mnt pt (root vnode) of the newly mounted
fs. This could be very useful for, say, a login shell. I say "some" b/c
I'm concerned about the possibility that some bad process (rm -rf) that is
just started in an empty mnt point, all of a sudden is moved up to a vnode
full of real files, and that process may happily go on to delete the files
in the newly mounted f/s.

Doing the reverse upon unmount (moving the cwd from upper to lower) sounds
even stranger to me. Why? B/c the process used to see some files and now
it sees none. Where did it all go? This can break applications in all
sorts of unhappy ways.

> Any other approaches I can think of right now for solving this issue
> appear to either extend the original checkdirs() hack, or else just
> replace one kind of undesirable behaviour with another.

My personal philosophy when it comes to a choice b/t several un/desirable
modes of operations is the following:

1. Offer flags/options/whatever for users to pick their desired behavior.

2. Don't break existing "expected" behavior: make that the default mode of
   operation.

3. In some cases, it's desirable to change the default behavior to one of
   the "new modes".
   But at least everyone will have a way to get the behavior they want.

4. Disadvantage: poor programmers/maintainers have to keep several modes of
   operation working.

The above won't make everyone happy, but it'd maximize the percentage of
happy users.

> Ian

I guess we first need to find out what were the original reasons for the
change in fbsd. Maybe we can find a way to accommodate the needs for that
change w/o breaking functionality.

Cheers,
Erez.

From owner-freebsd-fs@FreeBSD.ORG Tue Nov 25 20:28:18 2003
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8EEAB16A4CE for ; Tue, 25 Nov 2003 20:28:18 -0800 (PST)
Received: from itree.org (tree.caddev.com [24.153.136.6]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5619543FE0 for ; Tue, 25 Nov 2003 20:28:15 -0800 (PST) (envelope-from treeml@itree.org)
Received: from laptop (user-0cdfduk.cable.mindspring.com [24.215.183.212]) by itree.org (8.11.6/8.11.6) with SMTP id hAQ4WRX00821 for ; Tue, 25 Nov 2003 22:32:28 -0600
From: "treeml"
To:
Date: Tue, 25 Nov 2003 23:24:18 -0500
Message-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
Importance: Normal
Subject: SEARCH FOR ALTERNATE SUPER-BLOCK FAILED
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 26 Nov 2003 04:28:18 -0000

My machine runs FreeBSD 5.1-RELEASE, with either a UFS or UFS2 filesystem.
I must have switched off the electricity before the machine finished
shutting down. (I did "shutdown -h now", and waited at least 5 mins before
I turned off the switch.) Now the /usr partition won't mount.
In the past 24 hours I have looked all over the Internet and tried all the
recommendations. Nothing seems to work. Following are the errors I got.
The /usr partition is on "/dev/ad0s1f".

-----------------------------------
-su-2.05b# mount /dev/ad0s1f /mnt/
-------------------------------------

When I try to fsck the partition I get the following errors,

---------------------------------------
bash-2.05b# fsck dev/ad0s1f
** /dev/ad0s1f
CANNOT READ BLK: 114411168
CONTINUE? [yn] y
THE FOLLOWING DISK SECTORS COULD NOT BE READ: 114411168, 114411169, 114411170, 114411171,
LOOK FOR ALTERNATE SUPERBLOCKS? [yn] y
32 is not a file system superblock
SEARCH FOR ALTERNATE SUPER-BLOCK FAILED. YOU MUST USE THE
-b OPTION TO FSCK TO SPECIFY THE LOCATION OF AN ALTERNATE
SUPER-BLOCK TO SUPPLY NEEDED INFORMATION; SEE fsck(8).
bash-2.05b# fsck dev/ad0s1f
** /dev/ad0s1f
CANNOT READ BLK: 114411168
CONTINUE? [yn] y
THE FOLLOWING DISK SECTORS COULD NOT BE READ: 114411168, 114411169, 114411170, 114411171,
LOOK FOR ALTERNATE SUPERBLOCKS? [yn] y
32 is not a file system superblock
SEARCH FOR ALTERNATE SUPER-BLOCK FAILED. YOU MUST USE THE
-b OPTION TO FSCK TO SPECIFY THE LOCATION OF AN ALTERNATE
SUPER-BLOCK TO SUPPLY NEEDED INFORMATION; SEE fsck(8).
---------------------------------------

I have also tried,

---------------------------------------
dd if=/dev/ad0s1f skip=32 of=/dev/ad0s1f seek=16 bs=512 count=16
---------------------------------------

also no luck.

Does anyone know how I can get the partition mounted, or at least
partially recover the data from that partition?
Thanks in advance Tree From owner-freebsd-fs@FreeBSD.ORG Tue Nov 25 21:00:00 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7E51216A4CE for ; Tue, 25 Nov 2003 21:00:00 -0800 (PST) Received: from Daffy.timing.com (mx2.timing.com [206.168.13.218]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9AD3B43FBF for ; Tue, 25 Nov 2003 20:59:58 -0800 (PST) (envelope-from jhein@timing.com) Received: from gromit.timing.com (gromit.timing.com [206.168.13.209]) by Daffy.timing.com (8.12.8p2/8.12.8) with ESMTP id hAQ4xjpB036497; Tue, 25 Nov 2003 21:59:45 -0700 (MST) (envelope-from jhein@timing.com) Received: from gromit.timing.com (localhost [127.0.0.1]) by gromit.timing.com (8.12.6p3/8.12.6) with ESMTP id hAQ4xfjh074733; Tue, 25 Nov 2003 21:59:41 -0700 (MST) (envelope-from jhein@gromit.timing.com) Received: (from jhein@localhost) by gromit.timing.com (8.12.6p3/8.12.6/Submit) id hAQ4xfRw074730; Tue, 25 Nov 2003 21:59:41 -0700 (MST) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16324.13117.190129.769195@gromit.timing.com> Date: Tue, 25 Nov 2003 21:59:41 -0700 X-Mailer: VM 7.17 under Emacs 21.1.1 From: John E Hein To: Erez Zadok In-Reply-To: <200311252122.hAPLMRfE018534@agora.fsl.cs.sunysb.edu> References: <200311252107.aa96370@salmon.maths.tcd.ie> <200311252122.hAPLMRfE018534@agora.fsl.cs.sunysb.edu> X-Spam-Status: No, hits=-15.6 required=5.0 tests=IN_REP_TO,REFERENCES,USER_AGENT_VM version=2.50 X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) cc: amd-dev@cs.columbia.edu cc: Ian Dowse cc: fs@freebsd.org Subject: Re: vnode refcnt bug? 
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 26 Nov 2003 05:00:00 -0000

Erez Zadok wrote at 16:22 -0500 on Nov 25:
> Ian, I'm CC-ing my reply to the am-utils developers mailing list, amd-dev.
> Let's keep this thread on both fs@ and amd-dev for a bit.
>
> Can the people on amd-dev who noticed this problem please answer Ian's
> questions?
>
> In message <200311252107.aa96370@salmon.maths.tcd.ie>, Ian Dowse writes:
> > In message <200311252003.hAPK3Bb9017036@agora.fsl.cs.sunysb.edu>, Erez Zadok writes:
> > >Please see this short thread of discussion on amd-dev.  I've included two
> > >messages from this thread.  It suggests that fbsd5 may have a vnode refcount
> > >bug (a vnode isn't held where it should be).
> > >
> > >I've not personally investigated this bug.  Has anyone on fs@ come
> > >across such a possible bug?
> >
> > Hmm, I guess it is caused by checkdirs() in vfs_mount.c moving the
> > process cwd to the underlying vnode before attempting the unmount.
> > Does this only happen if the cwd is at the mount point itself?

Yes. It appears that's the case. I can force it to happen with amq -u.

> > When a file system is first mounted, checkdirs() looks for processes
> > that had a cwd or chroot set to the vnode that is about to be
> > covered. It moves these processes to the new mountpoint vnode.
> > This behaviour goes back a long time (I'm not sure what the reasons
> > were), but it had the problem that you would get a "Device busy"
> > error if you attempted to unmount the file system later, and a
> > forced unmount would leave the process with a stale cwd or chroot
> > vnode (i.e. "mount /mnt; umount /mnt" would fail if any processes
> > previously had a cwd of /mnt, and "mount /mnt; umount -f /mnt" would
> > cause such processes to lose their reference to the /mnt directory).

No forced umount is necessary. It just gets unmounted after the amd timeout if you just sit at your shell prompt and wait (or amq -u). > > The reference count checks could be moved to before checkdirs(), > > but I think there are cases where the current behaviour is preferable, > > so maybe it needs to be an unmount() flag... BTW, does amd delete > > the mountpoint directory after the unmount? That would explain why > > the directory goes away entirely. > > If Amd created the mount point when it started (say, the mnt pt didn't > exist), then Amd will also try to rmdir it upon unmount. It gets unmounted first. Then within a minute, it gets deleted. ls returns nothing (but exit code is 0). pwd gives: pwd: .: No such file or directory From owner-freebsd-fs@FreeBSD.ORG Wed Nov 26 02:13:42 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9EC3F16A4CE for ; Wed, 26 Nov 2003 02:13:42 -0800 (PST) Received: from salmon.maths.tcd.ie (salmon.maths.tcd.ie [134.226.81.11]) by mx1.FreeBSD.org (Postfix) with SMTP id 5B2D843FB1 for ; Wed, 26 Nov 2003 02:13:41 -0800 (PST) (envelope-from iedowse@maths.tcd.ie) Received: from walton.maths.tcd.ie by salmon.maths.tcd.ie with SMTP id ; 26 Nov 2003 10:13:40 +0000 (GMT) To: Erez Zadok In-Reply-To: Your message of "Tue, 25 Nov 2003 18:57:47 EST." <200311252357.hAPNvlGs021309@agora.fsl.cs.sunysb.edu> Date: Wed, 26 Nov 2003 10:13:39 +0000 From: Ian Dowse Message-ID: <200311261013.aa21508@salmon.maths.tcd.ie> cc: amd-dev@cs.columbia.edu cc: fs@freebsd.org Subject: Re: vnode refcnt bug? 
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 26 Nov 2003 10:13:42 -0000

In message <200311252357.hAPNvlGs021309@agora.fsl.cs.sunysb.edu>, Erez Zadok writes:
>If you have a cwd on a lower mnt pt before the mount, I'd say it makes
>_some_ sense to move it "up" to the mnt pt (root vnode) of the newly mounted
>fs.  This could be very useful for, say, a login shell.
>
>I say "some" b/c I'm concerned about the possibility that some bad process
>(rm -rf) that is just started in an empty mnt point, all of a sudden is moved
>up to a vnode full of real files, and that process may happily go on to
>delete the files in the newly mounted f/s.
>
>Doing the reverse upon unmount (moving the cwd from upper to lower) sounds
>even stranger to me.  Why?  B/c the process used to see some files and now
>it sees none.  Where did it all go?  This can break applications in all
>sorts of unhappy ways.

Whether or not checkdirs() is retained, I think it is just good practice to undo at unmount time anything that was done when the filesystem was mounted. An obvious case is if you accidentally mount a file system in the wrong place or make the common mistake of typing "mount -a" when there are NFS entries in fstab that are already mounted. Without the unmount-time checkdirs call, this is an operation that cannot be undone because any processes that had a cwd of the covered vnode before the mount will lose their cwd entirely if you unmount it.

There were also some obscure cases involving booting from CD and then mounting the real root filesystem directly over /. If you unmount it later, all processes would lose their fd_rdir references to /, so they suddenly become chrooted into a dead vnode even though their original root directory on the CD root still exists.

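[Editorial sketch, not part of the original message.] The user-visible effect of a process losing its cwd can be reproduced without any mounts. What follows is only a rough user-space analogy, not the kernel path discussed in this thread: it removes the directory a process is sitting in and reports what getcwd(3) then returns. The helper name errno_after_cwd_removed and the /tmp template are made up for this sketch.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Create a scratch directory, chdir into it, remove it, and return the
 * errno that getcwd() reports (0 if getcwd() unexpectedly succeeds,
 * -1 if the setup or the cleanup fails).  The original cwd is restored
 * before returning.
 */
static int
errno_after_cwd_removed(void)
{
	char tmpl[] = "/tmp/cwdtest.XXXXXX";	/* hypothetical scratch location */
	char buf[PATH_MAX];
	int saved, result;

	saved = open(".", O_RDONLY);		/* remember where we started */
	if (saved < 0)
		return -1;
	if (mkdtemp(tmpl) == NULL) {
		close(saved);
		return -1;
	}
	if (chdir(tmpl) != 0 || rmdir(tmpl) != 0) {
		/* Setup failed: go back and remove the scratch directory. */
		if (fchdir(saved) == 0)
			(void)rmdir(tmpl);
		close(saved);
		return -1;
	}

	/* The directory no longer has a name, but it is still our cwd. */
	result = (getcwd(buf, sizeof(buf)) == NULL) ? errno : 0;

	if (fchdir(saved) != 0)			/* return to the original cwd */
		result = -1;
	close(saved);
	return result;
}
```

On the systems I would expect this to be run on, the returned value should be ENOENT, which is the "No such file or directory" that pwd prints for a cwd that has gone away. A process left on a dead vnode after an unmount is a different kernel path, so treat this purely as an illustration of the symptom.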
Anyway, I think the best solution for now is to make the checkdirs() at unmount time conditional on the MNT_FORCE flag. This should fix amd's EBUSY detection while still making it possible to fully undo the effects of a mount operation. The change is fairly trivial, so I'll see if I can get something committed before 5.2 is released. Ian From owner-freebsd-fs@FreeBSD.ORG Wed Nov 26 10:35:25 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CDE6D16A4CE for ; Wed, 26 Nov 2003 10:35:25 -0800 (PST) Received: from filer.fsl.cs.sunysb.edu (filer.fsl.cs.sunysb.edu [130.245.126.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 85E1043FB1 for ; Wed, 26 Nov 2003 10:35:24 -0800 (PST) (envelope-from ezk@fsl.cs.sunysb.edu) Received: from agora.fsl.cs.sunysb.edu (IDENT:aU2y3hMo8H4hncdHVTSY7dwMiY1HEzym@agora.fsl.cs.sunysb.edu [130.245.126.12])hAQIYuHn017875; Wed, 26 Nov 2003 13:34:56 -0500 Received: from agora.fsl.cs.sunysb.edu (IDENT:aDFs0EZBvK52o3HkWz/NHRiWiX5FjqCp@localhost.localdomain [127.0.0.1]) hAQIZ8g9002674; Wed, 26 Nov 2003 13:35:08 -0500 Received: (from ezk@localhost) by agora.fsl.cs.sunysb.edu (8.12.8/8.12.8/Submit) id hAQIZ8E0002670; Wed, 26 Nov 2003 13:35:08 -0500 Date: Wed, 26 Nov 2003 13:35:08 -0500 Message-Id: <200311261835.hAQIZ8E0002670@agora.fsl.cs.sunysb.edu> From: Erez Zadok To: Ian Dowse In-reply-to: Your message of "Wed, 26 Nov 2003 10:13:39 GMT." <200311261013.aa21508@salmon.maths.tcd.ie> X-MailKey: Erez_Zadok cc: amd-dev@cs.columbia.edu cc: Erez Zadok cc: fs@freebsd.org Subject: Re: vnode refcnt bug? 
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 26 Nov 2003 18:35:26 -0000

In message <200311261013.aa21508@salmon.maths.tcd.ie>, Ian Dowse writes:
> In message <200311252357.hAPNvlGs021309@agora.fsl.cs.sunysb.edu>, Erez Zadok writes:
> >If you have a cwd on a lower mnt pt before the mount, I'd say it makes
> >_some_ sense to move it "up" to the mnt pt (root vnode) of the newly mounted
> >fs.  This could be very useful for, say, a login shell.
> >
> >I say "some" b/c I'm concerned about the possibility that some bad process
> >(rm -rf) that is just started in an empty mnt point, all of a sudden is moved
> >up to a vnode full of real files, and that process may happily go on to
> >delete the files in the newly mounted f/s.
> >
> >Doing the reverse upon unmount (moving the cwd from upper to lower) sounds
> >even stranger to me.  Why?  B/c the process used to see some files and now
> >it sees none.  Where did it all go?  This can break applications in all
> >sorts of unhappy ways.
>
> Whether or not checkdirs() is retained, I think it is just good
> practice to undo at unmount time anything that was done when the
> filesystem was mounted. An obvious case is if you accidentally mount
> a file system in the wrong place or make the common mistake of
> typing "mount -a" when there are NFS entries in fstab that are
> already mounted. Without the unmount-time checkdirs call, this is
> an operation that cannot be undone because any processes that had
> a cwd of the covered vnode before the mount will lose their cwd
> entirely if you unmount it.

If you accidentally mount something in the wrong place, you should be able to umount it quickly thereafter; the chance that some new process comes along and "sits" on your cwd is rather small. And if it happens, you can lsof and kill it, then umount just fine.

I don't understand why a "mount -a" would re-mount existing stuff like already-mounted NFS volumes. Does it? It shouldn't IMHO.

I agree w/ you that umount should undo anything that a mount did, but I think you may be allowing a mount to proceed in cases where it shouldn't have succeeded; so you first "get yourself in trouble" and then try to find a way to undo it. :-)

> There were also some obscure cases involving booting from CD and
> then mounting the real root filesystem directly over /. If you
> unmount it later, all processes would lose their fd_rdir references
> to /, so they suddenly become chrooted into a dead vnode even though
> their original root directory on the CD root still exists.

OK, but "obscure cases" shouldn't IMHO change default common behavior. Make the default case the more common one, the one that will be used by most users. You went ahead and changed important behavior for a minority of users.

> Anyway, I think the best solution for now is to make the checkdirs()
> at unmount time conditional on the MNT_FORCE flag. This should fix
> amd's EBUSY detection while still making it possible to fully undo
> the effects of a mount operation. The change is fairly trivial, so
> I'll see if I can get something committed before 5.2 is released.

Thanks. That'd help. I would also hope that the existing cwd-migrating behavior will become the one that someone has to trigger using MNT_FORCE; that is, please make the default behavior be the old behavior (EBUSY and such). Anyone who really wants the new behavior should use MNT_FORCE (I assume there's a flag for it in umount(8) also.)

> Ian

Cheers,
Erez.

From owner-freebsd-fs@FreeBSD.ORG Thu Nov 27 15:26:48 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6BCA716A4CE for ; Thu, 27 Nov 2003 15:26:48 -0800 (PST) Received: from mtl.alis.com (mtl.alis.com [199.84.165.71]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2289543FBD for ; Thu, 27 Nov 2003 15:26:43 -0800 (PST) (envelope-from vgoupil@alis.com) Received: from alis-2k.alis.domain (alis-2k.alis.com [199.84.165.130]) by mtl.alis.com (8.12.8p2/8.12.8) with ESMTP id hARNQfsv073538 for ; Thu, 27 Nov 2003 18:26:41 -0500 (EST) (envelope-from vgoupil@alis.com) Received: by alis-2k.alis.domain with Internet Mail Service (5.5.2653.19) id ; Thu, 27 Nov 2003 18:26:41 -0500 Message-ID: From: Vincent Goupil To: "'freebsd-fs@freebsd.org'" Date: Thu, 27 Nov 2003 18:26:32 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="ISO-8859-1" X-Spam-Checker-Version: SpamAssassin 2.53 (1.174.2.15-2003-03-30-exp) Subject: mfs is getting full (/etc/rc.diskless2) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Nov 2003 23:26:48 -0000 Hi, I've setup a firewall with a compact flash instead of a hard-drive. 
This is the output of mount:

/dev/ad0s2a on / (ufs, local, read-only)
mfs:17 on /var (mfs, asynchronous, local)
procfs on /proc (procfs, local)
mfs:36 on /dev (mfs, asynchronous, local)

As you see, I mount the compact flash read-only and I set up a memory filesystem for /var.

In my rc.conf file:

diskless_mount="/etc/rc.diskless2"
varsize="131072"

Output of: df

Filesystem  1K-blocks   Used  Avail Capacity  Mounted on
/dev/ad0s2a    229942 197570  13978    93%    /
mfs:17          63471  56584   1810    97%    /var
procfs              4      4      0   100%    /proc
mfs:36           1503     66   1317     5%    /dev

Output of: df -h

Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s2a   225M   193M    14M    93%    /
mfs:17         62M    55M   1.8M    97%    /var
procfs        4.0K   4.0K     0B   100%    /proc
mfs:36        1.5M    66K   1.3M     5%    /dev

Output of: du -h -d 1 /var

364K    /var/db
1.0K    /var/account
3.0K    /var/at
1.0K    /var/backups
1.0K    /var/crash
2.0K    /var/cron
1.0K    /var/empty
5.0K    /var/games
1.0K    /var/heimdal
3.2M    /var/log
 31K    /var/mail
2.0K    /var/msgs
1.0K    /var/preserve
 47K    /var/run
1.0K    /var/rwho
 16K    /var/spool
3.0K    /var/tmp
1.0K    /var/yp
1.6M    /var/mrtg
2.0K    /var/ucd-snmp
5.3M    /var

Output of: du -d 1 /var

364     /var/db
1       /var/account
3       /var/at
1       /var/backups
1       /var/crash
2       /var/cron
1       /var/empty
5       /var/games
1       /var/heimdal
3299    /var/log
31      /var/mail
2       /var/msgs
1       /var/preserve
47      /var/run
1       /var/rwho
16      /var/spool
3       /var/tmp
1       /var/yp
1670    /var/mrtg
2       /var/ucd-snmp
5456    /var

There seems to be a big difference between the output of df and du: df reports about 55M used on /var, while du accounts for only 5.3M. (I know, I read http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/disks.html#DU-VS-DF ), but it's not explaining everything. It could be the way I set it up.

My problem is that my /var partition is getting filled very quickly and I don't know why. I don't know what to clean. I've already deleted some logs, but that freed only 2% of the space, or about 1000 blocks.

I don't know what is taking all this space. Any ideas?

Vincent Goupil From owner-freebsd-fs@FreeBSD.ORG Thu Nov 27 18:32:50 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8630C16A4CF for ; Thu, 27 Nov 2003 18:32:50 -0800 (PST) Received: from bilver.wjv.com (user38.net339.fl.sprint-hsd.net [65.40.24.38]) by mx1.FreeBSD.org (Postfix) with ESMTP id 065DF43FE0 for ; Thu, 27 Nov 2003 18:32:48 -0800 (PST) (envelope-from bv@bilver.wjv.com) Received: from bilver.wjv.com (localhost.wjv.com [127.0.0.1]) by bilver.wjv.com (8.12.10/8.12.10) with ESMTP id hAS2Wjm7061257 for ; Thu, 27 Nov 2003 21:32:45 -0500 (EST) (envelope-from bv@bilver.wjv.com) Received: (from bv@localhost) by bilver.wjv.com (8.12.10/8.12.10/Submit) id hAS2Wjjl061256 for freebsd-fs@freebsd.org; Thu, 27 Nov 2003 21:32:45 -0500 (EST) (envelope-from bv) Date: Thu, 27 Nov 2003 21:32:45 -0500 From: Bill Vermillion To: freebsd-fs@freebsd.org Message-ID: <20031128023245.GA61208@wjv.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Organization: W.J.Vermillion / Orlando - Winter Park ReplyTo: bv@wjv.com User-Agent: Mutt/1.5.4i X-Spam-Status: No, hits=0.0 required=5.0 tests=none autolearn=no version=2.60 X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on bilver.wjv.com Subject: Re: mfs is getting full (/etc/rc.diskless2) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: bv@wjv.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Nov 2003 02:32:50 -0000 Earlier in the linear time track, on approximately Thu, Nov 27, 2003 at 18:26 , Vincent Goupil divulged this public information: > I've setup a firewall with a compact flash instead of a hard-drive. 
This is > the output of mount: > /dev/ad0s2a on / (ufs, local, read-only) > mfs:17 on /var (mfs, asynchronous, local) > procfs on /proc (procfs, local) > mfs:36 on /dev (mfs, asynchronous, local) > As you see, I mount the compact flash as read-only and I setup a memory > filesystem for /var > In my rc.conf file: > diskless_mount="/etc/rc.diskless2" > varsize="131072" > Output of: df > Filesystem 1K-blocks Used Avail Capacity Mounted on > /dev/ad0s2a 229942 197570 13978 93% / > mfs:17 63471 56584 1810 97% /var > procfs 4 4 0 100% /proc > mfs:36 1503 66 1317 5% /dev > > Output of: df -h > Filesystem Size Used Avail Capacity Mounted on > /dev/ad0s2a 225M 193M 14M 93% / > mfs:17 62M 55M 1.8M 97% /var > procfs 4.0K 4.0K 0B 100% /proc > mfs:36 1.5M 66K 1.3M 5% /dev > > Output of: du -h -d 1 /var > 364K /var/db > 1.0K /var/account > 3.0K /var/at > 1.0K /var/backups > 1.0K /var/crash > 2.0K /var/cron > 1.0K /var/empty > 5.0K /var/games > 1.0K /var/heimdal > 3.2M /var/log > 31K /var/mail > 2.0K /var/msgs > 1.0K /var/preserve > 47K /var/run > 1.0K /var/rwho > 16K /var/spool > 3.0K /var/tmp > 1.0K /var/yp > 1.6M /var/mrtg > 2.0K /var/ucd-snmp > 5.3M /var > > Output of: du -d 1 /var > 364 /var/db > 1 /var/account > 3 /var/at > 1 /var/backups > 1 /var/crash > 2 /var/cron > 1 /var/empty > 5 /var/games > 1 /var/heimdal > 3299 /var/log > 31 /var/mail > 2 /var/msgs > 1 /var/preserve > 47 /var/run > 1 /var/rwho > 16 /var/spool > 3 /var/tmp > 1 /var/yp > 1670 /var/mrtg > 2 /var/ucd-snmp > 5456 /var > It seems to have a big difference between > the output of df and du (I know, I read > http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/disks.html# > DU-VS-DF ), but it's now explaining everything. I could be the > way I setup it. > My problem is, my /var partition is getting filled very quickly > and I don't know why ? I don't know what to clean. I've already > deleted some log, but I saved only 2% of free space or 1000 > block. > I don't know what is taking all this space ? 
Any ideas ?

It sounds like you deleted some log file that the system keeps open. So it will keep using up disk space even though the name is gone. A file's blocks are not freed until the last link is gone and every process that has it open has closed it, so if the file is open for logging by a program that never releases it, that is your problem.

At that point the easiest way is to reboot. Then find out what you are logging and stop the things you don't need. When a log gets full DO NOT remove it. Null it out.

Just doing > on the log file should empty the log, reset the pointer back to the start of the file, and release all blocks in use.

Bill
--
Bill Vermillion - bv @ wjv . com

From owner-freebsd-fs@FreeBSD.ORG Fri Nov 28 03:07:30 2003
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8DBC216A4CE for ; Fri, 28 Nov 2003 03:07:30 -0800 (PST)
Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1653B43FE3 for ; Fri, 28 Nov 2003 03:07:29 -0800 (PST) (envelope-from truckman@FreeBSD.org)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.12.9p2/8.12.9) with ESMTP id hASB7BeF017371; Fri, 28 Nov 2003 03:07:15 -0800 (PST) (envelope-from truckman@FreeBSD.org)
Message-Id: <200311281107.hASB7BeF017371@gw.catspoiler.org>
Date: Fri, 28 Nov 2003 03:07:11 -0800 (PST)
From: Don Lewis
To: kmarx@vicor.com
In-Reply-To: <3FC27F98.8090801@vicor.com>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
cc: freebsd-fs@FreeBSD.org
cc: mckusick@beastie.mckusick.com
Subject: Re: 4.8 ffs_dirpref problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 28 Nov 2003 11:07:30 -0000

On 24 Nov, Ken Marx wrote:
>
>
> Don Lewis wrote:
>> Index: sys/kern/vfs_bio.c
>>
=================================================================== >> RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v >> retrieving revision 1.242.2.21 >> diff -u -r1.242.2.21 vfs_bio.c >> --- sys/kern/vfs_bio.c 9 Aug 2003 16:21:19 -0000 1.242.2.21 >> +++ sys/kern/vfs_bio.c 18 Nov 2003 02:10:55 -0000 >> @@ -140,6 +140,7 @@ >> &bufreusecnt, 0, ""); >> >> static int bufhashmask; >> +static int bufhashshift; >> static LIST_HEAD(bufhashhdr, buf) *bufhashtbl, invalhash; >> struct bqueues bufqueues[BUFFER_QUEUES] = { { 0 } }; >> char *buf_wmesg = BUF_WMESG; >> @@ -160,7 +161,20 @@ >> struct bufhashhdr * >> bufhash(struct vnode *vnp, daddr_t bn) >> { >> - return(&bufhashtbl[(((uintptr_t)(vnp) >> 7) + (int)bn) & bufhashmask]); >> + u_int64_t hashkey64; >> + int hashkey; >> + >> + /* >> + * Fibonacci hash, see Knuth's >> + * _Art of Computer Programming, Volume 3 / Sorting and Searching_ >> + * >> + * We reduce the argument to 32 bits before doing the hash to >> + * avoid the need for a slow 64x64 multiply on 32 bit platforms. >> + */ >> + hashkey64 = (u_int64_t)(uintptr_t)vnp + (u_int64_t)bn; >> + hashkey = (((u_int32_t)(hashkey64 + (hashkey64 >> 32)) * 2654435769u) >> >> + bufhashshift) & bufhashmask; >> + return(&bufhashtbl[hashkey]); >> } >> >> /* >> @@ -319,8 +333,9 @@ >> bufhashinit(caddr_t vaddr) >> { >> /* first, make a null hash table */ >> + bufhashshift = 29; >> for (bufhashmask = 8; bufhashmask < nbuf / 4; bufhashmask <<= 1) >> - ; >> + bufhashshift--; >> bufhashtbl = (void *)vaddr; >> vaddr = vaddr + sizeof(*bufhashtbl) * bufhashmask; >> --bufhashmask; >> >> > > Well, I'm mildly beflummoxed - I tried to compare hashtable preformance > between all three known versions of the hashing - legacy power of 2, > the Vicor ^= hash, and Don's fibonacci hash. 
> > Running with > > minifree = max( 1, avgifree / 4 ); > minbfree = max( 1, avgbfree ); > > all perform about the same, with no performance problems all > the way up to 100% disk capacity (didn't test into reserved space). > > Looking at instrumentation to show freq and avg depth of the > hash buckets, everything seems very calm (mainly because > we're not hitting the linear searching very often, I'd presume). > > I can't explain why I seemlingly got performance problems > with similar (identical) minbfree code previously. > > So, out of spite, I went back to > > minbfree = max( 1, avgbfree/4 ); > > This does hit the hashtable harder for the legacy version > and not so much for either new flavor. Here are a few > samplings of calling my dump routine from the debugger. > "avgdepth" really means 'search depth' since we use > the depth reached after finding a bp in gbincore. > > The line below such as, > > 0: avgdepth[1] cnt=801 > > means that 801 of the hashtable buckets had an avg search > depth of 1 at the time the debug routine was called. > The 'N:' prefix means the N-th unique non-zero such value. > So large cnt's for small []'d depth values means an efficient hash. > > I've edited out the details as much as possible. > > LEGACY: > -------- > Nov 24 13:34:54 oos0b /kernel: bh[442/0x1ba]: freq=2706110, avgdepth = 154 > ... > Nov 24 13:34:54 oos0b /kernel: 0: avgdepth[1] cnt=1015 > Nov 24 13:34:54 oos0b /kernel: 1: avgdepth[2] cnt=7 > Nov 24 13:34:54 oos0b /kernel: 2: avgdepth[154] cnt=1 <- !! > Nov 24 13:34:54 oos0b /kernel: 3: avgdepth[3] cnt=1 > ----------- > > Nov 24 13:36:49 oos0b /kernel: bh[442/0x1ba]: freq=3416953, avgdepth = 141 > ... 
> Nov 24 13:36:49 oos0b /kernel: 0: avgdepth[1] cnt=1017 > Nov 24 13:36:49 oos0b /kernel: 1: avgdepth[141] cnt=1 > Nov 24 13:36:49 oos0b /kernel: 2: avgdepth[2] cnt=6 > > VICOR x-or hashtable: > --------------------- > Nov 24 13:07:24 oos0b /kernel: 0: avgdepth[1] cnt=762 > Nov 24 13:07:24 oos0b /kernel: 1: avgdepth[2] cnt=259 > Nov 24 13:07:24 oos0b /kernel: 2: avgdepth[3] cnt=3 > ----------- > > Nov 24 13:08:07 oos0b /kernel: 0: avgdepth[1] cnt=744 > Nov 24 13:08:07 oos0b /kernel: 1: avgdepth[2] cnt=275 > Nov 24 13:08:07 oos0b /kernel: 2: avgdepth[3] cnt=5 > > FIBONACCI: > ---------- > Nov 24 11:56:50 oos0b /kernel: 0: avgdepth[1] cnt=811 > Nov 24 11:56:50 oos0b /kernel: 1: avgdepth[3] cnt=88 > Nov 24 11:56:50 oos0b /kernel: 2: avgdepth[2] cnt=124 > Nov 24 11:56:50 oos0b /kernel: 3: avgdepth[0] cnt=1 > ----------- > > Nov 24 11:57:48 oos0b /kernel: 0: avgdepth[1] cnt=801 > Nov 24 11:57:48 oos0b /kernel: 1: avgdepth[3] cnt=93 > Nov 24 11:57:48 oos0b /kernel: 2: avgdepth[2] cnt=130 > > So, while this is far from analytically eshaustive, > it almost appears the fibonacci hash has more entries > of depth 3, while the Vicor one has more at depth 2. > > I'm happy to run more tests if you have ideas. I'm also fine > to cut bait and go with whatever you decide. It *seems* like > putting the fibonacci hash is prudent since the current hash > has been observed to be expensive. I had trouble proving this > unequivocally though. So, perhaps Don's minbfree fix is sufficient > after all. I'm tempted at this point to go with the 100% flavor. I think we're running into one of the weaknesses in the Fibonacci hash. There are a large number of hash entries for the cylinder group blocks, which are located at offsets which are multiples of 89 * 2^10 in your example, or something on the order of 2^16. The effect of this is for the cylinder group number to be hashed using the least significant bits of the hash multiplier, which don't work as well for distributing the hash values. 
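[Editorial sketch, not part of the original message.] For readers following the hashing discussion, the multiplicative scheme from the patch can be tried outside the kernel. The code below is only an illustration under stated assumptions, not the committed code: mult_hash, max_chain and the bucket counts are names and values chosen here, and the real hash key also folds in the vnode pointer. The two multipliers and the 89 * 2^10 key stride are the ones mentioned in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FIB_ORIG 2654435769u	/* multiplier used in the patch (0x9E3779B9) */
#define FIB_ALT  0x9E376DB1u	/* alternative multiplier proposed in this thread */

/*
 * Multiplicative hash: keep the top `bits` bits of key * multiplier
 * (mod 2^32).  The patch expresses the same shift as
 * bufhashshift = 32 - log2(table size).
 */
static uint32_t
mult_hash(uint32_t key, uint32_t multiplier, unsigned bits)
{
	assert(bits >= 1 && bits <= 16);
	return (uint32_t)(key * multiplier) >> (32 - bits);
}

/*
 * Hypothetical helper: hash `nkeys` keys spaced `stride` apart into
 * 2^bits buckets and return the length of the longest chain.
 */
static unsigned
max_chain(uint32_t multiplier, uint32_t stride, unsigned nkeys, unsigned bits)
{
	static unsigned counts[1u << 16];	/* large enough for bits <= 16 */
	unsigned i, worst = 0;

	memset(counts, 0, sizeof(counts));
	for (i = 0; i < nkeys; i++) {
		uint32_t bucket = mult_hash(i * stride, multiplier, bits);

		if (++counts[bucket] > worst)
			worst = counts[bucket];
	}
	return worst;
}
```

Calling max_chain(FIB_ORIG, 89 * 1024, 500, 10) and max_chain(FIB_ALT, 89 * 1024, 500, 10) gives a quick way to compare the two constants on keys spaced like the cylinder group blocks. The numbers are indicative only, since this ignores the vnode pointer and every other block that shares the table.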
I tried some of Knuth's suggestions, and got better results with the hash multiplier 0x9E376DB1u. The most significant 16 bits of the multiplier are the same as the original constant, and the least significant bits act as a fraction in the desirable range of 1/3 to 3/7. Please give this new hash multiplier a try.

From owner-freebsd-fs@FreeBSD.ORG Fri Nov 28 07:43:21 2003
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E825B16A4CE for ; Fri, 28 Nov 2003 07:43:21 -0800 (PST)
Received: from bilver.wjv.com (user38.net339.fl.sprint-hsd.net [65.40.24.38]) by mx1.FreeBSD.org (Postfix) with ESMTP id 265E843F3F for ; Fri, 28 Nov 2003 07:43:19 -0800 (PST) (envelope-from bv@bilver.wjv.com)
Received: from bilver.wjv.com (localhost.wjv.com [127.0.0.1]) by bilver.wjv.com (8.12.10/8.12.10) with ESMTP id hASFhAm7049742; Fri, 28 Nov 2003 10:43:10 -0500 (EST) (envelope-from bv@bilver.wjv.com)
Received: (from bv@localhost) by bilver.wjv.com (8.12.10/8.12.10/Submit) id hASFh6jX049656; Fri, 28 Nov 2003 10:43:06 -0500 (EST) (envelope-from bv)
Date: Fri, 28 Nov 2003 10:43:06 -0500
From: Bill Vermillion
To: Vincent Goupil
Message-ID: <20031128154306.GC47553@wjv.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Organization: W.J.Vermillion / Orlando - Winter Park
ReplyTo: bv@wjv.com
User-Agent: Mutt/1.5.4i
X-Spam-Status: No, hits=0.0 required=5.0 tests=none autolearn=no version=2.60
X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on bilver.wjv.com
cc: freebsd-fs@freebsd.org
Subject: Re: mfs is getting full (/etc/rc.diskless2)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: bv@wjv.com
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 28 Nov 2003 15:43:22 -0000

Shakespeare wrote plays and
sonnets which will last an eternity, but on Fri, Nov 28, 2003 at 10:16 , Vincent Goupil wrote: > It wasn't the logfile directly, I delete logfiles that has been > rotated (like .0.gz) But if you did that with the apache log files - if they were in var, you can't use syslog to rotate them. The files will stay open. You have to stop and restart apache, or use the 'rotatelogs' that comes with Apache. If you don't have Apache log files some other file - one that keeps a file open but does not close it each time it writes - can give you the same results. Bill > > > -----Original Message----- > From: Bill Vermillion [mailto:bv@wjv.com] > Sent: 27 novembre, 2003 21:33 > To: freebsd-fs@freebsd.org > Subject: Re: mfs is getting full (/etc/rc.diskless2) > > > Earlier in the linear time track, on approximately Thu, Nov 27, 2003 at > 18:26 , > Vincent Goupil divulged this public information: > > > > I've setup a firewall with a compact flash instead of a hard-drive. This > is > > the output of mount: > > /dev/ad0s2a on / (ufs, local, read-only) > > mfs:17 on /var (mfs, asynchronous, local) > > procfs on /proc (procfs, local) > > mfs:36 on /dev (mfs, asynchronous, local) > > > As you see, I mount the compact flash as read-only and I setup a memory > > filesystem for /var > > > In my rc.conf file: > > diskless_mount="/etc/rc.diskless2" > > varsize="131072" > > > Output of: df > > Filesystem 1K-blocks Used Avail Capacity Mounted on > > /dev/ad0s2a 229942 197570 13978 93% / > > mfs:17 63471 56584 1810 97% /var > > procfs 4 4 0 100% /proc > > mfs:36 1503 66 1317 5% /dev > > > > Output of: df -h > > Filesystem Size Used Avail Capacity Mounted on > > /dev/ad0s2a 225M 193M 14M 93% / > > mfs:17 62M 55M 1.8M 97% /var > > procfs 4.0K 4.0K 0B 100% /proc > > mfs:36 1.5M 66K 1.3M 5% /dev > > > > Output of: du -h -d 1 /var > > 364K /var/db > > 1.0K /var/account > > 3.0K /var/at > > 1.0K /var/backups > > 1.0K /var/crash > > 2.0K /var/cron > > 1.0K /var/empty > > 5.0K /var/games > > 1.0K 
/var/heimdal > > 3.2M /var/log > > 31K /var/mail > > 2.0K /var/msgs > > 1.0K /var/preserve > > 47K /var/run > > 1.0K /var/rwho > > 16K /var/spool > > 3.0K /var/tmp > > 1.0K /var/yp > > 1.6M /var/mrtg > > 2.0K /var/ucd-snmp > > 5.3M /var > > > > Output of: du -d 1 /var > > 364 /var/db > > 1 /var/account > > 3 /var/at > > 1 /var/backups > > 1 /var/crash > > 2 /var/cron > > 1 /var/empty > > 5 /var/games > > 1 /var/heimdal > > 3299 /var/log > > 31 /var/mail > > 2 /var/msgs > > 1 /var/preserve > > 47 /var/run > > 1 /var/rwho > > 16 /var/spool > > 3 /var/tmp > > 1 /var/yp > > 1670 /var/mrtg > > 2 /var/ucd-snmp > > 5456 /var > > > It seems to have a big difference between > > the output of df and du (I know, I read > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/disks.html# > > DU-VS-DF ), but it's now explaining everything. I could be the > > way I setup it. > > > My problem is, my /var partition is getting filled very quickly > > and I don't know why ? I don't know what to clean. I've already > > deleted some log, but I saved only 2% of free space or 1000 > > block. > > > I don't know what is taking all this space ? Any ideas ? > > It sounds like you deleted some log file that the system keeps > open. So it will keep using up disk space even though the name is > gone. A file is not deleted until the last link is gone and if the > file is opened for loging by a program that never releases the file > that is your problem. > > At that point the easiest way is to reboot. Then find out what > you are logging and stop the things you don't need. When a log > gets full DO NOT remove it. Null it out. > > Just doing > should empty the log and reset > the pointer back to the first of the file and release all blocks in > use. > > Bill > -- > Bill Vermillion - bv @ wjv . 
com > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" -- Bill Vermillion - bv @ wjv . com From owner-freebsd-fs@FreeBSD.ORG Fri Nov 28 13:35:15 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 643E216A4CE for ; Fri, 28 Nov 2003 13:35:15 -0800 (PST) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 36DE643FA3 for ; Fri, 28 Nov 2003 13:35:13 -0800 (PST) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.12.9p2/8.12.9) with ESMTP id hASLYveF018257; Fri, 28 Nov 2003 13:35:01 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <200311282135.hASLYveF018257@gw.catspoiler.org> Date: Fri, 28 Nov 2003 13:34:57 -0800 (PST) From: Don Lewis To: kmarx@vicor.com In-Reply-To: <200311281107.hASB7BeF017371@gw.catspoiler.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii cc: freebsd-fs@FreeBSD.org cc: mckusick@beastie.mckusick.com Subject: Re: 4.8 ffs_dirpref problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Nov 2003 21:35:15 -0000 On 28 Nov, To: kmarx@vicor.com wrote: > On 24 Nov, Ken Marx wrote: >> >> >> Don Lewis wrote: > >>> Index: sys/kern/vfs_bio.c >>> =================================================================== >>> RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v >>> retrieving revision 1.242.2.21 >>> diff -u -r1.242.2.21 vfs_bio.c >>> --- sys/kern/vfs_bio.c 9 Aug 2003 16:21:19 -0000 1.242.2.21 >>> +++ sys/kern/vfs_bio.c 18 Nov 2003 02:10:55 -0000 >>> @@ -140,6 +140,7 @@ >>> 
        &bufreusecnt, 0, "");
>>>
>>>  static int bufhashmask;
>>> +static int bufhashshift;
>>>  static LIST_HEAD(bufhashhdr, buf) *bufhashtbl, invalhash;
>>>  struct bqueues bufqueues[BUFFER_QUEUES] = { { 0 } };
>>>  char *buf_wmesg = BUF_WMESG;
>>> @@ -160,7 +161,20 @@
>>>  struct bufhashhdr *
>>>  bufhash(struct vnode *vnp, daddr_t bn)
>>>  {
>>> -        return(&bufhashtbl[(((uintptr_t)(vnp) >> 7) + (int)bn) & bufhashmask]);
>>> +        u_int64_t hashkey64;
>>> +        int hashkey;
>>> +
>>> +        /*
>>> +         * Fibonacci hash, see Knuth's
>>> +         * _Art of Computer Programming, Volume 3 / Sorting and Searching_
>>> +         *
>>> +         * We reduce the argument to 32 bits before doing the hash to
>>> +         * avoid the need for a slow 64x64 multiply on 32 bit platforms.
>>> +         */
>>> +        hashkey64 = (u_int64_t)(uintptr_t)vnp + (u_int64_t)bn;
>>> +        hashkey = (((u_int32_t)(hashkey64 + (hashkey64 >> 32)) * 2654435769u) >>
>>> +            bufhashshift) & bufhashmask;
>>> +        return(&bufhashtbl[hashkey]);
>>>  }
>>>
>>>  /*
>>> @@ -319,8 +333,9 @@
>>>  bufhashinit(caddr_t vaddr)
>>>  {
>>>          /* first, make a null hash table */
>>> +        bufhashshift = 29;
>>>          for (bufhashmask = 8; bufhashmask < nbuf / 4; bufhashmask <<= 1)
>>> -                ;
>>> +                bufhashshift--;
>>>          bufhashtbl = (void *)vaddr;
>>>          vaddr = vaddr + sizeof(*bufhashtbl) * bufhashmask;
>>>          --bufhashmask;
>>>
>>>
>>
>> Well, I'm mildly beflummoxed - I tried to compare hashtable performance
>> between all three known versions of the hashing - legacy power of 2,
>> the Vicor ^= hash, and Don's fibonacci hash.
>>
>> Running with
>>
>>     minifree = max( 1, avgifree / 4 );
>>     minbfree = max( 1, avgbfree );
>>
>> all perform about the same, with no performance problems all
>> the way up to 100% disk capacity (didn't test into reserved space).
>>
>> Looking at instrumentation to show freq and avg depth of the
>> hash buckets, everything seems very calm (mainly because
>> we're not hitting the linear searching very often, I'd presume).
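[Editorial sketch, not part of the original message.] The bucket computation in the quoted patch can be tried out in userland with something like the following. This is only a minimal sketch: the names fib_bucket and table_bits are hypothetical and do not appear in vfs_bio.c. Only the multiplier 2654435769 and the fold from 64 to 32 bits are taken from the patch. It assumes the shift equals 32 minus log2 of the table size, which is what starting bufhashshift at 29 for an 8-entry table and decrementing it on each doubling works out to.

```c
#include <stdint.h>

/* Multiplier used in the quoted patch (0x9E3779B9). */
#define FIB_MULT 2654435769u

/*
 * Hypothetical helper, for illustration only: map a 64-bit key to one of
 * 2^table_bits buckets.  The caller must pass 1 <= table_bits <= 31.
 */
static uint32_t
fib_bucket(uint64_t key, unsigned table_bits)
{
        /* Fold the upper 32 bits into the lower 32 bits, as the patch does. */
        uint32_t key32 = (uint32_t)(key + (key >> 32));

        /* 32-bit multiply with wraparound, then keep the top table_bits bits. */
        uint32_t product = key32 * FIB_MULT;

        return (product >> (32 - table_bits));
}
```

Because only the top bits of the product are kept, the trailing "& bufhashmask" in the patch should never change the result under this assumption; it acts as a safety net.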
>>
>> I can't explain why I seemingly got performance problems
>> with similar (identical) minbfree code previously.
>>
>> So, out of spite, I went back to
>>
>>     minbfree = max( 1, avgbfree/4 );
>>
>> This does hit the hashtable harder for the legacy version
>> and not so much for either new flavor. Here are a few
>> samplings of calling my dump routine from the debugger.
>> "avgdepth" really means 'search depth' since we use
>> the depth reached after finding a bp in gbincore.
>>
>> The line below such as,
>>
>>     0: avgdepth[1] cnt=801
>>
>> means that 801 of the hashtable buckets had an avg search
>> depth of 1 at the time the debug routine was called.
>> The 'N:' prefix means the N-th unique non-zero such value.
>> So large cnt's for small []'d depth values mean an efficient hash.
>>
>> I've edited out the details as much as possible.
>>
>> LEGACY:
>> --------
>> Nov 24 13:34:54 oos0b /kernel: bh[442/0x1ba]: freq=2706110, avgdepth = 154
>> ...
>> Nov 24 13:34:54 oos0b /kernel: 0: avgdepth[1] cnt=1015
>> Nov 24 13:34:54 oos0b /kernel: 1: avgdepth[2] cnt=7
>> Nov 24 13:34:54 oos0b /kernel: 2: avgdepth[154] cnt=1 <- !!
>> Nov 24 13:34:54 oos0b /kernel: 3: avgdepth[3] cnt=1
>> -----------
>>
>> Nov 24 13:36:49 oos0b /kernel: bh[442/0x1ba]: freq=3416953, avgdepth = 141
>> ...
>>
>> Nov 24 13:36:49 oos0b /kernel: 0: avgdepth[1] cnt=1017
>> Nov 24 13:36:49 oos0b /kernel: 1: avgdepth[141] cnt=1
>> Nov 24 13:36:49 oos0b /kernel: 2: avgdepth[2] cnt=6
>>
>> VICOR x-or hashtable:
>> ---------------------
>> Nov 24 13:07:24 oos0b /kernel: 0: avgdepth[1] cnt=762
>> Nov 24 13:07:24 oos0b /kernel: 1: avgdepth[2] cnt=259
>> Nov 24 13:07:24 oos0b /kernel: 2: avgdepth[3] cnt=3
>> -----------
>>
>> Nov 24 13:08:07 oos0b /kernel: 0: avgdepth[1] cnt=744
>> Nov 24 13:08:07 oos0b /kernel: 1: avgdepth[2] cnt=275
>> Nov 24 13:08:07 oos0b /kernel: 2: avgdepth[3] cnt=5
>>
>> FIBONACCI:
>> ----------
>> Nov 24 11:56:50 oos0b /kernel: 0: avgdepth[1] cnt=811
>> Nov 24 11:56:50 oos0b /kernel: 1: avgdepth[3] cnt=88
>> Nov 24 11:56:50 oos0b /kernel: 2: avgdepth[2] cnt=124
>> Nov 24 11:56:50 oos0b /kernel: 3: avgdepth[0] cnt=1
>> -----------
>>
>> Nov 24 11:57:48 oos0b /kernel: 0: avgdepth[1] cnt=801
>> Nov 24 11:57:48 oos0b /kernel: 1: avgdepth[3] cnt=93
>> Nov 24 11:57:48 oos0b /kernel: 2: avgdepth[2] cnt=130
>>
>> So, while this is far from analytically exhaustive,
>> it almost appears the fibonacci hash has more entries
>> of depth 3, while the Vicor one has more at depth 2.
>>
>> I'm happy to run more tests if you have ideas. I'm also fine
>> to cut bait and go with whatever you decide. It *seems* like
>> putting in the fibonacci hash is prudent since the current hash
>> has been observed to be expensive. I had trouble proving this
>> unequivocally though. So, perhaps Don's minbfree fix is sufficient
>> after all. I'm tempted at this point to go with the 100% flavor.
>
> I think we're running into one of the weaknesses in the Fibonacci hash.
> There are a large number of hash entries for the cylinder group blocks,
> which are located at offsets which are multiples of 89 * 2^10 in your
> example, or something on the order of 2^16.
The effect of this is for
> the cylinder group number to be hashed using the least significant bits
> of the hash multiplier, which don't work as well for distributing the
> hash values. I tried some of Knuth's suggestions, and got better
> results with the hash multiplier 0x9E376DB1u. The most significant 16
> bits of the multiplier are the same as the original constant, and the
> least significant bits act as a fraction in the desirable range of 1/3
> to 3/7. Please give this new hash multiplier a try.

I went ahead and spun a new version of my patch with the new multiplier,
one other tweak to the formula, and updated comments.

Index: sys/kern/vfs_bio.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
retrieving revision 1.242.2.21
diff -u -r1.242.2.21 vfs_bio.c
--- sys/kern/vfs_bio.c 9 Aug 2003 16:21:19 -0000 1.242.2.21
+++ sys/kern/vfs_bio.c 28 Nov 2003 20:02:06 -0000
@@ -140,6 +140,7 @@
         &bufreusecnt, 0, "");

 static int bufhashmask;
+static int bufhashshift;
 static LIST_HEAD(bufhashhdr, buf) *bufhashtbl, invalhash;
 struct bqueues bufqueues[BUFFER_QUEUES] = { { 0 } };
 char *buf_wmesg = BUF_WMESG;
@@ -160,7 +161,40 @@
 struct bufhashhdr *
 bufhash(struct vnode *vnp, daddr_t bn)
 {
-        return(&bufhashtbl[(((uintptr_t)(vnp) >> 7) + (int)bn) & bufhashmask]);
+        u_int64_t hashkey64;
+        int hashkey;
+
+        /*
+         * A variation on the Fibonacci hash that Knuth credits to
+         * R. W. Floyd, see Knuth's _Art of Computer Programming,
+         * Volume 3 / Sorting and Searching_
+         *
+         * We reduce the argument to 32 bits before doing the hash to
+         * avoid the need for a slow 64x64 multiply on 32 bit platforms.
+         *
+         * sizeof(struct vnode) is 168 on i386, so toss some of the lower
+         * bits of the vnode address to reduce the key range, which
+         * improves the distribution of keys across buckets.
+         *
+         * The file system cylinder group blocks are very heavily
+         * used.
They are located at intervals of fpg, which is
+         * on the order of 89 to 94 * 2^10, depending on other
+         * filesystem parameters, for a 16k block size. Smaller block
+         * sizes will reduce fpg approximately proportionally. This
+         * will cause the cylinder group index to be hashed using the
+         * lower bits of the hash multiplier, which will not distribute
+         * the keys as uniformly in a classic Fibonacci hash where a
+         * relatively small number of the upper bits of the result
+         * are used. Using 2^16 as a close-enough approximation to
+         * fpg, split the hash multiplier in half, with the upper 16
+         * bits being the inverse of the golden ratio, and the lower
+         * 16 bits being a fraction between 1/3 and 3/7 (closer to
+         * 3/7 in this case), that gives good experimental results.
+         */
+        hashkey64 = ((u_int64_t)(uintptr_t)vnp >> 3) + (u_int64_t)bn;
+        hashkey = (((u_int32_t)(hashkey64 + (hashkey64 >> 32)) * 0x9E376DB1u) >>
+            bufhashshift) & bufhashmask;
+        return(&bufhashtbl[hashkey]);
 }

 /*
@@ -319,8 +353,9 @@
 bufhashinit(caddr_t vaddr)
 {
         /* first, make a null hash table */
+        bufhashshift = 29;
         for (bufhashmask = 8; bufhashmask < nbuf / 4; bufhashmask <<= 1)
-                ;
+                bufhashshift--;
         bufhashtbl = (void *)vaddr;
         vaddr = vaddr + sizeof(*bufhashtbl) * bufhashmask;
         --bufhashmask;

From owner-freebsd-fs@FreeBSD.ORG Sat Nov 29 13:29:50 2003
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
        by hub.freebsd.org (Postfix) with ESMTP id C534816A4CE;
        Sat, 29 Nov 2003 13:29:49 -0800 (PST)
Received: from obsecurity.dyndns.org (adsl-63-207-60-234.dsl.lsan03.pacbell.net [63.207.60.234])
        by mx1.FreeBSD.org (Postfix) with ESMTP id C494F43F85;
        Sat, 29 Nov 2003 13:29:47 -0800 (PST)
        (envelope-from kris@obsecurity.org)
Received: by obsecurity.dyndns.org (Postfix, from userid 1000)
        id 5D84566D26; Sat, 29 Nov 2003 13:29:47 -0800 (PST)
Date: Sat, 29 Nov 2003 13:29:46 -0800
From: Kris Kennaway
To: Kris Kennaway
Message-ID:
<20031129212946.GA8894@xor.obsecurity.org>
References: <20031124205800.GA20935@xor.obsecurity.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
        protocol="application/pgp-signature"; boundary="W/nzBZO5zC0uMSeA"
Content-Disposition: inline
In-Reply-To: <20031124205800.GA20935@xor.obsecurity.org>
User-Agent: Mutt/1.4.1i
cc: re@FreeBSD.org
cc: current@FreeBSD.org
cc: fs@FreeBSD.org
Subject: Re: recursed on non-recursive lock (sleep mutex) vnode interlock @
        /var/portbuild/sparc64/src-client/sys/ufs/ufs/ufs_ihash.c:128
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 29 Nov 2003 21:29:50 -0000

--W/nzBZO5zC0uMSeA
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

I got this on an alpha machine as well. Can someone track it down?

msgbufp = 0xfffffc0023f85fe0
magic = 63062, size = 32736, r= 59046, w = 59565, ptr = 0xfffffc0023f7e000, cksum= 2511626
lock order reversal
 1st 0xfffffc001a793d80 vnode interlock (vnode interlock) @ /a/asami/portbuild/alpha/src-client/sys/ufs/ufs/ufs_ihash.c:128
 2nd 0xfffffc00006feda0 ufs ihash (ufs ihash) @ /a/asami/portbuild/alpha/src-client/sys/ufs/ufs/ufs_ihash.c:124
Stack backtrace:
recursed on non-recursive lock (sleep mutex) vnode interlock @ /a/asami/portbuild/alpha/src-client/sys/ufs/ufs/ufs_ihash.c:128
first acquired @ /a/asami/portbuild/alpha/src-client/sys/ufs/ufs/ufs_ihash.c:128
Debugger() at Debugger+0x38
panic() at panic+0x168
witness_lock() at witness_lock+0x408
_mtx_lock_flags() at _mtx_lock_flags+0xc8
ufs_ihashget() at ufs_ihashget+0xec
ffs_vget() at ffs_vget+0x54
ufs_lookup() at ufs_lookup+0xc9c
ufs_vnoperate() at ufs_vnoperate+0x2c
vfs_cache_lookup() at vfs_cache_lookup+0x37c
ufs_vnoperate() at ufs_vnoperate+0x2c
lookup() at lookup+0x4dc
namei() at namei+0x310
stat() at
stat+0x4c
syscall() at syscall+0x39c
XentSys() at XentSys+0x64
--- syscall (188, FreeBSD ELF64, stat) ---
--- user mode ---
db>

Kris

On Mon, Nov 24, 2003 at 12:58:01PM -0800, Kris Kennaway wrote:
> One of my sparc64 package machines (running -current from Nov 21) died
> overnight with the following:
>
> recursed on non-recursive lock (sleep mutex) vnode interlock @ /var/portbuild/sparc64/src-client/sys/ufs/ufs/ufs_ihash.c:128
> first acquired @ /var/portbuild/sparc64/src-client/sys/ufs/ufs/ufs_ihash.c:128
> panic: recurse
> cpuid = 0;
> Debugger("panic")
> Stopped at Debugger+0x1c: ta %xcc, 1
> db> trace
> panic() at panic+0x174
> witness_lock() at witness_lock+0x3b4
> _mtx_lock_flags() at _mtx_lock_flags+0x9c
> ufs_ihashget() at ufs_ihashget+0x94
> ffs_vget() at ffs_vget+0x20
> ufs_lookup() at ufs_lookup+0xb2c
> ufs_vnoperate() at ufs_vnoperate+0x1c
> vfs_cache_lookup() at vfs_cache_lookup+0x330
> ufs_vnoperate() at ufs_vnoperate+0x1c
> lookup() at lookup+0x408
> namei() at namei+0x254
> vn_open_cred() at vn_open_cred+0x208
> vn_open() at vn_open+0x18
> kern_open() at kern_open+0x84
> open() at open+0x14
> syscall() at syscall+0x308
> -- syscall (5, FreeBSD ELF64, open) %o7=0x4038c2b0 --
> userland() at 0x40395948
> user trace: trap %o7=0x4038c2b0
> pc 0x40395948, sp 0x7fdffffdaf1
> pc 0x4038b47c, sp 0x7fdffffdc31
> pc 0x101778, sp 0x7fdffffdcf1
> pc 0x101378, sp 0x7fdffffddb1
> pc 0x100f80, sp 0x7fdffffde71
> pc 0x4020a234, sp 0x7fdffffdf31
> done

--W/nzBZO5zC0uMSeA
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQE/yQ/KWry0BWjoQKURAppsAKCEE93XMKCRNO6qyOD046BVWKM8NACgyhDL
CHFrv87wA0gG5JnXURXqZIQ=
=mPQe
-----END PGP SIGNATURE-----

--W/nzBZO5zC0uMSeA--