Date: Tue, 25 Nov 2003 16:22:27 -0500
Message-Id: <200311252122.hAPLMRfE018534@agora.fsl.cs.sunysb.edu>
From: Erez Zadok
To: Ian Dowse
In-reply-to: Your message of "Tue, 25 Nov 2003 21:07:29 GMT." <200311252107.aa96370@salmon.maths.tcd.ie>
cc: amd-dev@cs.columbia.edu
cc: Erez Zadok
cc: fs@freebsd.org
Subject: Re: vnode refcnt bug?

Ian, I'm CC-ing my reply to the am-utils developers mailing list,
amd-dev.  Let's keep this thread on both fs@ and amd-dev for a bit.  Can
the people on amd-dev who noticed this problem please answer Ian's
questions?

In message <200311252107.aa96370@salmon.maths.tcd.ie>, Ian Dowse writes:
> In message <200311252003.hAPK3Bb9017036@agora.fsl.cs.sunysb.edu>, Erez Zadok writes:
> >Please see this short thread of discussion on amd-dev.
> >I've included two messages from this thread.  It suggests that fbsd5
> >may have a vnode refcount bug (a vnode isn't held where it should be).
> >
> >I've not personally investigated this bug.  Has anyone on fs@ come
> >across such a possible bug?
>
> Hmm, I guess it is caused by checkdirs() in vfs_mount.c moving the
> process cwd to the underlying vnode before attempting the unmount.
> Does this only happen if the cwd is at the mount point itself?
>
> When a file system is first mounted, checkdirs() looks for processes
> that had a cwd or chroot set to the vnode that is about to be
> covered. It moves these processes to the new mountpoint vnode.
> This behaviour goes back a long time (I'm not sure what the reasons
> were), but it had the problem that you would get a "Device busy"
> error if you attempted to unmount the file system later, and a
> forced unmount would leave the process with a stale cwd or chroot
> vnode (i.e. "mount /mnt; umount /mnt" would fail if any processes
> previously had a cwd of /mnt, and "mount /mnt; umount -f /mnt" would
> cause such processes to lose their reference to the /mnt directory).
>
> More recently (Feb 2001), I changed unmount to undo the checkdirs()
> step so that processes with a cwd or chroot at the mount point get
> moved back to the covered vnode before the unmount is attempted.
> This fixes the two issues, but it has the side-effect that if the
> only vnode references to a file system are processes whose cwd or
> chroot directory is on the mountpoint, then the unmount will succeed,
> and those processes will be moved to the underlying directory.

Hmmm, yes I think that could be a serious problem (esp. since fbsd
doesn't have autofs yet).  And I think it deviates from "norms" where a
cwd is essentially occupying a vnode within the mounted f/s and
therefore the f/s shouldn't be unmounted!
This is rather bad for users who sit on an nfs mnt point, ls'ing files
happily, and then the kernel unmounts the mnt pt, moves their cwd down
to the covered (typically empty) vnode, and the poor user's next /bin/ls
shows nothing.

Personally, having dealt w/ stackable f/s for a while, I found that when
the kernel tries to do all sorts of things "under the feet" of the
application (or any other upper-layer kernel component), it opens up
avenues for trouble.

Yes, maybe an un/mount() flag will solve this issue.  But I'd like to
see the more normal EBUSY-on-cwd behavior restored, and an un/mount flag
for those who really want the new behavior.  I'm a big proponent of
backwards compatibility, and of new features being introduced gradually
through flags/options.  And if I want to force an unmount of a mnt pt
and I get EBUSY, I do lsof and then /bin/kill any process sitting on the
mnt pt; that's expected behavior (what does POSIX say?).

> The reference count checks could be moved to before checkdirs(),
> but I think there are cases where the current behaviour is preferable,
> so maybe it needs to be an unmount() flag...  BTW, does amd delete
> the mountpoint directory after the unmount? That would explain why
> the directory goes away entirely.

If Amd created the mount point when it started (say, the mnt pt didn't
exist), then Amd will also try to rmdir it upon unmount.

> Ian

Cheers,
Erez.
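
For readers following the thread, the two unmount behaviours discussed
above can be modelled in a few lines of Python.  This is only a toy
sketch, not FreeBSD kernel code: the Vnode and Proc classes, the mount()
and unmount() helpers, and the undo_checkdirs parameter are hypothetical
names made up for illustration.  Only checkdirs() and the EBUSY result
come from the discussion, and restoring the cwds when the unmount fails
is a modelling assumption.

```python
import errno


class Vnode:
    """Toy stand-in for a vnode; only identity matters here."""

    def __init__(self, name):
        self.name = name
        self.mounted_here = None  # root vnode of the f/s mounted on top, if any


class Proc:
    """Toy process that holds nothing but a cwd reference."""

    def __init__(self, cwd):
        self.cwd = cwd


def mount(covered, root, procs):
    """Cover `covered` with `root`.

    Mimics the checkdirs() step described in the thread: processes whose
    cwd is the vnode being covered are moved to the new root vnode.
    """
    covered.mounted_here = root
    for p in procs:
        if p.cwd is covered:
            p.cwd = root


def unmount(covered, procs, fs_vnodes, undo_checkdirs=True):
    """Try to unmount the f/s covering `covered`; return 0 or errno.EBUSY.

    undo_checkdirs=True  models the post-Feb-2001 behaviour: processes
                         sitting on the f/s root are first moved back to
                         the covered vnode.
    undo_checkdirs=False models the older EBUSY-on-cwd behaviour.
    """
    root = covered.mounted_here
    moved = []
    if undo_checkdirs:
        for p in procs:
            if p.cwd is root:
                p.cwd = covered
                moved.append(p)
    # Any cwd still inside the f/s keeps it busy.
    if any(p.cwd in fs_vnodes for p in procs):
        for p in moved:  # assumption: put the cwds back on failure
            p.cwd = root
        return errno.EBUSY
    covered.mounted_here = None
    return 0


covered = Vnode("/mnt (covered dir)")
root = Vnode("/mnt (f/s root)")
user = Proc(covered)

mount(covered, root, [user])
old = unmount(covered, [user], {root}, undo_checkdirs=False)
new = unmount(covered, [user], {root}, undo_checkdirs=True)
print(old == errno.EBUSY, new == 0, user.cwd.name)
```

In this model the old behaviour refuses the unmount while a process sits
on the mount point, whereas the new behaviour lets it succeed and leaves
the process on the covered directory, which is the situation complained
about above.  A process whose cwd is in a subdirectory of the f/s keeps
it busy in both cases.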