From owner-freebsd-hackers Wed Nov 6 11:21:19 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id LAA12608 for hackers-outgoing; Wed, 6 Nov 1996 11:21:19 -0800 (PST)
Received: from alpo.whistle.com (alpo.whistle.com [207.76.204.38]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id LAA12528; Wed, 6 Nov 1996 11:21:11 -0800 (PST)
Received: from current1.whistle.com (current1.whistle.com [207.76.205.22]) by alpo.whistle.com (8.8.2/8.8.2) with SMTP id LAA09162; Wed, 6 Nov 1996 11:15:25 -0800 (PST)
Message-ID: <3280E3BC.15FB7483@whistle.com>
Date: Wed, 06 Nov 1996 11:15:08 -0800
From: Julian Elischer
Organization: Whistle Communications
X-Mailer: Mozilla 3.0Gold (X11; I; FreeBSD 2.2-CURRENT i386)
MIME-Version: 1.0
To: Terry Lambert
CC: archie@whistle.com, freebsd-hackers@freebsd.org, davidg@freebsd.org
Subject: Re: Davidg bug (was: mount panics & hangs)
References: <199611061733.KAA08415@phaeton.artisoft.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Terry Lambert wrote:
>
> > dounmount(mp, flags, p)
> >         register struct mount *mp;
> >         int flags;
> >         struct proc *p;
> > {
> >         struct vnode *coveredvp;
> >         int error;
> >
> >         coveredvp = mp->mnt_vnodecovered;
> >         if (vfs_busy(mp))
> >                 return (EBUSY);
>                   ^^^^^^^^^^^^^^^^^^^^^^^
> >         mp->mnt_flag |= MNT_UNMOUNT;
> >         error = vfs_lock(mp);
> >         if (error)
> >                 return (error);        <-------line "C"
>                   ^^^^^^^^^^^^^^^^^^^^^^^
>
> > BTW there is another small bug, which is: the return at line "C"
> > should also do a vfs_unbusy().
> >
> > suggestions?
>
> Add a "NOWAIT" flags value, obey it at the indicated locations, and
> don't pass it in this case (only on shutdown).
>
> In reality, there should be a mutex for the VFS structures, the list
> of mounted fs's being one of them, where "dounmount" is called, so you
> never have more than one process in the mount code.
>
> The problem is that the vfs_busy/vfs_lock pair creates a race condition
> because there is no imposed order of operation. That comes from
> mixing the vfsop and vop layers without regard to structural call
> layering (vfsop is hierarchically above vop).
>
> So you're right: it's a "thundering herd" problem, where the wrong
> process happens to win the race.
>
> I suspect that this requires the while loop to happen so that the
> priority becomes inverted when both processes are marked ready-to-run.
>
> Has this problem *ever* been repeated without that while loop to
> torture it into happening?

Yes, it happens regularly to us, which is why we wrote the test case
to isolate it.

David, do you have any suggested fix? If you want, I can try to work up
a patch and send it to you for comments. (Though I don't yet understand
why there needs to be both vfs_busy() and vfs_lock().)
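
To make that concrete, here's roughly the shape of the patch I have in
mind for the small bug at line "C". This is an untested sketch against
the code quoted above, nothing more; the only change is on the error
path, which currently leaks the busy reference and leaves MNT_UNMOUNT
set:

    int
    dounmount(mp, flags, p)
            register struct mount *mp;
            int flags;
            struct proc *p;
    {
            struct vnode *coveredvp;
            int error;

            coveredvp = mp->mnt_vnodecovered;
            if (vfs_busy(mp))
                    return (EBUSY);
            mp->mnt_flag |= MNT_UNMOUNT;
            error = vfs_lock(mp);
            if (error) {
                    /*
                     * Line "C": we already marked the mount and took
                     * the busy reference, so undo both before failing;
                     * otherwise the next unmount attempt trips over a
                     * stale MNT_UNMOUNT and a mount that stays busy.
                     * (vfs_unbusy() also wakes any MNT_MPWANT sleepers.)
                     */
                    mp->mnt_flag &= ~MNT_UNMOUNT;
                    vfs_unbusy(mp);
                    return (error);
            }
            /* ... rest of dounmount() unchanged ... */
    }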
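
And for what it's worth, my (possibly wrong) reading of Terry's "never
more than one process in the mount code" suggestion is a single sleep
lock taken around mount/unmount. The names below (mount_serialized,
mount_lock, mount_unlock) are made up for illustration; tsleep() and
wakeup() are the stock kernel primitives, and on a non-preemptive
uniprocessor kernel the flag test-and-set needs no further protection:

    static int mount_serialized;        /* hypothetical serialization flag */

    static void
    mount_lock()
    {
            /* Sleep until no other process is in the mount code. */
            while (mount_serialized)
                    (void) tsleep((caddr_t)&mount_serialized, PVFS,
                        "mntser", 0);
            mount_serialized = 1;
    }

    static void
    mount_unlock()
    {
            mount_serialized = 0;
            wakeup((caddr_t)&mount_serialized);
    }

Whether that belongs in dounmount() itself or up in the syscall layer
I'll leave to David.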