From owner-freebsd-current Wed Mar 6 12:03:06 1996
Return-Path: owner-current
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id MAA19123
          for current-outgoing; Wed, 6 Mar 1996 12:03:06 -0800 (PST)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211])
          by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id MAA19118
          for ; Wed, 6 Mar 1996 12:03:05 -0800 (PST)
Received: (from terry@localhost)
          by phaeton.artisoft.com (8.6.11/8.6.9) id MAA11679;
          Wed, 6 Mar 1996 12:57:28 -0700
From: Terry Lambert
Message-Id: <199603061957.MAA11679@phaeton.artisoft.com>
Subject: Re: fixes for rename panic (round 1)
To: bde@zeta.org.au (Bruce Evans)
Date: Wed, 6 Mar 1996 12:57:28 -0700 (MST)
Cc: jhay@mikom.csir.co.za, freebsd-current@FreeBSD.ORG
In-Reply-To: <199603060750.SAA09666@godzilla.zeta.org.au> from "Bruce Evans" at Mar 6, 96 06:50:15 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-current@FreeBSD.ORG
Precedence: bulk

> The main cause of the panic is broken reference counting in certain
> error cases. relookup() calls vput(fdvp) when it fails, so callers
> must increment the reference count before calling relookup() and
> decrement it if relookup() doesn't fail. Not doing this caused
> v_usecount for the test directory to eventually become negative.
>
> relookup() fails when the `from' file went away. Different bad things
> happen if it went away and came back. Then relookup doesn't fail, but
> the wrong file or directory is removed. If a regular file went away
> and came back as a directory, then the file system is corrupted.

[ ... patches ... ]

Have you tested these changes with MSDOSFS (especially) and EXT2FS
(less especially)? I suspect a potential deadlock on identical path
prefixes that differ by one path component, and on "." and ".."
references in some particular cases.
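
For readers following the quoted reference-counting rule, here is a minimal
sketch of the caller pattern it describes. This is a toy model, not the
kernel code: toy_vnode, toy_vref(), toy_vrele(), toy_relookup() and
toy_rename_step() are hypothetical names, and only the use count is
modeled (no locking, no real lookup).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a vnode: only the reference count is modeled. */
struct toy_vnode {
    int usecount;
};

/* Take one reference. */
static void toy_vref(struct toy_vnode *vp)
{
    vp->usecount++;
}

/* Drop one reference; going negative is the bug described in the quote. */
static void toy_vrele(struct toy_vnode *vp)
{
    assert(vp->usecount > 0);
    vp->usecount--;
}

/*
 * Hypothetical relookup: when the entry has gone away it fails and
 * consumes one reference on the directory, as the quoted text says the
 * real relookup() does via vput(fdvp).
 */
static int toy_relookup(struct toy_vnode *dvp, bool entry_exists)
{
    if (!entry_exists) {
        toy_vrele(dvp);
        return -1;
    }
    return 0;
}

/*
 * Caller pattern from the quote: take an extra reference before the
 * call, and give it back only if the call did not fail.
 */
static int toy_rename_step(struct toy_vnode *fdvp, bool entry_exists)
{
    int error;

    toy_vref(fdvp);
    error = toy_relookup(fdvp, entry_exists);
    if (error == 0)
        toy_vrele(fdvp);
    return error;
}
```

With this pattern the directory's count is the same after the call as
before it, whether or not the lookup failed.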

The existence of the race on the rename is intentional, based on the
need to hold the directory exclusively when inserting an entry. This
is, I think, an architectural side-effect of making rename an FS atomic
operation.

Not that I disagree that it shouldn't cause a panic. 8-) 8-).

My personal "dream solution" to this problem is to issue reader/writer
locks on the vnode and remove the ladder-chain release race on a path
traversal entirely. The magic is that "R" and "IX" are not conflicting
for the same thread (context/process ID/whatever); it's the promotion
from "IX" to "X" that generates the conflict. This causes the vnode
reference to exist, but doesn't screw the traversal because of the
VLOCK recursion restriction.

Alternately, in implementing the UFS under Win95's IFS framework, we
set the IN_RECURSE bit, which has a similar effect to the patches you
have, without adding unduly to the complexity. This is a kludge, plain
and simple, because IN_RECURSE itself is a kludge (it's predicated on
the possibility of a fault on the copyout in the uiomove with a swap
file on the same FS as the one performing the currently requested
operation).

Really, this goes back to my earlier suggestion that the VOP_LOCK code
move to common higher level code and become purely advisory in all FS's
except UNIONFS (needs to hold underlying vnodes locked) and QUOTAFS
(needs to hold the quota file vnode locked on the underlying FS to
which quotas are being applied). That, in combination with the
conversion of the lock to a counting semaphore (a la SVR4, SunOS,
Solaris, SCO, Unix SVR4 ES/MP, and AIX), will resolve the recursion
case cleanly as well, without involving the underlying FS's inodes.


                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
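
A rough sketch of the "R"/"IX"/"X" idea from the message above, as one
possible reading of it rather than an actual implementation. The names
toy_mode, toy_hold, toy_compat and toy_can_grant() are hypothetical, and
the compatibility matrix between different owners is an assumption
borrowed from conventional intention locking. Holds by the requesting
owner never conflict with its own request, so R followed by IX is fine
for one owner, while promotion to X is refused as long as any other
owner holds the vnode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Lock modes named in the message; the numbering is arbitrary. */
enum toy_mode { TOY_R, TOY_IX, TOY_X };

/* One granted lock on a vnode: who holds it and in which mode. */
struct toy_hold {
    int owner;              /* thread/context/process ID, whatever */
    enum toy_mode mode;
};

/*
 * toy_compat[held][wanted]: may two DIFFERENT owners hold these modes
 * together?  Assumed matrix, not taken from the message.
 */
static const bool toy_compat[3][3] = {
    /*            R      IX     X   */
    /* R  */ { true,  false, false },
    /* IX */ { false, true,  false },
    /* X  */ { false, false, false },
};

/*
 * Decide whether `owner` may be granted `want` given the current holds.
 * Holds belonging to the same owner are skipped, so only other owners
 * can cause a conflict.
 */
static bool toy_can_grant(const struct toy_hold *held, size_t n,
                          int owner, enum toy_mode want)
{
    for (size_t i = 0; i < n; i++) {
        if (held[i].owner == owner)
            continue;
        if (!toy_compat[held[i].mode][want])
            return false;
    }
    return true;
}
```

A real lock would also need queuing, wakeups and a policy for the refused
promotion; the sketch only covers the grant decision.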