From owner-freebsd-current@FreeBSD.ORG Sun Mar 14 06:52:07 2004 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5EC5516A4CF; Sun, 14 Mar 2004 06:52:07 -0800 (PST) Received: from mailout2.pacific.net.au (mailout2.pacific.net.au [61.8.0.85]) by mx1.FreeBSD.org (Postfix) with ESMTP id A469D43D31; Sun, 14 Mar 2004 06:52:04 -0800 (PST) (envelope-from bde@zeta.org.au) Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au [61.8.0.87])i2EEq3nX008867; Mon, 15 Mar 2004 01:52:03 +1100 Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246]) i2EEq0Gi018433; Mon, 15 Mar 2004 01:52:01 +1100 Date: Mon, 15 Mar 2004 01:52:00 +1100 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: Don Lewis In-Reply-To: <200403140953.i2E9rZ7E037450@gw.catspoiler.org> Message-ID: <20040315012321.X1848@gamplex.bde.org> References: <200403140953.i2E9rZ7E037450@gw.catspoiler.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: current@FreeBSD.org Subject: Re: recent changes in Giant usage -> panic X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Mar 2004 14:52:07 -0000 On Sun, 14 Mar 2004, Don Lewis wrote: > I cvsup'ed a few hours ago and did the buildworld and buildkernel drill. > I installed the new kernel, rebooted, and got this panic while mtree was > running in installworld. > > -------------------------------------------------------------- > >>> Making hierarchy > -------------------------------------------------------------- > cd /usr/src; /usr/obj/usr/src/make.i386/make -f Makefile.inc1 hierarchy > cd /usr/src/etc; /usr/obj/usr/src/make.i386/make distrib-dirs > mtree -deU -f /usr/src/etc/mtree/BSD.root.dist -p / > > panic: mutex Giant not owned at /usr/src/sys/kern/vfs_subr.c:899 > at line 719 in file /usr/src/sys/kern/kern_mutex.c > cpuid = 0; > Debugger("panic") > Stopped at Debugger+0x46: xchgl %ebx,in_Debugger.0 > db> tr > Debugger(c07d2da1) at Debugger+0x46 > __panic(c07d217c,2cf,c07d22e8,c07d2423,c07daecf) at __panic+0x13d > _mtx_assert(c0893860,1,c07daecf,383,c68931b0) at _mtx_assert+0xc2 > vinvalbuf(c6893104,1,0,c6872690,0,0) at vinvalbuf+0x25 > vclean(c6893104,8,c6872690,c6893104,c6893104) at vclean+0x97 > vgonel(c6893104,c6872690,c6893104,0,c07daecf) at vgonel+0x4d > vgone(c6893104) at vgone+0x28 > pfs_exit(0,c6871898,c648c68c,0,c07d0695) at pfs_exit+0x3f > exit1(c6872690,0,e7b11d40,c076bca7,c6872690) at exit1+0x2bd > exit1(c6872690,e7b11d14,1,11,296) at exit1 > syscall(2f,2f,2f,4814a740,bfbfea48) at syscall+0x217 > Xint0x80_syscall() at Xint0x80_syscall+0x1d > --- syscall (1, FreeBSD ELF32, sys_exit), eip = 0x480c5163, esp = 0xbfbfe9ac, ebp = 0xbfbfe9c8 --- vgone() has interesting locking problems. It seems to be a fundamentally broken interface that only works if the kernel is not premptible or possibly if everything related to vgone() is locked by Giant. Apparently the lower levels know that Giant locking is needed. Look at this code in ufs_mknod9): % /* % * Remove inode, then reload it through VFS_VGET so it is % * checked to see if it is an alias of an existing entry in % * the inode cache. % */ % vput(*vpp); Here we've completely released the vnode, so references to it before we vget() it again use a garbage pointer, since we may be preempted and the vnode may be recycled. % (*vpp)->v_type = VNON; Here we reference it. % ino = ip->i_number; /* Save this before vgone() invalidates ip. */ Here we reference something hanging off it. % vgone(*vpp); Here we do considerably more with it, without holding any lock on it to begin with. % error = VFS_VGET(ap->a_dvp->v_mount, ino, LK_EXCLUSIVE, vpp); This and the rest is OK. % if (error) { % *vpp = NULL; % return (error); % } % return (0); % this code in I noticed this not due to losing a race, but because I have active code in vput() that recycles certain unwanted vnodes as soon as possible (actually sooner than possible due to the bug). The vnode gets vgone()'ed in vput() and comes back with type VBAD (so perhaps I'm mistaken that it was completely free). Then vgone()'ing it again does bad things (deadlock in deadfs). I work around this using: %%% Index: ufs_vnops.c =================================================================== RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v retrieving revision 1.238 diff -u -2 -r1.238 ufs_vnops.c --- ufs_vnops.c 11 Mar 2004 18:50:33 -0000 1.238 +++ ufs_vnops.c 12 Mar 2004 14:04:04 -0000 @@ -246,7 +258,9 @@ */ vput(*vpp); - (*vpp)->v_type = VNON; ino = ip->i_number; /* Save this before vgone() invalidates ip. */ - vgone(*vpp); + if ((*vpp)->v_type != VBAD) { + (*vpp)->v_type = VNON; + vgone(*vpp); + } error = VFS_VGET(ap->a_dvp->v_mount, ino, LK_EXCLUSIVE, vpp); if (error) { %%% Perhaps the correct fix is to remove vgone() and always use vgonel(). And Giant locking. Bruce