From owner-freebsd-hackers Wed Nov 6 05:50:40 1996 Return-Path: owner-hackers Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id FAA19717 for hackers-outgoing; Wed, 6 Nov 1996 05:50:40 -0800 (PST) Received: from dg-rtp.dg.com (dg-rtp.rtp.dg.com [128.222.1.2]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id FAA19710 for ; Wed, 6 Nov 1996 05:50:36 -0800 (PST) Received: by dg-rtp.dg.com (5.4R3.10/dg-rtp-v02) id AA16999; Wed, 6 Nov 1996 08:50:03 -0500 Received: from ponds by dg-rtp.dg.com.rtp.dg.com; Wed, 6 Nov 1996 08:50 EST Received: from lakes.water.net (lakes [10.0.0.3]) by ponds.water.net (8.7.5/8.7.3) with ESMTP id HAA20268; Wed, 6 Nov 1996 07:59:34 -0500 (EST) Received: (from rivers@localhost) by lakes.water.net (8.7.5/8.6.9) id IAA02838; Wed, 6 Nov 1996 08:00:35 -0500 (EST) Date: Wed, 6 Nov 1996 08:00:35 -0500 (EST) From: Thomas David Rivers Message-Id: <199611061300.IAA02838@lakes.water.net> To: terry@lambert.org, ponds!ponds!rivers Subject: Re: More info on the daily panics... Cc: ponds!freebsd.org!dyson, ponds!freebsd.org!freebsd-hackers, ponds!lambert.org!terry Content-Type: text Sender: owner-hackers@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk > > > > Time for a repost, it seems... > > > > > ... description delete ... > > > > Err, umm, ... doesn't that only fix the "free vnode isn't" > > panic? That's really not what I'm seeing... I'm seeing > > inode allocation panics coming from ffs_valloc.c. > > No. It fixes the freelist wrap error. Ahh... I see... I was going to elide your description, but I'll leave it in case others missed it... Although this isn't the complete patch you discuss below, I believe it to be the proper fix for 2.1.5-STABLE.... I'm providing to a) Let others acquire it, b) Get this into 2.1.6, c) Ensure I understand the intent of your change. This patch doesn't use the nice macro, but it changes getnewvnode() to allocate a totally new node when we've wrapped around the end of the list... After Terry signs off on this, can someone get it committed to 2.1.6 (nee 2.1.5-STABLE)... - Thanks - - Dave Rivers - *** vfs_subr.c.ori Thu Aug 15 16:36:16 1996 --- vfs_subr.c Wed Nov 6 07:57:02 1996 *************** *** 339,345 **** */ if (freevnodes < (numvnodes >> 2) || numvnodes < desiredvnodes || ! vp == NULL) { vp = (struct vnode *) malloc((u_long) sizeof *vp, M_VNODE, M_WAITOK); bzero((char *) vp, sizeof *vp); --- 339,347 ---- */ if (freevnodes < (numvnodes >> 2) || numvnodes < desiredvnodes || ! vp == NULL || /* list empty */ ! vp->v_usecount) /* queue wrapped */ ! { vp = (struct vnode *) malloc((u_long) sizeof *vp, M_VNODE, M_WAITOK); bzero((char *) vp, sizeof *vp); *************** *** 347,355 **** } else { TAILQ_REMOVE(&vnode_free_list, vp, v_freelist); freevnodes--; - - if (vp->v_usecount) - panic("free vnode isn't"); /* see comment on why 0xdeadb is set at end of vgone (below) */ vp->v_freelist.tqe_prev = (struct vnode **) 0xdeadb; --- 349,354 ---- > > > I'm seeing panics in ffs_vfree() that seem to be that the > > inode is clear when it shouldn't be. > > vclean() will cause it to be marked clear if: > > 1) The overflow occurs by exactly *1*; this will cause the > new inode to overwrite the old. > 2) The vnode that is reallocated is freed by the original > owner; this will cause the [new] inode to be cleared. > 3) You attempt a second operation on the scond vnode > reference to the same object. > > If the overflow occurs by 2 or more (alloc a/alloc b/alloc c/free b/free a) > then you will get "free vnode isn't"... you will see this during a > directory lookup operation for a create, especially in msdosfs. > > It tends to be hacked around on a per FS basis because the VOP_LOCK > code is duplicated instead of shared, and everyone implements it > slightly differently because the lock data area is off the inode > instead of the vnode. > > Like I said in the patch, understand what the patch is working around > and you will understand why it's needed. The patch to the create > op in vfs_vnops.c that was made a bit ago only hacks around the problem. > > > I'm thinking the problem is either the inode allocation bit > > (cg_inosused) is being cleared when it shouldn't be, or it > > isn't being set when it shouldn't be... That would readily > > explain the panic's I'm seeing... > > Consider instead what would happen if ffs reused a vnode when it > was not truly free... it would rewrite the inode data pointer. But > the original inode data pointer would point to the same object, but > the inode that the vnode that the original inode pointed to could > be free. So a reference by inode "works" (but gives the wrong > vnode and buffers), but a reference by vnode fails. > > This also explains the occasional write warnings, and the occasional > library corruption, FWIW. Cleaning an inode from the hash with an > associated vnode would cause the data buffers from the second file > to be applied to the first. I believe this is the source of a number > of MSDOSFS "bugs"; MSDOSFS is more sensitive because an inode *is* > a directory entry. The buffer swapping means that even a read-only > mounted FS can get writes on overflow. > > You can verify this by ASSERT'ing that the inode reference in the > vnode the inode points to points to the inode in question (this > may fail on null-layer stacking, however, since the vnode data > pointer points to another vnode). > > Part of "the true fix" must be to zone devices and pass writes through > zone filtering... this is partially done anyway, but there are currently > four or five logical device interfaces, and not all of them do it... > only the disklabel devices really get it. Basically, there is a need > for a common "logical device descriptor" which applies to all partitioning > mechanisms. > > > > Now, could the vfs_subr.c changes you suggest cause this > > to happen? If the ffs_xxxx routines and data are properly > > isolated - seems like that wouldn't be the case... But, > > I'm not file-system guru. > > Yes, it can cause the error. Try the ASSERT. > > > I've been loathe to discuss this in detail without patches in hand, > since it's a serious reliability problem... unfortunately, patches > require an orthogonal infrastructure (layering fixes to make everyone > lock the same way so it can be worked around, then logical to physical > device mapping by descriptor through a mapping layer for a real fix > to simply disallow out-of-zone writes, then per fs allocation of the > vnode as a subelement of the in core inode and a VOP_VRELE, etc.). > > > Regards, > Terry Lambert > terry@lambert.org