From owner-freebsd-hackers  Wed Nov  6 05:50:40 1996
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id FAA19717
          for hackers-outgoing; Wed, 6 Nov 1996 05:50:40 -0800 (PST)
Received: from dg-rtp.dg.com (dg-rtp.rtp.dg.com [128.222.1.2])
          by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id FAA19710
          for <freebsd-hackers@freebsd.org>; Wed, 6 Nov 1996 05:50:36 -0800 (PST)
Received: by dg-rtp.dg.com (5.4R3.10/dg-rtp-v02)
	id AA16999; Wed, 6 Nov 1996 08:50:03 -0500
Received: from ponds by dg-rtp.dg.com.rtp.dg.com; Wed,  6 Nov 1996 08:50 EST
Received: from lakes.water.net (lakes [10.0.0.3]) by ponds.water.net (8.7.5/8.7.3) with ESMTP id HAA20268; Wed, 6 Nov 1996 07:59:34 -0500 (EST)
Received: (from rivers@localhost) by lakes.water.net (8.7.5/8.6.9) id IAA02838; Wed, 6 Nov 1996 08:00:35 -0500 (EST)
Date: Wed, 6 Nov 1996 08:00:35 -0500 (EST)
From: Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
Message-Id: <199611061300.IAA02838@lakes.water.net>
To: terry@lambert.org, ponds!ponds!rivers
Subject: Re: More info on the daily panics...
Cc: ponds!freebsd.org!dyson, ponds!freebsd.org!freebsd-hackers,
        ponds!lambert.org!terry
Content-Type: text
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> 
> > > Time for a repost, it seems...
> > > 
> >  ... description delete ...
> >  
> >  Err, umm, ... doesn't that only fix the "free vnode isn't"
> > panic?  That's really not what I'm seeing... I'm seeing
> > inode allocation panics coming from ffs_valloc.c.
> 
> No.  It fixes the freelist wrap error.

  Ahh... I see...  I was going to elide your description,
but I'll leave it in case others missed it...

  Although this isn't the complete patch you discuss below,
I believe it to be the proper fix for 2.1.5-STABLE....  I'm
providing to a) Let others acquire it, b) Get this into 2.1.6,
c) Ensure I understand the intent of your change.

 This patch doesn't use the nice macro, but it changes getnewvnode()
to allocate a totally new node when we've wrapped around the end of
the list...

 After Terry signs off on this, can someone get it committed to
2.1.6 (nee 2.1.5-STABLE)...

	  - Thanks -
	- Dave Rivers -


*** vfs_subr.c.ori	Thu Aug 15 16:36:16 1996
--- vfs_subr.c	Wed Nov  6 07:57:02 1996
***************
*** 339,345 ****
  	 */
  	if (freevnodes < (numvnodes >> 2) ||
  	    numvnodes < desiredvnodes ||
! 	    vp == NULL) {
  		vp = (struct vnode *) malloc((u_long) sizeof *vp,
  		    M_VNODE, M_WAITOK);
  		bzero((char *) vp, sizeof *vp);
--- 339,347 ----
  	 */
  	if (freevnodes < (numvnodes >> 2) ||
  	    numvnodes < desiredvnodes ||
! 	    vp == NULL ||      /* list empty */
! 	    vp->v_usecount)    /* queue wrapped */
!         {
  		vp = (struct vnode *) malloc((u_long) sizeof *vp,
  		    M_VNODE, M_WAITOK);
  		bzero((char *) vp, sizeof *vp);
***************
*** 347,355 ****
  	} else {
  		TAILQ_REMOVE(&vnode_free_list, vp, v_freelist);
  		freevnodes--;
- 
- 		if (vp->v_usecount)
- 			panic("free vnode isn't");
  
  		/* see comment on why 0xdeadb is set at end of vgone (below) */
  		vp->v_freelist.tqe_prev = (struct vnode **) 0xdeadb;
--- 349,354 ----

> 
> >  I'm seeing panics in ffs_vfree() that seem to be that the
> > inode is clear when it shouldn't be.
> 
> vclean() will cause it to be marked clear if:
> 
> 1)	The overflow occurs by exactly *1*; this will cause the
> 	new inode to overwrite the old.
> 2)	The vnode that is reallocated is freed by the original
> 	owner; this will cause the [new] inode to be cleared.
> 3)	You attempt a second operation on the scond vnode
> 	reference to the same object.
> 
> If the overflow occurs by 2 or more (alloc a/alloc b/alloc c/free b/free a)
> then you will get "free vnode isn't"... you will see this during a
> directory lookup operation for a create, especially in msdosfs.
> 
> It tends to be hacked around on a per FS basis because the VOP_LOCK
> code is duplicated instead of shared, and everyone implements it
> slightly differently because the lock data area is off the inode
> instead of the vnode.
> 
> Like I said in the patch, understand what the patch is working around
> and you will understand why it's needed.  The patch to the create
> op in vfs_vnops.c that was made a bit ago only hacks around the problem.
> 
> >  I'm thinking the problem is either the inode allocation bit
> > (cg_inosused) is being cleared when it shouldn't be, or it
> > isn't being set when it shouldn't be...  That would readily
> > explain the panic's I'm seeing...
> 
> Consider instead what would happen if ffs reused a vnode when it
> was not truly free... it would rewrite the inode data pointer.  But
> the original inode data pointer would point to the same object, but
> the inode that the vnode that the original inode pointed to could
> be free.  So a reference by inode "works" (but gives the wrong
> vnode and buffers), but a reference by vnode fails.
> 
> This also explains the occasional write warnings, and the occasional
> library corruption, FWIW.  Cleaning an inode from the hash with an
> associated vnode would cause the data buffers from the second file
> to be applied to the first.  I believe this is the source of a number
> of MSDOSFS "bugs"; MSDOSFS is more sensitive because an inode *is*
> a directory entry.  The buffer swapping means that even a read-only
> mounted FS can get writes on overflow.
> 
> You can verify this by ASSERT'ing that the inode reference in the
> vnode the inode points to points to the inode in question (this
> may fail on null-layer stacking, however, since the vnode data
> pointer points to another vnode).
> 
> Part of "the true fix" must be to zone devices and pass writes through
> zone filtering... this is partially done anyway, but there are currently
> four or five logical device interfaces, and not all of them do it...
> only the disklabel devices really get it.  Basically, there is a need
> for a common "logical device descriptor" which applies to all partitioning
> mechanisms.
> 
> 
> >  Now, could the vfs_subr.c changes you suggest cause this
> > to happen?  If the ffs_xxxx routines and data are properly 
> > isolated - seems like that wouldn't be the case...  But,
> > I'm not file-system guru.
> 
> Yes, it can cause the error.  Try the ASSERT.
> 
> 
> I've been loathe to discuss this in detail without patches in hand,
> since it's a serious reliability problem... unfortunately, patches
> require an orthogonal infrastructure (layering fixes to make everyone
> lock the same way so it can be worked around, then logical to physical
> device mapping by descriptor through a mapping layer for a real fix
> to simply disallow out-of-zone writes, then per fs allocation of the
> vnode as a subelement of the in core inode and a VOP_VRELE, etc.).
> 
> 
> 					Regards,
> 					Terry Lambert
> 					terry@lambert.org