From owner-freebsd-hackers  Tue Nov  5 10:09:02 1996
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id KAA13984
          for hackers-outgoing; Tue, 5 Nov 1996 10:09:02 -0800 (PST)
Received: from who.cdrom.com (who.cdrom.com [204.216.27.3])
          by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id KAA13977;
          Tue, 5 Nov 1996 10:08:59 -0800 (PST)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211])
          by who.cdrom.com (8.7.5/8.6.11) with SMTP id KAA11771
          ; Tue, 5 Nov 1996 10:08:58 -0800 (PST)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id LAA06616; Tue, 5 Nov 1996 11:00:11 -0700
From: Terry Lambert <terry@lambert.org>
Message-Id: <199611051800.LAA06616@phaeton.artisoft.com>
Subject: Re: More info on the daily panics...
To: ponds!rivers@dg-rtp.dg.com (Thomas David Rivers)
Date: Tue, 5 Nov 1996 11:00:11 -0700 (MST)
Cc: terry@lambert.org, dyson@freebsd.org, freebsd-hackers@freebsd.org
In-Reply-To: <199611050428.XAA00313@lakes.water.net> from "Thomas David Rivers" at Nov 4, 96 11:28:03 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> > Time for a repost, it seems...
> > 
>  ... description delete ...
>  
>  Err, umm, ... doesn't that only fix the "free vnode isn't"
> panic?  That's really not what I'm seeing... I'm seeing
> inode allocation panics coming from ffs_valloc.c.

No.  It fixes the freelist wrap error.

>  I'm seeing panics in ffs_vfree() that seem to be that the
> inode is clear when it shouldn't be.

vclean() will cause it to be marked clear if:

1)	The overflow occurs by exactly *1*; this will cause the
	new inode to overwrite the old.
2)	The vnode that is reallocated is freed by the original
	owner; this will cause the [new] inode to be cleared.
3)	You attempt a second operation on the second vnode
	reference to the same object.

If the overflow occurs by 2 or more (alloc a/alloc b/alloc c/free b/free a)
then you will get "free vnode isn't"... you will see this during a
directory lookup operation for a create, especially in msdosfs.

It tends to be hacked around on a per-FS basis because the VOP_LOCK
code is duplicated instead of shared, and everyone implements it
slightly differently because the lock data area is off the inode
instead of the vnode.

Like I said in the patch, understand what the patch is working around
and you will understand why it's needed.  The patch to the create
op in vfs_vnops.c that was made a bit ago only hacks around the problem.

>  I'm thinking the problem is either the inode allocation bit
> (cg_inosused) is being cleared when it shouldn't be, or it
> isn't being set when it shouldn't be...  That would readily
> explain the panic's I'm seeing...

Consider instead what would happen if ffs reused a vnode when it
was not truly free... it would rewrite the vnode's inode data pointer.
The original inode would still point to the same vnode, but the
inode that this vnode now refers to could be free.  So a reference
by inode "works" (but gives the wrong vnode and buffers), but a
reference by vnode fails.
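
A minimal user-space sketch of that reuse, for illustration only.
The two structs are stripped-down stand-ins for the kernel's vnode
and inode (only the cross pointers are modelled), and reuse_vnode()
and vnode_to_inode() are hypothetical helpers, not kernel functions:

```c
#include <assert.h>
#include <stddef.h>

struct inode;

/* Stand-in for the kernel vnode: only the per-FS data pointer. */
struct vnode {
	void *v_data;			/* points at the owning inode */
};

/* Stand-in for the in-core inode: only the back pointer and a number. */
struct inode {
	struct vnode *i_vnode;		/* points back at the vnode */
	int i_number;
};

/* Follow the vnode's data pointer to whatever inode it names now. */
static struct inode *
vnode_to_inode(const struct vnode *vp)
{
	return vp->v_data;
}

/*
 * Hypothetical faulty reuse: the vnode is handed to a new inode
 * without checking that the old owner has really let go of it.
 * The old inode's i_vnode is left untouched, so it goes stale.
 */
static void
reuse_vnode(struct vnode *vp, struct inode *new_ip)
{
	vp->v_data = new_ip;
	new_ip->i_vnode = vp;
}
```

After reuse_vnode(), the old inode still reaches the vnode through
i_vnode, but following v_data from there lands on the new inode, so
the two directions of the reference no longer agree.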

This also explains the occasional write warnings, and the occasional
library corruption, FWIW.  Cleaning an inode from the hash with an
associated vnode would cause the data buffers from the second file
to be applied to the first.  I believe this is the source of a number
of MSDOSFS "bugs"; MSDOSFS is more sensitive because an inode *is*
a directory entry.  The buffer swapping means that even a read-only
mounted FS can get writes on overflow.

You can verify this by ASSERT'ing that the vnode an inode points to
refers back to that same inode (this may fail on null-layer stacking,
however, since the vnode data pointer points to another vnode).
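
One way the check might look, as a user-space sketch rather than
kernel code.  The structs are simplified stand-ins carrying only the
two cross pointers, and inode_vnode_consistent() is a hypothetical
name; in a kernel the result would be fed to whatever assertion
macro is in use:

```c
#include <assert.h>
#include <stddef.h>

struct inode;

/* Stand-in for the kernel vnode: only the per-FS data pointer. */
struct vnode {
	void *v_data;			/* should point at the owning inode */
};

/* Stand-in for the in-core inode: only the back pointer. */
struct inode {
	struct vnode *i_vnode;		/* should point at the owning vnode */
};

/*
 * Return 1 if the vnode this inode points to points back at this
 * same inode, 0 otherwise.  A 0 here means the vnode was handed to
 * someone else while this inode still believed it owned it.
 */
static int
inode_vnode_consistent(const struct inode *ip)
{
	if (ip == NULL || ip->i_vnode == NULL)
		return 0;
	return ip->i_vnode->v_data == (const void *)ip;
}
```

Called as assert(inode_vnode_consistent(ip)) at the points where an
inode is about to be used or freed, this would trip as soon as the
two pointers disagree.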

Part of "the true fix" must be to zone devices and pass writes through
zone filtering... this is partially done anyway, but there are currently
four or five logical device interfaces, and not all of them do it...
only the disklabel devices really get it.  Basically, there is a need
for a common "logical device descriptor" which applies to all partitioning
mechanisms.
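
As a rough sketch of what such a descriptor and its write filter
could look like (everything here is hypothetical: the struct
logical_dev layout, the field names, and both helpers are invented
for illustration and do not correspond to any existing interface):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical "logical device descriptor": a zone of a physical
 * device, described in blocks, independent of how it was
 * partitioned (disklabel, DOS partition table, etc.).
 */
struct logical_dev {
	uint64_t ld_start;	/* first physical block of the zone */
	uint64_t ld_length;	/* number of blocks in the zone */
};

/*
 * Return 1 if a write of 'count' blocks starting at zone-relative
 * block 'blkno' stays inside the zone, 0 otherwise.  Written to
 * avoid overflow in blkno + count.
 */
static int
zone_write_ok(const struct logical_dev *ld, uint64_t blkno, uint64_t count)
{
	if (blkno > ld->ld_length)
		return 0;
	return count <= ld->ld_length - blkno;
}

/* Map a zone-relative block to a physical block (caller checks bounds). */
static uint64_t
zone_to_physical(const struct logical_dev *ld, uint64_t blkno)
{
	return ld->ld_start + blkno;
}
```

The idea is that every logical device interface would go through the
same check before a write reaches the physical device, so an
out-of-zone write is refused no matter which partitioning scheme
produced the zone.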


>  Now, could the vfs_subr.c changes you suggest cause this
> to happen?  If the ffs_xxxx routines and data are properly 
> isolated - seems like that wouldn't be the case...  But,
> I'm not file-system guru.

Yes, it can cause the error.  Try the ASSERT.


I've been loath to discuss this in detail without patches in hand,
since it's a serious reliability problem... unfortunately, patches
require an orthogonal infrastructure (layering fixes to make everyone
lock the same way so it can be worked around, then logical to physical
device mapping by descriptor through a mapping layer for a real fix
to simply disallow out-of-zone writes, then per fs allocation of the
vnode as a subelement of the in core inode and a VOP_VRELE, etc.).


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.