Date:      Thu, 7 Nov 1996 11:46:42 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        ponds!ponds!rivers (Thomas David Rivers)
Cc:        ponds!Artisoft.COM!ponds!freebsd.org!dyson, ponds!Artisoft.COM!ponds!freefall.cdrom.com!freebsd-hackers, ponds!Artisoft.COM!ponds!lakes.water.net!rivers, ponds!Artisoft.COM!ponds!lambert.org!terry
Subject:   Re: More info on the daily panics...
Message-ID:  <199611071846.LAA10405@phaeton.artisoft.com>
In-Reply-To: <199611071247.HAA02530@lakes.water.net> from "Thomas David Rivers" at Nov 7, 96 07:47:13 am

>  Well - the jury is "in" - the patch didn't affect my problem.
> 
>  This morning, at 7:06am - I got a reboot:
> 
> 	panic: ffs_valloc: dup alloc
> 
>  I seem to recall you mentioning this was an added solution to
> some previous changes... could those be required to make this
> a better fix?  Recall, I'm using 2.1.5-STABLE as of Oct. 17th,
> not 2.2.
> 
>  Any more avenues to explore?

The patch only prevents a condition from occurring, instead of panic'ing
when it does occur.  I.e., it leaves one less unhandleable error condition
that has to be checked for and then "not handled" (panic).

The previous changes didn't affect that specific area of the code; they
were a usage workaround designed not to tickle the sensitive race.  They
were a kludge fix, since the code should be robust in isolation.  The
"doctor: don't do that" answer isn't applicable when the interfaces are
(supposedly) treated as if they were black boxes.


What FS's do you have mounted?  This is very important.  Not all FS's
verify the generation count on the vnode like they should, and some have
other nasty bits which may cause your problem as well.  Very likely, if
David is correct about the proximal cause, it is an FS-specific problem.
That it errors out somewhere other than the call tree for the FS is a
result of the panic check being in the wrong place (list reference
instead of list insertion).
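
To illustrate what "verifying the generation count" buys you, here is a
minimal user-space sketch.  It is NOT the kernel's struct vnode or any
real FS code; toy_vnode, toy_ref and the toy_* functions are names made
up for this example.  The idea is only that a cached reference remembers
the generation it was taken at, and is rejected once the slot has been
recycled for something else.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a vnode; hypothetical, not the FreeBSD structure. */
struct toy_vnode {
    unsigned long gen;   /* bumped every time the slot is recycled */
    int           data;  /* placeholder for per-file state */
};

/* A cached reference remembers the generation it was taken at. */
struct toy_ref {
    struct toy_vnode *vp;
    unsigned long     gen;
};

/* Take a reference, recording the current generation. */
static struct toy_ref toy_ref_take(struct toy_vnode *vp)
{
    assert(vp != NULL);
    struct toy_ref r = { vp, vp->gen };
    return r;
}

/* Recycle the slot for a different file; old references become stale. */
static void toy_vnode_recycle(struct toy_vnode *vp, int new_data)
{
    assert(vp != NULL);
    vp->gen++;
    vp->data = new_data;
}

/* Returns 1 if the reference still names the same incarnation, else 0. */
static int toy_ref_valid(const struct toy_ref *r)
{
    assert(r != NULL && r->vp != NULL);
    return r->gen == r->vp->gen;
}
```

An FS that skips the toy_ref_valid() step will happily keep using a
reference to a slot that now belongs to some other file.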


Please see my recent posting.  If David is right about your particular
problem, then putting the check in the vrele() like I suggested there
will cause a panic at the point of corruption rather than the point
of use of corrupted data.

This would have the effect of isolating the exact line of code causing
your problem, which in the multiple vrele() case is probably an error
path that is not frequently used.

It would also make the panic condition repeat reliably, for what that's
worth (probably a lot, at this point).
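
A rough user-space sketch of the difference between checking at the
point of corruption and at the point of use.  This assumes the check in
question is a use-count underflow test at release time; the struct, the
toy_* names and the error code are invented for the example, and a real
kernel would panic where this returns an error.

```c
#include <assert.h>
#include <stddef.h>

/* Toy vnode with only a use count; hypothetical, not the kernel's. */
struct toy_vnode {
    int usecount;
};

enum { TOY_OK = 0, TOY_EXTRA_RELE = -1 };

/* Take a reference. */
static void toy_vref(struct toy_vnode *vp)
{
    assert(vp != NULL);
    vp->usecount++;
}

/*
 * Release a reference, checking at the point of release.  An extra
 * release is reported here, while the guilty caller is still on the
 * stack, instead of leaving a bad count for someone else to trip over.
 */
static int toy_vrele_checked(struct toy_vnode *vp)
{
    assert(vp != NULL);
    if (vp->usecount <= 0)
        return TOY_EXTRA_RELE;  /* a kernel would panic right here */
    vp->usecount--;
    return TOY_OK;
}
```

With the check in the release path, the second of two releases against a
single reference fails immediately and the count is left untouched.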


The vnode->inode->vnode and the inode->vnode->inode integrity checks,
like the "ffs_valloc: dup alloc" check, are post-event checks.  You
will see the corrupt data before you attempt to use it, and panic
earlier than you would have, but the proximal cause of the corruption
would still not be identifiable from the stack trace.  8-(.
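
For concreteness, a sketch of what such a round-trip check looks like,
using toy structures.  The field names v_ip and i_vp and all toy_*
functions are assumptions for illustration, not the real FFS layout.
Note that it can only tell you the pointers disagree now; it says
nothing about who broke them.

```c
#include <assert.h>
#include <stddef.h>

struct toy_inode;

/* Toy vnode and inode holding back-pointers to each other. */
struct toy_vnode { struct toy_inode *v_ip; };
struct toy_inode { struct toy_vnode *i_vp; };

/* Link a vnode and an inode to each other. */
static void toy_bind(struct toy_vnode *vp, struct toy_inode *ip)
{
    assert(vp != NULL && ip != NULL);
    vp->v_ip = ip;
    ip->i_vp = vp;
}

/* vnode->inode->vnode: does the inode point back at this vnode? */
static int toy_vnode_consistent(const struct toy_vnode *vp)
{
    return vp != NULL && vp->v_ip != NULL && vp->v_ip->i_vp == vp;
}

/* inode->vnode->inode: does the vnode point back at this inode? */
static int toy_inode_consistent(const struct toy_inode *ip)
{
    return ip != NULL && ip->i_vp != NULL && ip->i_vp->v_ip == ip;
}
```

A second vnode that claims an already-bound inode fails the first check,
but only when somebody gets around to looking.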

Unfortunately, the FreeBSD VFS architecture is not very friendly to FS
debugging, and changes to fix it don't look like they will be going in
en masse.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


