Date: Thu, 7 Nov 1996 11:46:42 -0700 (MST)
From: Terry Lambert <terry@lambert.org>
To: ponds!ponds!rivers (Thomas David Rivers)
Cc: ponds!Artisoft.COM!ponds!freebsd.org!dyson, ponds!Artisoft.COM!ponds!freefall.cdrom.com!freebsd-hackers, ponds!Artisoft.COM!ponds!lakes.water.net!rivers, ponds!Artisoft.COM!ponds!lambert.org!terry
Subject: Re: More info on the daily panics...
Message-ID: <199611071846.LAA10405@phaeton.artisoft.com>
In-Reply-To: <199611071247.HAA02530@lakes.water.net> from "Thomas David Rivers" at Nov 7, 96 07:47:13 am
> Well - the jury is "in" - the patch didn't affect my problem.
>
> This morning, at 7:06am - I got a reboot:
>
> 	panic: ffs_valloc: dup alloc
>
> I seem to recall you mentioning this was an added solution to
> some previous changes... could those be required to make this
> a better fix?  Recall, I'm using 2.1.5-STABLE as of Oct. 17th,
> not 2.2.
>
> Any more avenues to explore?

The patch only prevents a condition from occurring, instead of panicking when it does occur.  I.e.: it leaves one less unhandleable error condition that has to be checked for and then "not handled" (panic).

The previous changes didn't affect that specific area of the code; they were a usage workaround designed not to tickle the sensitive race.  They were a kludge fix, since the code should be robust in isolation.  The "doctor: don't do that" answer isn't applicable when the interfaces are (supposedly) treated as if they were black boxes.

What FS's do you have mounted?  This is very important.  Not all FS's verify the generation count on the vnode like they should, and there are other nasty bits which may cause your problem as well.

Very likely, if David is correct about the proximal cause, it is an FS-specific problem.  That it errors out outside the FS's call tree is a result of the panic check being in the wrong place (at list reference instead of list insertion).

Please see my recent posting.  If David is right about your particular problem, then putting the check in vrele() as I suggested there will cause a panic at the point of corruption rather than at the point where the corrupted data is used.  This would have the effect of isolating the exact line of code causing your problem, which in the multiple vrele() case is probably an infrequently used error path.  It would also make the panic condition repeat reliably, for what that's worth (probably a lot, at this point).
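The vrele() check placement described above can be sketched in userland C. This is a minimal illustration only, assuming a hypothetical, heavily simplified `struct hyp_vnode` and `hyp_vrele()` (neither is FreeBSD's real `struct vnode` or `vrele()`). The point it shows is where the check lives: it fires inside the release that goes wrong, so the report names the offending caller, rather than firing later when some unrelated code walks the free list.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, heavily simplified vnode; not FreeBSD's struct vnode. */
struct hyp_vnode {
    int usecount;      /* active references held on this vnode */
    int on_free_list;  /* set to 1 once the last reference is dropped */
};

/*
 * Drop one reference, checking for corruption at the point where it
 * happens.  A kernel would call panic() in both failure paths so the
 * stack trace names the bad caller; this userland sketch prints a
 * diagnostic and returns -1 instead so it can be exercised in a test.
 */
static int hyp_vrele(struct hyp_vnode *vp, const char *caller)
{
    if (vp->usecount <= 0) {
        /* Extra release: caught here, not when the list is next used. */
        fprintf(stderr, "hyp_vrele: extra release by %s\n", caller);
        return -1;
    }
    if (--vp->usecount == 0) {
        /* Check at list insertion rather than at list reference. */
        if (vp->on_free_list) {
            fprintf(stderr, "hyp_vrele: %s: already on free list\n", caller);
            return -1;
        }
        vp->on_free_list = 1;  /* stands in for the free-list insert */
    }
    return 0;
}
```

In a kernel the two `return -1` paths would be `panic()` calls; returning an error here only keeps the sketch testable. The useful property is that a rarely taken error path which calls the release function one time too many is reported by name the moment it does so.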
The vnode->inode->vnode and inode->vnode->inode integrity checks, like the "ffs_valloc: dup alloc" check, are post-event checks.  You will see the corrupt data before you attempt to use it, and panic earlier than you would have, but the proximal cause of the corruption would still not be identifiable from the stack trace.  8-(.

Unfortunately, the FreeBSD VFS architecture is not very friendly to FS debugging, and changes to fix it don't look like they will be going in en masse.

Regards,
Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present or previous employers.