From owner-freebsd-hackers Thu Nov 7 18:20:51 1996
From: Terry Lambert
Message-Id: <199611071846.LAA10405@phaeton.artisoft.com>
Subject: Re: More info on the daily panics...
To: ponds!ponds!rivers (Thomas David Rivers)
Date: Thu, 7 Nov 1996 11:46:42 -0700 (MST)
Cc: ponds!Artisoft.COM!ponds!freebsd.org!dyson,
    ponds!Artisoft.COM!ponds!freefall.cdrom.com!freebsd-hackers,
    ponds!Artisoft.COM!ponds!lakes.water.net!rivers,
    ponds!Artisoft.COM!ponds!lambert.org!terry
In-Reply-To: <199611071247.HAA02530@lakes.water.net> from "Thomas David Rivers" at Nov 7, 96 07:47:13 am
X-Mailer: ELM [version 2.4 PL24]
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> Well - the jury is "in" - the patch didn't affect my problem.
>
> This morning, at 7:06am - I got a reboot:
>
>	panic: ffs_valloc: dup alloc
>
> I seem to recall you mentioning this was an added solution to
> some previous changes... could those be required to make this
> a better fix?  Recall, I'm using 2.1.5-STABLE as of Oct. 17th,
> not 2.2.
>
> Any more avenues to explore?

The patch only prevents a condition from occurring, instead of
panic'ing when it does occur.  I.e., it means there is one less error
condition that can't be handled which needs to be checked for and
then "not handled" (panic).

The previous changes didn't affect that specific area of the code;
they were a usage workaround designed not to tickle the sensitive
race.  They were a kludge fix, since the code should be robust in
isolation.  The "doctor: don't do that" answer isn't applicable when
the interfaces are (supposedly) treated as if they were black boxes.

What FS's do you have mounted?  This is very important.  Not all FS's
verify the generation count on the vnode like they should, and there
are other nasty bits which may cause your problem as well.

Very likely, if David is correct about the proximal cause, it is an
FS-specific problem.  That it errors out somewhere other than in the
call tree for the FS is a result of the panic check being in the
wrong place (list reference instead of list insertion).  Please see
my recent posting.
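
To make the "check at insertion, not at reference" point concrete,
here is a minimal user-space sketch.  It is a toy model under stated
assumptions, not FreeBSD code: struct toy_vnode and the toy_*
functions are hypothetical names invented for illustration, and a
real kernel would panic where this sketch returns an error code.

```c
#include <stdio.h>

/* Hypothetical toy model of a reference-counted object.
 * This is NOT the FreeBSD struct vnode. */
struct toy_vnode {
    int usecount;   /* number of active references */
};

enum toy_status { TOY_OK = 0, TOY_DOUBLE_RELEASE = -1 };

/* Take a reference. */
void toy_vref(struct toy_vnode *vp)
{
    vp->usecount++;
}

/* Release with the check at the point of corruption: an extra
 * release is reported by the very call that would cause the damage,
 * so the caller's stack identifies the culprit. */
enum toy_status toy_vrele_checked(struct toy_vnode *vp)
{
    if (vp->usecount <= 0) {
        /* A kernel would panic here; the sketch only reports. */
        fprintf(stderr, "toy_vrele: extra release, usecount=%d\n",
            vp->usecount);
        return TOY_DOUBLE_RELEASE;
    }
    vp->usecount--;
    return TOY_OK;
}

/* Unchecked release: an extra release silently corrupts the count. */
void toy_vrele_unchecked(struct toy_vnode *vp)
{
    vp->usecount--;
}

/* Post-event check at the point of use: it can tell that the count
 * is damaged, but not which earlier call damaged it. */
int toy_is_consistent(const struct toy_vnode *vp)
{
    return vp->usecount >= 0;
}
```

With the checked release, the second of two releases on a single
reference fails immediately.  With the unchecked one, the damage only
shows up when some later, unrelated caller runs the consistency check.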

If David is right about your particular problem, then putting the
check in the vrele() like I suggested there will cause a panic at the
point of corruption rather than the point of use of corrupted data.
This would have the effect of isolating the exact line of code
causing your problem, which in the multiple vrele() case is probably
an error path that is not frequently used.  It would also make the
panic condition repeat reliably, for what that's worth (probably a
lot, at this point).

The vnode->inode->vnode and the inode->vnode->inode integrity checks,
like the "ffs_valloc: dup alloc" check, are post-event checks.  You
will see the corrupt data before you attempt to use it, and panic
earlier than you would have, but the proximal cause of the corruption
would still not be identifiable from the stack trace.  8-(.

Unfortunately, the FreeBSD VFS architecture is not very friendly to
FS debugging, and changes to fix it don't look like they will be
going in en masse.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.