From owner-freebsd-hackers Thu Nov 7 19:20:47 1996
Date: Thu, 7 Nov 1996 20:43:45 -0500 (EST)
From: Thomas David Rivers
Message-Id: <199611080143.UAA03593@lakes.water.net>
To: ponds!lambert.org!terry, ponds!uunet.uu.net!ponds!ponds!rivers
Subject: Re: More info on the daily panics...
Cc: ponds!uunet.uu.net!ponds!Artisoft.COM!ponds!freebsd.org!dyson,
    ponds!uunet.uu.net!ponds!Artisoft.COM!ponds!freefall.cdrom.com!freebsd-hackers,
    ponds!uunet.uu.net!ponds!Artisoft.COM!ponds!lakes.water.net!rivers,
    ponds!uunet.uu.net!ponds!Artisoft.COM!ponds!lambert.org!terry
Sender: owner-hackers@FreeBSD.ORG

>
> > Well - the jury is "in" - the patch didn't affect my problem.
> >
> > This morning, at 7:06am - I got a reboot:
> >
> >     panic: ffs_valloc: dup alloc
> >
> > I seem to recall you mentioning this was an added solution to
> > some previous changes... could those be required to make this
> > a better fix?  Recall, I'm using 2.1.5-STABLE as of Oct. 17th,
> > not 2.2.
> >
> > Any more avenues to explore?
>
> The patch only prevents a condition from occurring instead of panic'ing
> when it does occur.  Ie: it prevents one less error condition that can't
> be handled from needing to be checked, and then "not handled" (panic).

Ok, then we can deduce that my panic was not from this condition, right?
(Since this condition can no longer occur, and I still get the panic...)

>
> The previous changes didn't affect that specific area of the code, they
> were a usage workaround designed not to tickle the sensitive race.  They
> were a kludge fix, since the code should be robust in isolation.  The
> "doctor: don't do that" answer isn't applicable when the interfaces are
> (supposedly) treated as if they were black boxes.
>
>
> What FS's do you have mounted?  This is very important.
> Not all FS's verify the generation count on the vnode like they should,
> and other nasty bits which may cause your problem as well.  Very likely,
> if David is correct about the proximal cause, it is an FS-specific
> problem.  That it errors out not in the call tree for the FS is a result
> of the panic check being in the wrong place (list reference instead of
> list insertion).

Hmmm... it seems to error-out in the ffs code (ffs_valloc) - isn't that
specific to ffs?

Anyway, to answer your question; I have ufs and nfs file systems
mounted, my /etc/fstab:

    /dev/wd0s1b        none          swap    sw  0 0
    /dev/wd0a          /             ufs     rw  1 1
    /dev/wd0s1e        /usr          ufs     rw  1 1
    proc               /proc         procfs  rw  0 0
    lakes:/disk1       /disk1        nfs     rw  0 0
    lakes:/disk1/usr   /disk1/usr    nfs     rw  0 0
    lakes:/usr/X11R6   /usr/X11R6    nfs     rw  0 0

nothing else is mounted...

>
>
> Please see my recent posting.  If David is right about your particular
> problem, then putting the check in the vrele() like I suggested there
> will cause a panic at the point of corruption rather than the point
> of use of corrupted data.

Ok - that's done...

>
> This would have the effect of isolating the exact line of code causing
> your problem, which in the multiple vrele() case is probably an error
> path that is not frequently used.
>
> It would also make the panic condition repeat reliably, for what that's
> worth (probably a lot, at this point).

Yes - it would be helpful - particularly since I have to wait 'n' days
for this to occur, with 'n' usually less than 3...  The problem almost
always occurs at 3:00 AM, but sometimes it occurs at 1:13pm...
[I couldn't relate any particular item started by cron(1) that caused this...]

>
>
> The vnode->inode->vnode and the inode->vnode->inode integrity checks,
> like the "ffs_valloc: dup alloc" check, is a post-event check.  You
> will see the corrupt data before you attempt to use it, and panic
> earlier than you would have, but the proximal cause of the corruption
> would still not be identifiable from the stack trace.  8-(.

Well - at least we would have incontrovertible evidence of the corruption...
If I don't trip over the vrele() check, and still get the panic; we can
look elsewhere...

Just let me add, up front, that I appreciate everyone's effort on this!

    - Dave R. -
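
For anyone following along, here is a rough userland sketch of the idea
behind the vrele() check: catch the extra release at the moment it
happens rather than when the damaged data is used later.  This is not
the kernel code and not the patch from Terry's posting (which isn't
reproduced in this message); struct toy_vnode, toy_vref() and
toy_vrele() are made-up names for illustration, and where a real kernel
would presumably panic, the sketch just returns an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a vnode; only the reference count matters. */
struct toy_vnode {
    int usecount;   /* number of active references */
    int corrupted;  /* set once an over-release has been detected */
};

/* Take a reference on the toy vnode. */
static void toy_vref(struct toy_vnode *vp)
{
    assert(vp != NULL);
    vp->usecount++;
}

/*
 * Drop a reference.  Returns 0 on success, or -1 if the count was
 * already zero.  The -1 path is where a kernel check would fire, at
 * the point of corruption, so the stack trace names the guilty caller.
 */
static int toy_vrele(struct toy_vnode *vp)
{
    assert(vp != NULL);
    if (vp->usecount <= 0) {
        fprintf(stderr, "toy_vrele: usecount already %d\n", vp->usecount);
        vp->corrupted = 1;
        return -1;
    }
    vp->usecount--;
    return 0;
}
```

With one toy_vref() followed by two toy_vrele() calls, the first release
succeeds and the second is flagged immediately, with the count left at
zero instead of going negative.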