From: John Baldwin
To: freebsd-hackers@freebsd.org
Cc: Masoom Shaikh, freebsd-questions
Date: Mon, 29 Mar 2010 10:48:44 -0400
Subject: Re: random FreeBSD panics
Message-Id: <201003291048.44861.jhb@freebsd.org>

On Sunday 28 March 2010 4:28:29 am Masoom Shaikh wrote:
> Hello List,
>
> I was a happy FreeBSD user until I installed FreeBSD 8.0-RC1. Since
> then, the system randomly freezes, and there is no option other than a
> hard reboot. I guessed this would get solved in 8.0-RELEASE, but it was
> not :(
>
> Often I get vmcore files, but not always. I have dumpdev set to AUTO in
> my rc.conf. Almost every time it just fsck's the file system on reboot.
> I have not lost any files, though. This is a Dell Inspiron 1525 laptop
> with 1GB RAM, an Intel Core2 Duo T5500 and an ATI Radeon X1400 card. The
> installation in question is KDE4 from ports, with the radeon/ati driver.
>
> At first I felt the problem was the wpi driver, then I suspected the DRI
> driver of X. Then I observed that the system freezes even if none of
> these is installed, e.g. when it is under load, like building a port and
> simultaneously fetching something over the network: it hangs, and hangs
> hard. This persuaded me that something is wrong in kernel scheduling
> itself. Maybe it is lost in some deadlock, etc. So last weekend I thought
> I would see how the immediately previous version, i.e.
> FreeBSD-7.3-RELEASE, would behave.
>
> I reinstalled FreeBSD 7.1 from ISO images, svn up'ed the FreeBSD 7.3
> source, and did the normal buildworld, buildkernel, installkernel,
> installworld cycle. Unfortunately this kernel is naughty as well ;-), it
> also freezes with the same stubbornness. But the difference is that this
> time I happened to catch something interesting.
>
> It panics on an NMI, fatal trap 19 while in kernel mode. I loaded the
> vmcore file in kgdb and got the backtrace. I obtained vmcore files on two
> occasions, and I have attached both backtraces. This error most likely
> suggests a hardware error in RAM, but Windows 7 and XP boot just fine and
> have never caused any errors.

Yes, and note that the chipset has set a register to indicate a RAM parity
error as well, so it is not a random NMI. Have you checked your BIOS' event
log?

You may also want to try running with machine checks enabled
(hw.mca.enabled=1 in loader.conf, but you would need a very recent
7-STABLE or 8-STABLE) to see whether you get machine checks for ECC
errors. OTOH, if you do not have ECC memory, then this will probably not
help.

> To verify whether I have errors in my RAM, I let sysutils/memtest86+ run
> overnight; to double-check, I also ran the Windows Memory Diagnostic test
> four times. None of them reported errors. Can anyone here suggest a
> solution?

You can still have bad RAM even if those tests pass.

-- 
John Baldwin
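A minimal sketch of the two settings discussed in this thread, assuming a FreeBSD 7-STABLE or 8-STABLE system recent enough to support the hw.mca.enabled tunable. The rc.conf line is the crash-dump setup the original poster describes; the loader.conf line is the machine-check suggestion from the reply. This is a configuration fragment, not a complete file.

```sh
# /etc/rc.conf: use the first configured swap device as the dump device;
# at boot, savecore(8) copies any dump it finds there to /var/crash
dumpdev="AUTO"

# /boot/loader.conf: enable machine-check reporting at boot
# (assumes a kernel new enough to have the hw.mca tunables)
hw.mca.enabled=1
```

After a reboot, `sysctl hw.mca.enabled` should report 1, and any machine checks the CPU reports should then appear on the console and in dmesg(8). As noted in the reply, this is unlikely to reveal much on a machine without ECC memory.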