Date:      Wed, 1 Mar 2006 05:30:49 +1100
From:      Peter Jeremy <peterjeremy@optushome.com.au>
To:        Carroll Kong <me@carrollkong.com>
Cc:        hackers@freebsd.org
Subject:   Re: FreeBSD 4.11 P13 Crash
Message-ID:  <20060228183049.GB689@turion.vk2pj.dyndns.org>
In-Reply-To: <20060228015258.2220543D48@mx1.FreeBSD.org>
References:  <20060228015258.2220543D48@mx1.FreeBSD.org>

On Mon, 2006-Feb-27 20:52:57 -0500, Carroll Kong wrote:
>Okay this time my kernel was recompiled so there are no modules to make it
>easier to see all of the symbols.

If you cd to your kernel build directory (e.g. /usr/src/sys/compile/DAEMON)
and run 'make gdbinit' and then use kgdb in that directory, there are
a number of functions to let you load KLD symbols.
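
As a rough sketch of that sequence, the helper below only prints the
commands so they can be reviewed before being run by hand. The build
directory is the example path given above; the crash dump path
(/var/crash/vmcore.0) and the kernel.debug file name are assumptions, and
kernel.debug will only be present if the kernel was built with debugging
symbols. Whether the gdbinit target and kgdb are available depends on the
release; on 4.x the debugger is normally started as 'gdb -k' instead.

```shell
#!/bin/sh
# Sketch only: paths and file names are assumptions, adjust for your system.
KERNDIR=${KERNDIR:-/usr/src/sys/compile/DAEMON}  # example build directory
VMCORE=${VMCORE:-/var/crash/vmcore.0}            # assumed crash dump location

debug_cmds() {
    # Print the command sequence instead of executing it.
    echo "cd $KERNDIR"
    echo "make gdbinit"
    echo "kgdb kernel.debug $VMCORE"
}

debug_cmds
```

Running the printed commands from the build directory matters because the
debugger picks up its .gdbinit from the current directory.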

>Sometimes the box cycles through the fatal traps 12.  Other times it does
>not.
...
>This box was stable before I upgraded from 4.9->4.11.

It's always possible that you've hit a software bug.  Would it be practical
to downgrade to your 4.9 configuration and see if the problem goes away?
[Note that this does not totally rule out hardware, as the changed memory
footprint may reveal a hardware problem.]

>I have since swapped the RAM, motherboard, RAM again (I bought another stick
>thinking maybe my new RAM was coincidentally bugged), one of the Intel NICs,
>and my 3Ware controller.  The problem still occurred and actually more
>frequently.  The usual frequency was about 14 days or so.  It just crashed
>in less than 23 hours and then again within 25 minutes.

Assuming a similar system load[*], this does suggest failing hardware.

My suspicions would be system cooling or PSU.  Your P4 should just
throttle back if it gets too warm but other parts of your system (RAM,
northbridge, southbridge etc) may start mis-behaving if they get too
warm.

>- PowerSupply (I suppose anything is possible, please note it is on an APC
>UPS, but the power supply might be delivering bad juice?)

I'd put this as the likely culprit - consumer-grade PSUs are not
conservatively rated and modern systems put quite a strain on the
power supplies (in terms of very high dI/dt loads).

>year in the past.  As a note, the problem is NOT load related.  In fact, one
>time the fatal panic said the running process was "idle".  :)

[*] A corrupted word in memory can sit around for a relatively long time
before something de-references it.  A lot of packet handling code exists
at interrupt level and so will only trigger when a packet arrives - even
if the system is otherwise idle.

-- 
Peter Jeremy


