Date:      Tue, 28 Feb 2006 16:48:50 -0500
From:      "Carroll Kong" <me@carrollkong.com>
To:        "'Peter Jeremy'" <peterjeremy@optushome.com.au>
Cc:        hackers@freebsd.org
Subject:   RE: FreeBSD 4.11 P13 Crash
Message-ID:  <20060228214851.6F46D43D46@mx1.FreeBSD.org>
In-Reply-To: <20060228183049.GB689@turion.vk2pj.dyndns.org>

I've already ordered a new CPU and power supply.  Once those parts are
installed, I hope the problem goes away.  My bet is on the power supply,
since someone else already mentioned it as a common culprit.

If it still fails after those two changes, then I can consider the
downgrade.  I figured my setup can't be that unusual so someone else would
have run into this issue if it was indeed a software bug.  Furthermore, I am
biased towards FreeBSD servers.  They just aren't buggy beasts by nature!
:)

I don't think it is cooling, since the system's temperature has stayed
roughly the same.  I'll take it into consideration though, as anything is
possible at this point.

Thanks for the other tips and notes.  It's good to have some solid answers!



- Carroll Kong 

> -----Original Message-----
> From: Peter Jeremy [mailto:peterjeremy@optushome.com.au] 
> Sent: Tuesday, February 28, 2006 1:31 PM
> To: Carroll Kong
> Cc: hackers@freebsd.org
> Subject: Re: FreeBSD 4.11 P13 Crash
> 
> On Mon, 2006-Feb-27 20:52:57 -0500, Carroll Kong wrote:
> >Okay this time my kernel was recompiled so there are no modules to make
> >it easier to see all of the symbols.
> 
> If you cd to your kernel build directory (eg 
> /usr/src/sys/compile/DAEMON) and run 'make gdbinit' and then 
> use kgdb in that directory, there are a number of functions 
> to let you load KLD symbols.
> 
> >Sometimes the box cycles through the fatal traps 12.  Other times it 
> >does not.
> ...
> >This box was stable before I upgraded from 4.9->4.11.
> 
> It's always possible that you've hit a software bug.  Would 
> it be practical to downgrade to your 4.9 configuration and 
> see if the problem goes away?
> [Note that this does not totally rule out hardware as the 
> changed memory footprint may reveal a hardware problem].
> 
> >I have since swapped the RAM, motherboard, RAM again (I bought another
> >stick thinking maybe my new RAM was coincidentally bugged), one of the
> >Intel NICs, and my 3Ware controller.  The problem still occurred and
> >actually more frequently.  The usual frequency was about 14 days or so.
> >It just crashed in less than 23 hours and then again within 25 minutes.
> 
> Assuming a similar system load[*], this does suggest failing hardware.
> 
> My suspicions would be system cooling or PSU.  Your P4 should 
> just throttle back if it gets too warm but other parts of 
> your system (RAM, northbridge, southbridge etc) may start 
> mis-behaving if they get too warm.
> 
> >- PowerSupply (I suppose anything is possible, please note it is on an
> >APC UPS, but the power supply might be delivering bad juice?)
> 
> I'd put this as the likely culprit - consumer-grade PSUs are 
> not conservatively rated and modern systems put quite a 
> strain on the power supplies (in terms of very high dI/dt loads).
> 
> >year in the past.  As a note, the problem is NOT load related.  In
> >fact, one time the fatal panic said the running process was "idle".  :)
> 
> [*] A corrupted word in memory can sit around for a 
> relatively long time before something de-references it.  A 
> lot of packet handling code exists at interrupt level and so 
> will only trigger when a packet arrives - even if the system 
> is otherwise idle.
> 
> --
> Peter Jeremy



