Date:      Tue, 4 Sep 2001 14:11:19 -0500
From:      mikea <mikea@mikea.ath.cx>
To:        freebsd-stable@FreeBSD.ORG
Subject:   Re: rebooting under load
Message-ID:  <20010904141119.A34517@mikea.ath.cx>
In-Reply-To: <Pine.BSF.4.33L2.0109041353120.372-100000@centipede.symmetric.net>; from kkanno@churchofinformationwarfare.org on Tue, Sep 04, 2001 at 01:57:06PM -0500
References:  <20010904081017.B48472@xor.obsecurity.org> <Pine.BSF.4.33L2.0109041353120.372-100000@centipede.symmetric.net>

On Tue, Sep 04, 2001 at 01:57:06PM -0500, presence wrote:
> I've got a machine that has been suddenly rebooting. I can make it crash
> at will by bringing the load to about 200 with the script below. My other
> single-CPU boxes can handle this script with 3000 primes instances just
> fine with 512MB of RAM.
> 
> Here is what I get when it goes down. What does it mean?
> 
> bash-2.05# panic: vm_fault: fault on nofault entry, addr: cbbd3000
> mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
> boot() called on cpu#1
> 
> syncing disks... 15
> done
> Uptime: 5m13s
> Automatic reboot in 15 seconds - press a key on the console to abort
 
[snip dmesg output]
 
> Here is the program; running it twice crashes my machine, even before
> it seems to swap out.
> 
> #!/usr/bin/perl
> #
> # Ken Kanno 08-31-2001
> # I want a load average of 1000
> # for fun
> 
> for($a=0; $a<100; $a++)
> {
>     print "starting primes instance # :$a \n";
>     system "nice -20 primes 10000 > /dev/null &";
>     #sleep 1;
> 
> }

Sounds like you might have something (CPU? memory?) just on the
edge of failing, and this load pushes it over the edge by heating 
(or driving too fast) the near-failing component. 

What happens when you do a "make -j 16 buildworld"?
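One way to make that a repeatable test is to run the build several times in a row and stop at the first failure. Below is a minimal sketch: `repeat_until_fail` is a hypothetical helper (not part of FreeBSD), and the /usr/src location and the pass count of 5 are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): run a command up to COUNT
# times and stop at the first pass that exits non-zero.
# Usage: repeat_until_fail COUNT COMMAND [ARGS...]
# POSIX sh has no "local", so count and pass are plain shell variables.
repeat_until_fail() {
    count=$1
    shift
    pass=1
    while [ "$pass" -le "$count" ]; do
        echo "pass $pass of $count: $*"
        if ! "$@"; then
            echo "pass $pass failed" >&2
            return 1
        fi
        pass=$((pass + 1))
    done
    echo "all $count passes completed"
}

# Example, assuming the source tree lives in /usr/src and 5 passes is enough:
#   cd /usr/src && repeat_until_fail 5 make -j 16 buildworld > /tmp/stress.log 2>&1
```

If the build dies at a different spot on each pass, that tends to point at marginal hardware rather than at the source tree.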

Are all your fans working? Does removing a stick of memory cause
it to _not_ fail? Is your power supply overloaded or marginal?
Are you overclocking the CPUs? If you are, then does it fail at
normal clock rates? Does opening the case and pointing a _big_ fan
at the motherboard change things?

Just some things to try; no guarantee that these tests will find
the problem, but they might, and they're easy. Others may have 
more or better ideas.

-- 
Mike Andrews
mikea@mikea.ath.cx
Tired old sysadmin since 1964
