Date: Tue, 4 Sep 2001 14:11:19 -0500
From: mikea <mikea@mikea.ath.cx>
To: freebsd-stable@FreeBSD.ORG
Subject: Re: rebooting under load
Message-ID: <20010904141119.A34517@mikea.ath.cx>
In-Reply-To: <Pine.BSF.4.33L2.0109041353120.372-100000@centipede.symmetric.net>; from kkanno@churchofinformationwarfare.org on Tue, Sep 04, 2001 at 01:57:06PM -0500
References: <20010904081017.B48472@xor.obsecurity.org> <Pine.BSF.4.33L2.0109041353120.372-100000@centipede.symmetric.net>

On Tue, Sep 04, 2001 at 01:57:06PM -0500, presence wrote:
> I've got a machine that has been suddenly rebooting. I can make it crash
> at will by bringing the load to about 200 with the script below. My other
> single CPU boxes can handle a this script with 3000 primes instances just
> fine with 512MB of RAM.
>
> Here is what I get whe it goes down. What does it mean?
>
> bash-2.05# panic: vm_fault: fault on nofault entry, addr: cbbd3000
> mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
> boot() called on cpu#1
>
> syncing disks... 15
> done
> Uptime: 5m13s
> Automatic reboot in 15 seconds - press a key on the console to abort

[snip dmesg output]

> Program that when run twice crashes my machine, even before it seems to
> swap out.
>
> #!/usr/bin/perl
> #
> # Ken Kanno 08-31-2001
> # I want a load average of 1000
> # for fun
>
> for($a=0; $a<100; $a++)
> {
> print "starting primes instance # :$a \n";
> system "nice -20 primes 10000 > /dev/null &";
> #sleep 1;
>
> }

Sounds like you might have something (CPU? memory?) just on the edge
of failing, and this load pushes it over the edge by heating (or
driving too fast) the near-failing component.

What happens when you do a "make -j 16 buildworld"? Are all your
fans working? Does removing a stick of memory cause it to _not_
fail? Is your power supply overloaded or marginal? Are you
overclocking the CPUs? If you are, then does it fail at normal
clock rates? Does opening the case and pointing a _big_ fan at
the motherboard change things?

Just some things to try; no guarantee that these tests will find
the problem, but they might, and they're easy. Others may have
more or better ideas.

--
Mike Andrews
mikea@mikea.ath.cx
Tired old sysadmin since 1964

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message