From: presence
Date: Wed, 5 Sep 2001 10:25:22 -0500 (CDT)
Subject: Re: rebooting under load [solved]
In-Reply-To: <20010904141119.A34517@mikea.ath.cx>
Sender: owner-freebsd-stable@FreeBSD.ORG

After swapping out all the memory and going back to the single OEM CPU,
the problems persisted.  I then updated to 4.4-RC from Sep 4, 2001, and
the machine stopped crashing.  Back in SMP with all the original hardware,
everything seems OK now.

The old kernel was 4.3-RELEASE cvsuped from Aug 2, 2001.

KEN

On Tue, 4 Sep 2001, mikea wrote:

> On Tue, Sep 04, 2001 at 01:57:06PM -0500, presence wrote:
> > I've got a machine that has been suddenly rebooting.  I can make it crash
> > at will by bringing the load to about 200 with the script below.  My other
> > single-CPU boxes can handle this script with 3000 primes instances just
> > fine with 512MB of RAM.
> >
> > Here is what I get when it goes down.  What does it mean?
> >
> > bash-2.05# panic: vm_fault: fault on nofault entry, addr: cbbd3000
> > mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
> > boot() called on cpu#1
> >
> > syncing disks... 15
> > done
> > Uptime: 5m13s
> > Automatic reboot in 15 seconds - press a key on the console to abort
>
> [snip dmesg output]
>
> > Program that, when run twice, crashes my machine, even before it seems to
> > swap out.
> >
> > #!/usr/bin/perl
> > #
> > # Ken Kanno 08-31-2001
> > # I want a load average of 1000
> > # for fun
> >
> > for($a=0; $a<100; $a++)
> > {
> >     print "starting primes instance # :$a \n";
> >     system "nice -20 primes 10000 > /dev/null &";
> >     #sleep 1;
> >
> > }
>
> Sounds like you might have something (CPU? memory?) just on the
> edge of failing, and this load pushes it over the edge by heating
> (or driving too fast) the near-failing component.
>
> What happens when you do a "make -j 16 buildworld"?
>
> Are all your fans working? Does removing a stick of memory cause
> it to _not_ fail? Is your power supply overloaded or marginal?
> Are you overclocking the CPUs? If you are, then does it fail at
> normal clock rates? Does opening the case and pointing a _big_ fan
> at the motherboard change things?
>
> Just some things to try; no guarantee that these tests will find
> the problem, but they might, and they're easy. Others may have
> more or better ideas.
>
> --
> Mike Andrews
> mikea@mikea.ath.cx
> Tired old sysadmin since 1964
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-stable" in the body of the message