From owner-freebsd-stable  Wed Sep  5  8:25:32 2001
Delivered-To: freebsd-stable@freebsd.org
Received: from users.symmetric.net (centipede.symmetric.net [63.150.23.120])
	by hub.freebsd.org (Postfix) with ESMTP id 46DF337B408
	for <freebsd-stable@FreeBSD.ORG>; Wed,  5 Sep 2001 08:25:28 -0700 (PDT)
Received: by users.symmetric.net (Postfix, from userid 1000)
	id 59E9C1B205; Wed,  5 Sep 2001 10:25:22 -0500 (CDT)
Received: from localhost (localhost [127.0.0.1])
	by users.symmetric.net (Postfix) with ESMTP id 5732CD90B
	for <freebsd-stable@FreeBSD.ORG>; Wed,  5 Sep 2001 10:25:22 -0500 (CDT)
Date: Wed, 5 Sep 2001 10:25:22 -0500 (CDT)
From: presence <kkanno@churchofinformationwarfare.org>
X-X-Sender:  <kkanno@centipede.symmetric.net>
To: <freebsd-stable@FreeBSD.ORG>
Subject: Re: rebooting under load [solved]
In-Reply-To: <20010904141119.A34517@mikea.ath.cx>
Message-ID: <Pine.BSF.4.33L2.0109051024460.372-100000@centipede.symmetric.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-stable.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-stable>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-stable>
X-Loop: FreeBSD.ORG

After swapping out all memory and going back to the single OEM CPU, the
problems persisted. I then updated to 4.4-RC from Sep 4, 2001 and the
machine stopped crashing. Back in SMP with all original hardware,
everything seems OK now.

The old kernel was 4.3-RELEASE cvsuped from Aug 2, 2001.
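
For anyone who wants to repeat the test on their own hardware, here is a
rough sh sketch of the same load generator as the Perl script quoted
below. The function name spawn_load is made up for illustration, and it
assumes primes(6) from the games collection is installed. It uses the
POSIX "nice -n" spelling rather than the older "nice -20" form.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): start COUNT
# background copies of a command at the lowest scheduling priority.
# Usage: spawn_load COUNT COMMAND [ARGS...]
spawn_load() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        # Output is discarded, as in the original Perl script.
        nice -n 20 "$@" > /dev/null 2>&1 &
        i=$((i + 1))
    done
    echo "started $count instances of $1"
}

# Example, commented out so that sourcing this file starts nothing:
# spawn_load 100 primes 10000
```

Running the example line twice should roughly match the load that used
to bring the box down; "uptime" in another terminal shows the load
average as it climbs.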

KEN


On Tue, 4 Sep 2001, mikea wrote:

> On Tue, Sep 04, 2001 at 01:57:06PM -0500, presence wrote:
> > I've got a machine that has been suddenly rebooting. I can make it crash
> > at will by bringing the load to about 200 with the script below. My other
> > single CPU boxes can handle this script with 3000 primes instances just
> > fine with 512MB of RAM.
> >
> > Here is what I get when it goes down. What does it mean?
> >
> > bash-2.05# panic: vm_fault: fault on nofault entry, addr: cbbd3000
> > mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
> > boot() called on cpu#1
> >
> > syncing disks... 15
> > done
> > Uptime: 5m13s
> > Automatic reboot in 15 seconds - press a key on the console to abort
>
> [snip dmesg output]
>
> > Program that when run twice crashes my machine, even before it seems to
> > swap out.
> >
> > #!/usr/bin/perl
> > #
> > # Ken Kanno 08-31-2001
> > # I want a load average of 1000
> > # for fun
> >
> > for($a=0; $a<100; $a++)
> > {
> >     print "starting primes instance # :$a \n";
> >     system "nice -20 primes 10000 > /dev/null &";
> >     #sleep 1;
> >
> > }
>
> Sounds like you might have something (CPU? memory?) just on the
> edge of failing, and this load pushes it over the edge by heating
> (or driving too fast) the near-failing component.
>
> What happens when you do a "make -j 16 buildworld"?
>
> Are all your fans working? Does removing a stick of memory cause
> it to _not_ fail? Is your power supply overloaded or marginal?
> Are you overclocking the CPUs? If you are, then does it fail at
> normal clock rates? Does opening the case and pointing a _big_ fan
> at the motherboard change things?
>
> Just some things to try; no guarantee that these tests will find
> the problem, but they might, and they're easy. Others may have
> more or better ideas.
>
> --
> Mike Andrews
> mikea@mikea.ath.cx
> Tired old sysadmin since 1964
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-stable" in the body of the message
>

