Date:      Sun, 10 Nov 2002 02:16:33 -0000
From:      "Neil Doody" <neil@mpfreescene.com>
To:        "'Matthew Seaman'" <m.seaman@infracaninophile.co.uk>
Cc:        <kris@obsecurity.org>, <kstewart@owt.com>, <freebsd-questions@freebsd.org>
Subject:   RE: Maybe isolated the signal 12s im getting to hi loads
Message-ID:  <006001c2885f$369329c0$0200a8c0@b1>
In-Reply-To: <20021109205030.GA63058@happy-idiot-talk.infracaninophi>

Firstly, guys, thanks to all of you for your feedback.

Just to answer the questions on your minds: I'm not neglecting the
problem. My main difficulty is that it's a remote server. My host has
changed all the hardware multiple times, except for the hard drive, and
even that was replaced last night [with another Maxtor, I may add], and
it's still doing it.

As for the other question: I have generally been having reboots with
these messages left in the logs :-

Sep 16 21:15:07 admin /kernel: Fatal trap 12: page fault while in kernel mode
Sep 16 21:15:07 admin /kernel: fault virtual address    = 0x10
Sep 16 21:15:07 admin /kernel: fault code               = supervisor read, page not present

However, I did recently get one of these after the motherboard, CPU and
memory had been changed again [I actually had an upgrade to a faster
CPU] :-

Nov  5 03:12:04 admin /kernel: vop_panic[vop_open]
Nov  5 03:12:04 admin /kernel: panic: Filesystem goof
Nov  5 03:12:04 admin /kernel:

Now, after having the hard drive replaced, I have done a fresh install of
FreeBSD [4.6.2], because I inadvertently deleted everything off the old
disk. I then tried to do a cvsup and a make world to FreeBSD 4.7.

I haven't been able to do this successfully after numerous attempts.
Sometimes the server reboots on the trap 12, but mostly I get these :-

Nov  9 15:24:50 admin /kernel: pid 98573 (cc1), uid 0: exited on signal 11 (core dumped)

So, you were wondering why I wanted to change the make world script.
Well, to check whether there was a bug fix in the latest stable tree, I
wanted to complete just the one make world, and as I couldn't do it, I
was trying things to force it on, i.e. starting where it left off rather
than all over again.

Anyway, I found that the signal 11s would come very close after the
last one, but it took a long time for the first one to occur, so I
put it down to load averages.
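
One way to check that clustering against the logs, as a rough sketch only:
the Python below pulls the "exited on signal" entries out of syslog-style
lines and reports the gaps between consecutive ones. It assumes each entry
sits on a single line in the format quoted above, and that the year is
2002 (syslog timestamps carry no year). The names signal_times and
gaps_in_seconds are made up for this illustration.

```python
import re
from datetime import datetime

# Matches entries such as:
#   Nov  9 15:24:50 admin /kernel: pid 98573 (cc1), uid 0: exited on signal 11 (core dumped)
SIGNAL_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ /kernel: "
    r"pid (?P<pid>\d+) \((?P<prog>[^)]+)\), uid \d+: "
    r"exited on signal (?P<sig>\d+)"
)


def signal_times(lines, signo=11, year=2002):
    """Return the timestamps of all 'exited on signal <signo>' entries."""
    times = []
    for line in lines:
        m = SIGNAL_RE.match(line)
        if m and int(m.group("sig")) == signo:
            # The year is an assumption; the log itself does not record it.
            stamp = f"{year} {m.group('stamp')}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Seconds between each consecutive pair of timestamps."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Feeding it the lines of the messages file and printing the result of
gaps_in_seconds(signal_times(lines)) would show whether a long first gap
is followed by short ones.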

Well, I used Ctrl-Z to suspend the process as soon as the 30 min average
counter got near 0.90.

Doing this has allowed me to complete a buildworld successfully.
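
The manual Ctrl-Z routine could in principle be automated. The Python
below is only a sketch under several assumptions: it stops and resumes a
whole process group (so that the compilers started by make are paused
too), it reads the 15-minute figure from os.getloadavg(), which reports
1-, 5- and 15-minute averages, and the resume threshold of 0.50 is an
arbitrary illustrative value. next_action and throttle are hypothetical
names, not part of any existing tool.

```python
import os
import signal
import time


def next_action(load, paused, high=0.90, low=0.50):
    """Decide what to do for the current load: 'stop', 'cont' or None."""
    if not paused and load >= high:
        return "stop"
    if paused and load <= low:
        return "cont"
    return None


def throttle(pgid, interval=10.0, high=0.90, low=0.50):
    """Pause process group pgid under high load, resume when it drops."""
    paused = False
    while True:
        load = os.getloadavg()[2]  # 15-minute load average (assumption)
        action = next_action(load, paused, high, low)
        try:
            if action == "stop":
                os.killpg(pgid, signal.SIGSTOP)
                paused = True
            elif action == "cont":
                os.killpg(pgid, signal.SIGCONT)
                paused = False
            else:
                os.killpg(pgid, 0)  # signal 0 only checks the group exists
        except ProcessLookupError:
            return  # the build has finished or died
        time.sleep(interval)
```

It would be started as root from a second shell with the process group
ID of the running make, for example throttle(os.getpgid(make_pid)),
where make_pid is whatever pid the build happens to have. Using two
thresholds rather than one keeps it from flapping between stop and
continue when the load hovers around 0.90.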

Now, that brings me to your other theories. Something that didn't even
occur to me was overheating. My host is going over to the NOC to check
this out for me. Do you know of any heat monitoring tools for FreeBSD?
Maybe I can do some graphs or something?

I am quite convinced that it is down to heat, as this is an AMD CPU we're
talking about after all [XP2000], and it would coincide with the high
load averages.


Anyway, thanks all very much for your help. I'll keep you posted on my
problem here. It's been going on for months now, but I think I'm getting
closer to the problem ;)


