Date:      Wed, 08 Oct 2008 08:30:12 +0200
From:      Mister Olli <mister.olli@googlemail.com>
To:        Jerry McAllister <jerrymc@msu.edu>
Cc:        Jeremy Chadwick <koitsu@freebsd.org>, freebsd-questions@freebsd.org
Subject:   Re: analyzing freebsd core dumps
Message-ID:  <1223447412.5896.9.camel@phoenix.blechhirn.net>
In-Reply-To: <20081006174502.GB71024@gizmo.acns.msu.edu>
References:  <1223273047.23248.25.camel@phoenix.blechhirn.net> <20081006171809.GA26368@icarus.home.lan> <20081006174502.GB71024@gizmo.acns.msu.edu>

Hi,

thanks for the feedback on this topic.
The first step, cleaning the machine and checking all connectors, was
done yesterday. I hope this will fix the problem and that it's not
some kind of hardware failure.

Running tests with memtest is quite a problem, since the machine has high
availability requirements. Taking it offline for nearly an hour for
cleaning and checking during our company's working day was already a pain;
6 hours or more of RAM tests is not possible.

Is there some other, less time-consuming tool or process for detecting
hardware failure?

greetz
olli

On Monday, 06.10.2008, at 13:45 -0400, Jerry McAllister wrote:
> On Mon, Oct 06, 2008 at 10:18:09AM -0700, Jeremy Chadwick wrote:
> 
> > On Mon, Oct 06, 2008 at 08:04:07AM +0200, Mister Olli wrote:
> > > hi list...
> > > 
> > > I have a FreeBSD machine that has been running for more than 6 months
> > > without any problems.
> > > The machine's only service is to be an OpenVPN gateway for a handful of
> > > users.
> > > 
> > > 2 weeks ago the first problems started. OpenVPN exited with signal
> > > 11 (SIGSEGV) and 4 (SIGILL), and core dumps were written.
> > > 
> > > The same happened yesterday with the postfix/cleanup process, and then
> > > suddenly the machine rebooted without any further log messages.
> > > 
> > > what is the best way to troubleshoot the cause of this problem?
> > 
> > Signal 11 happening "out of nowhere" on machines which have been
> > running fine is, most of the time, a sign of hardware failure (usually
> > RAM, but sometimes motherboard or PSU).  The fact that you got a reboot
> > is further evidence of this.
> > 
> > http://www.freebsd.org/doc/en/books/faq/troubleshoot.html#SIGNAL11
> > 
> > I would recommend taking the machine offline and running something like
> > memtest86+ on it for 6-7 hours.  Any errors seen are a pretty good sign
> > that you should replace the memory or the motherboard.  You can
> > download an ISO or floppy disk images here:
> > 
> > http://www.memtest.org/
> > 
> > Bottom line is that this is probably a hardware issue.
> 
> It could also be the contacts, if it is not the actual memory or board.
> A marginal contact where something is plugged in can, over time,
> build up deposits that make it fail.  Of course, this is still
> a hardware problem, but it can often be cured by reseating everything.
> If it is bad enough, it could also be exacerbated by reseating
> everything.
> 
> ////jerry
> 
> > 
> > -- 
> > | Jeremy Chadwick                                jdc at parodius.com |
> > | Parodius Networking                       http://www.parodius.com/ |
> > | UNIX Systems Administrator                  Mountain View, CA, USA |
> > | Making life hard for others since 1977.              PGP: 4BD6C0CB |
> > 
> > _______________________________________________
> > freebsd-questions@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> > To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"
