Date: Wed, 08 Oct 2008 08:30:12 +0200
From: Mister Olli <mister.olli@googlemail.com>
To: Jerry McAllister <jerrymc@msu.edu>
Cc: Jeremy Chadwick <koitsu@freebsd.org>, freebsd-questions@freebsd.org
Subject: Re: analyzing freebsd core dumps
Message-ID: <1223447412.5896.9.camel@phoenix.blechhirn.net>
In-Reply-To: <20081006174502.GB71024@gizmo.acns.msu.edu>
References: <1223273047.23248.25.camel@phoenix.blechhirn.net> <20081006171809.GA26368@icarus.home.lan> <20081006174502.GB71024@gizmo.acns.msu.edu>

hi...

thanks for the feedback on this topic.

the first step, cleaning the machine and checking all connectors, was
done yesterday. I hope that this will fix the problem, and that it's not
some kind of hardware failure.

running memtest is quite a problem, since the machine has high
availability requirements. taking it offline for nearly one hour for
cleaning and checking during our company's working day was already a
pain. 6 hours or more of RAM tests is not possible.

is there some other way to detect hardware failure with a less
time-consuming tool or process?

greetz
olli

On Monday, 06.10.2008, at 13:45 -0400, Jerry McAllister wrote:
> On Mon, Oct 06, 2008 at 10:18:09AM -0700, Jeremy Chadwick wrote:
>
> > On Mon, Oct 06, 2008 at 08:04:07AM +0200, Mister Olli wrote:
> > > hi list...
> > >
> > > I have a freebsd machine running for more than 6 months without any
> > > problems.
> > > the machine's only service is to be an openvpn gateway for a handful
> > > of users.
> > >
> > > 2 weeks ago the first problems started. the openvpn exited with signal
> > > 11 and 4 and core dumps were written.
> > >
> > > the same happened yesterday with the postfix/cleanup process, and then
> > > suddenly the machine rebooted without any further log messages.
> > >
> > > what is the best way to troubleshoot the cause of this problem?
> >
> > Signal 11 happening "out of nowhere" on machines which have been
> > running fine, most of the time, is a sign of hardware failure (usually
> > RAM, but sometimes motherboard or PSU). The fact you got a reboot is
> > also further evidence of this.
> >
> > http://www.freebsd.org/doc/en/books/faq/troubleshoot.html#SIGNAL11
> >
> > I would recommend taking the machine offline and running something like
> > memtest86+ on it for 6-7 hours. Any errors seen are a pretty good sign
> > that you should replace the memory or the motherboard. You can
> > download an ISO or floppy disk images here:
> >
> > http://www.memtest.org/
> >
> > Bottom line is that this is probably a hardware issue.
>
> Could also be a contact problem if it is not the actual memory or board.
> A marginal contact where something is plugged in can over time
> build up deposits that make it fail. Of course, this is still
> a hardware problem, but can often be cured by reseating everything.
> If it is bad enough, it could also be exacerbated by reseating
> everything.
>
> ////jerry
>
> >
> > --
> > | Jeremy Chadwick                                jdc at parodius.com |
> > | Parodius Networking                       http://www.parodius.com/ |
> > | UNIX Systems Administrator                  Mountain View, CA, USA |
> > | Making life hard for others since 1977.              PGP: 4BD6C0CB |
> >
> > _______________________________________________
> > freebsd-questions@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> > To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"