Date:      Wed, 19 Apr 2017 13:26:51 +0100
From:      Dave B <g8kbvdave@googlemail.com>
To:        freebsd-questions@freebsd.org
Subject:   Re: 10.3-stable random server reboots - finding the reason
Message-ID:  <e01a013c-7421-6bd4-54ba-84e621a45810@googlemail.com>
In-Reply-To: <mailman.35288.1492540017.4387.freebsd-questions@freebsd.org>
References:  <mailman.35288.1492540017.4387.freebsd-questions@freebsd.org>

On 18/04/17 19:26, freebsd-questions-request@freebsd.org wrote:
> On 04/18/17 12:59, tech-lists wrote:
>> I have an up-to-date 10.3 server that is randomly rebooting, after being
>> up for days. Previously it had been up for many months. The problem is,
>> nothing seems to be left in the logs to indicate why it's doing this. I
>> have all.log and console.log enabled.
>>
>> So, what I'm asking is, how can I capture its last gasp?
>>

Start by overhauling it: clean any resident colony of dust bunnies out
of the case and out of all the heat sinks/coolers and, as others have
said, check and clean the fan(s).
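
If you suspect heat, one way to catch the "last gasp" is to log the CPU
temperature to disk every so often, so the final entry before a reboot
shows how hot it was running. A rough Python sketch, assuming the
coretemp(4) or amdtemp(4) module is loaded so that the
dev.cpu.0.temperature sysctl exists; the log path and the function names
are made up for illustration:

```python
import os
import re
import subprocess
import time


def parse_temperature(text):
    """Turn a sysctl reading such as '45.0C' into degrees Celsius."""
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)C\s*", text)
    if match is None:
        raise ValueError("unexpected temperature reading: %r" % text)
    return float(match.group(1))


def read_cpu_temperature(oid="dev.cpu.0.temperature"):
    """Read one temperature via sysctl (assumes coretemp/amdtemp is loaded)."""
    result = subprocess.run(
        ["sysctl", "-n", oid], capture_output=True, text=True, check=True
    )
    return parse_temperature(result.stdout)


def log_temperature(path="/var/log/cputemp.log"):  # hypothetical log path
    """Append a timestamped reading and force it to disk."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    line = "%s %.1f\n" % (stamp, read_cpu_temperature())
    with open(path, "a") as log:
        log.write(line)
        log.flush()
        os.fsync(log.fileno())  # so the entry survives a sudden reset
```

Calling log_temperature() once a minute (from cron, say) should be
plenty. The fsync is there because a machine that resets without
warning will otherwise lose whatever was still sitting in the buffer
cache.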

Check that all the power and other connections are firm and solid.
(Come to that, are there any anecdotal reports of electrical
disturbances at about the same time as the server dies?  It may be
falling rather than jumping off the edge.)
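
If you do collect such reports, lining them up against the reboot times
is easy to do by hand, but a few lines of Python will do it too. This is
only a sketch: the function name and the 5 minute window are my own
assumptions, and you would have to fill in the two lists yourself (for
example from the output of last(1) and from whatever people tell you).

```python
from datetime import datetime, timedelta


def reboots_near_events(reboots, events, window=timedelta(minutes=5)):
    """Return (reboot, event) pairs that lie within `window` of each other."""
    matches = []
    for reboot in reboots:
        for event in events:
            if abs(reboot - event) <= window:
                matches.append((reboot, event))
    return matches
```

Pass it two lists of datetime objects. Any pair it returns is a reboot
that happened close to a reported disturbance, which would point at the
mains supply or UPS rather than the server itself.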

If it's powered by a dedicated UPS, are its batteries OK?  (After 2
years they should be treated as suspect at best!  Easy to change though
on most UPSs.)

Some cheap (copper over aluminium) SATA cables can die over time too,
causing all sorts of weird hard disk related mayhem.

Examine the mobo's electrolytic caps (round cans standing vertically).
If any are showing bulging ends(!), have actually split, or are showing
a brown mess around their base, you'll need to replace them (not trivial
to do!) or replace the mobo.
(Still a surprisingly common failure, even after the rogue producer of
such items was "sorted out".)
Such failures often manifest themselves as bad RAM!

If you have to start swapping assemblies, start with the power supply,
but *ONLY change one thing at a time* between tests, to be sure you
identify the cause.   And, if you think you've found it, swap the part
you last removed back in, to see if it fails again.

If this is in a "production" environment (other people use it), make a
clone and swap the entire machine out, so you can run diagnostics on the
suspect machine at your leisure without causing any grief to your users.

Always good to have a clone as a backup for anything serious.

Best Regards.

Dave B.



