Date:      Mon, 4 Oct 2004 13:49:36 -0400
From:      Jim Durham <durham@jcdurham.com>
To:        freebsd-hackers@freebsd.org
Subject:   Re: Sudden Reboots
Message-ID:  <200410041349.36314.durham@jcdurham.com>
In-Reply-To: <u3bul054t5qhk962gv11299flubci6hkvf@4ax.com>
References:  <200409301003.00492.durham@jcdurham.com> <D018B8F5-141D-11D9-B008-0030657EDEB2@attglobal.net> <u3bul054t5qhk962gv11299flubci6hkvf@4ax.com>

On Saturday 02 October 2004 06:42 pm, Mike Tancsa wrote:
> On Fri, 1 Oct 2004 21:50:26 -0500, in sentex.lists.freebsd.hackers you wrote:
> >On Oct 1, 2004, at 7:23 PM, Jim Durham wrote:
> >> These are very rare.... except they seem to happen about once a day
> >> for a
> >> while and then stop... very strange..
> >>
> >>> and usually caused by hardware problems (e.g. faulty power supply,
> >>> overheating CPU, bad RAM).
> >>
> >> Possible, but if so, the hardware fixed itself on the first two boxes I
> >> mentioned.
> >
> >All of this can be bad, or not quite bad -- just not healthy --
> >hardware.  Say a power supply that can't supply reliable +5, when the
> >line voltage drops a tad while all the disks are being hammered.  It
> >can be a nightmare to figure out.  Set up crash dumps, but also make
> >sure that the UPS the box is attached to isn't having problems.  If
> >it's not on conditioned power, fix that.
>
> Also, a lot of older UPSes do not have any AVR (automatic voltage
> regulation).  This in conjunction with a marginal power supply can
> cause problems like you describe.  One of our POPs is in an area that
> has seen tremendous residential and industrial growth putting a strain
> on the local power. Prior to some major upgrades from the local
> utility company, we would see street power dropping below 100V during
> peak usage coming from the street and our APCs that have "smart boost"
> would all kick in to compensate.  Also, the UPS can just be "bad" over
> time.
>
> As others have said, it's pretty rare that reboots do not leave a crash
> dump behind when it's a software issue. At the very least, enable crash
> dumps on your machines in question. See the man page for dumpon. At
> least this way you can narrow down the odds as to whether it's
> pointing to a hardware or software issue.
>
>  ---Mike

I will do that.  However, after watching this for a few days now, there is
something really weird about it that I'd like to describe.

The reboots started out happening at 5:15 pm or so. I had them unplug the
server completely from AC and restart it, and now it's happening within a few
minutes of 12:40 pm every day.

The 'last' command output is the only thing showing anything log-wise. Look at 
this:


reboot           ~                         Mon Oct  4 12:33
reboot           ~                         Sun Oct  3 12:37
reboot           ~                         Sat Oct  2 12:42
reboot           ~                         Fri Oct  1 12:45


Looks like it's creeping 3 to 5 minutes earlier every day. Of course, the fsck
time is included in these timestamps, but that is probably about the same
every time.
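
To put a number on the creep, here is a minimal Python sketch that computes how far each reboot lands from exactly 24 hours after the previous one. The timestamps are copied from the 'last' output above; the year 2004 is assumed from the date of this message, and the helper names parse and daily_drift_minutes are made up for illustration.

```python
from datetime import datetime

# Reboot times copied from the `last` output, oldest first.
# The year is not in that output; 2004 is an assumption.
BOOTS = [
    "Oct  1 12:45",
    "Oct  2 12:42",
    "Oct  3 12:37",
    "Oct  4 12:33",
]


def parse(stamp, year=2004):
    """Turn a 'Mon DD HH:MM' fragment into a datetime (hypothetical helper)."""
    month, day, hhmm = stamp.split()
    return datetime.strptime(f"{year} {month} {day} {hhmm}", "%Y %b %d %H:%M")


def daily_drift_minutes(stamps):
    """Minutes by which each interval differs from 24 hours (negative = earlier)."""
    times = [parse(s) for s in stamps]
    drifts = []
    for prev, cur in zip(times, times[1:]):
        interval_min = int((cur - prev).total_seconds() // 60)
        drifts.append(interval_min - 24 * 60)
    return drifts


drifts = daily_drift_minutes(BOOTS)
print(drifts)                     # [-3, -5, -4]
print(sum(drifts) / len(drifts))  # -4.0
```

So the interval between reboots is a little under 24 hours, about 23 hours 56 minutes on average over these four samples. Four data points are not much to go on, and the recorded times include however long the boot and fsck took.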

I don't have the documentation any more, but on the one server where I
remember noting the time when this happened before, it rebooted at 5:15 or so
every morning.

This sure doesn't sound like hardware to me unless it's something to do with 
the motherboard clock. I can't think of anything in hardware that would cycle 
like this. 

I remember having an AM radio transmitter back in my youth that would blow HV
rectifiers every day at the same time, and we traced it to an industrial plant
pulling a breaker on the same line as us. But this server is on a UPS, and the
time keeps creeping earlier by a few minutes every day.  Really strange.

I will try enabling crash dumps.

-Jim


