Date: Mon, 4 Oct 2004 13:49:36 -0400
From: Jim Durham <durham@jcdurham.com>
To: freebsd-hackers@freebsd.org
Subject: Re: Sudden Reboots
Message-ID: <200410041349.36314.durham@jcdurham.com>
In-Reply-To: <u3bul054t5qhk962gv11299flubci6hkvf@4ax.com>
References: <200409301003.00492.durham@jcdurham.com> <D018B8F5-141D-11D9-B008-0030657EDEB2@attglobal.net> <u3bul054t5qhk962gv11299flubci6hkvf@4ax.com>
On Saturday 02 October 2004 06:42 pm, Mike Tancsa wrote:
> On Fri, 1 Oct 2004 21:50:26 -0500, in sentex.lists.freebsd.hackers
> you wrote:
> >On Oct 1, 2004, at 7:23 PM, Jim Durham wrote:
> >> These are very rare.... except they seem to happen about once a day
> >> for a while and then stop... very strange..
> >>
> >>> and usually caused by hardware problems (e.g. faulty power supply,
> >>> overheating CPU, bad RAM).
> >>
> >> Possible, but if so, the hardware fixed itself on the first two
> >> boxes I mentioned.
> >
> >All of this can be bad, or not quite bad -- just not healthy --
> >hardware. Say a power supply that can't supply reliable +5, when the
> >line voltage drops a tad while all the disks are being hammered. It
> >can be a nightmare to figure out. Setup crash dumps, but also make
> >sure that the UPS the box is attached to isn't having problems. If
> >it's not on conditioned power, fix that.
>
> Also, a lot of older UPSes do not have any AVR (automatic voltage
> regulation). This in conjunction with a marginal power supply can
> cause problems like you describe. One of our POPs are in an area that
> has seen tremendous residential and industrial growth putting a strain
> on the local power. Prior to some major upgrades from the local
> utility company, we would see street power dropping below 100V during
> peak usage coming from the street and our APCs that have "smart boost"
> would all kick in to compensate. Also, the UPS can just be "bad" over
> time.
>
> As others have said, its pretty rare that reboots do not leave a crash
> dump behind when its a software issue. At the very least, enable crash
> dumps on your machines in question. See the man page for dumpon. At
> least this way you can narrow down the odds as to whether or not its
> pointing to a hardware or software issue.
>
> ---Mike

I will do that. However, there is something really weird about this
after watching it for a few days now that I'd like to tell about..
The reboots started out happening at 5:15 pm or so. I had them unplug
the server completely from AC and restart it, and now it's happening
within a few minutes of 12:40 pm every day. The 'last' command output
is the only thing showing anything log-wise. Look at this:

reboot   ~   Mon Oct  4 12:33
reboot   ~   Sun Oct  3 12:37
reboot   ~   Sat Oct  2 12:42
reboot   ~   Fri Oct  1 12:45

Looks like it's creeping 3 to 5 minutes earlier every day. Of course,
the fsck time is involved, but that is probably about the same every
time. I don't have documentation any more, but the one server I
remember noting the time on when it was doing this before did it at
5:15 or so every morning.

This sure doesn't sound like hardware to me, unless it's something to
do with the motherboard clock. I can't think of anything in hardware
that would cycle like this. I remember having an AM radio transmitter
back in my youth that would blow HV rectifiers every day at the same
time, and we traced it to an industrial plant pulling a breaker on the
same line as us. But this server is on a UPS, and the time keeps
creeping a few minutes earlier each day. Really strange.

I will try enabling crash dumps.

-Jim
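The day-to-day creep in the 'last' output quoted in this message can be
checked with a little arithmetic. The following is only a rough sketch:
the year 2004 is assumed from the message date, the helper names are
made up for illustration, and the times are boot times as recorded by
'last', so they include whatever the fsck added.

```python
from datetime import datetime

# Reboot times copied from the 'last' output in the message, oldest first.
REBOOTS = [
    "Fri Oct 1 12:45",
    "Sat Oct 2 12:42",
    "Sun Oct 3 12:37",
    "Mon Oct 4 12:33",
]


def parse_stamp(stamp, year=2004):
    """Parse a 'last'-style timestamp; the year is an assumption,
    since 'last' does not print one."""
    return datetime.strptime(f"{year} {stamp}", "%Y %a %b %d %H:%M")


def daily_drift_minutes(stamps):
    """Return, for each consecutive pair of reboots, how many minutes
    short of a full 24 hours the interval between them was."""
    times = [parse_stamp(s) for s in stamps]
    drifts = []
    for earlier, later in zip(times, times[1:]):
        interval_minutes = (later - earlier).total_seconds() / 60
        drifts.append(24 * 60 - interval_minutes)
    return drifts


if __name__ == "__main__":
    drifts = daily_drift_minutes(REBOOTS)
    print("drift per day (minutes):", drifts)
    print("average drift (minutes):", sum(drifts) / len(drifts))
```

With the four timestamps above, the intervals come out 3, 5 and 4
minutes short of 24 hours, so the period between reboots looks closer
to 23 hours 56 minutes than to an exact day.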