From: Jim Durham
To: freebsd-hackers@freebsd.org
Date: Mon, 4 Oct 2004 13:49:36 -0400
Subject: Re: Sudden Reboots
Message-Id: <200410041349.36314.durham@jcdurham.com>
References: <200409301003.00492.durham@jcdurham.com>

On Saturday 02 October 2004 06:42 pm, Mike Tancsa wrote:
> On Fri, 1 Oct 2004 21:50:26 -0500, in sentex.lists.freebsd.hackers you
> wrote:
> >On Oct 1, 2004, at 7:23 PM, Jim Durham wrote:
> >> These are very rare.... except they seem to happen about once a day
> >> for a while and then stop... very strange..
> >>
> >>> and usually caused by hardware problems (e.g. faulty power supply,
> >>> overheating CPU, bad RAM).
> >>
> >> Possible, but if so, the hardware fixed itself on the first two boxes I
> >> mentioned.
> >
> >All of this can be bad, or not quite bad -- just not healthy --
> >hardware.
> >Say a power supply that can't supply reliable +5, when the
> >line voltage drops a tad while all the disks are being hammered. It
> >can be a nightmare to figure out. Set up crash dumps, but also make
> >sure that the UPS the box is attached to isn't having problems. If
> >it's not on conditioned power, fix that.
>
> Also, a lot of older UPSes do not have any AVR (automatic voltage
> regulation). This in conjunction with a marginal power supply can
> cause problems like you describe. One of our POPs is in an area that
> has seen tremendous residential and industrial growth putting a strain
> on the local power. Prior to some major upgrades from the local
> utility company, we would see street power dropping below 100V during
> peak usage coming from the street and our APCs that have "smart boost"
> would all kick in to compensate. Also, the UPS can just be "bad" over
> time.
>
> As others have said, it's pretty rare that reboots do not leave a crash
> dump behind when it's a software issue. At the very least, enable crash
> dumps on your machines in question. See the man page for dumpon. At
> least this way you can narrow down the odds as to whether or not it's
> pointing to a hardware or software issue.
>
> ---Mike

I will do that. However, after watching this for a few days now, there
is something really weird about it that I'd like to tell about.

The reboots started out happening at 5:15 pm or so. I had them unplug
the server completely from AC and restart it, and now it's happening
within a few minutes of 12:40 pm every day. The 'last' command output is
the only thing showing anything log-wise. Look at this:

reboot    ~    Mon Oct  4 12:33
reboot    ~    Sun Oct  3 12:37
reboot    ~    Sat Oct  2 12:42
reboot    ~    Fri Oct  1 12:45

Looks like it's creeping 3 minutes earlier every day. Of course, the
fsck time is involved, but probably that is about the same every time.
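As a rough check on that drift, the four timestamps from 'last' can be compared directly. What follows is only a minimal Python sketch, assuming the year is 2004 (taken from the message date) and that the logged times are accurate to the minute. The names `reboots`, `parse`, and `drift_minutes` are made up for illustration.

```python
from datetime import datetime, timedelta

# Reboot times copied from the 'last' output, oldest first.
# The year is an assumption (2004, from the message date).
reboots = [
    "Oct 1 12:45",
    "Oct 2 12:42",
    "Oct 3 12:37",
    "Oct 4 12:33",
]


def parse(stamp, year=2004):
    """Turn a 'Mon DD HH:MM' string into a datetime in the given year."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M")


times = [parse(s) for s in reboots]

# Time elapsed between each reboot and the next one.
intervals = [later - earlier for earlier, later in zip(times, times[1:])]

# How many whole minutes each interval falls short of a full 24 hours.
drift_minutes = [
    (timedelta(days=1) - interval) // timedelta(minutes=1)
    for interval in intervals
]

for interval, drift in zip(intervals, drift_minutes):
    print(f"interval {interval}, {drift} min short of 24h")

print("mean drift:", sum(drift_minutes) / len(drift_minutes), "min/day")
```

For the four entries listed, the intervals come out 3, 5, and 4 minutes short of 24 hours, so the reboots are spaced a few minutes under a day apart rather than falling at a fixed clock time.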
I don't have documentation any more, but the one server where I remember
noting the time when it was doing this before did it at 5:15 or so
every morning.

This sure doesn't sound like hardware to me, unless it's something to do
with the motherboard clock. I can't think of anything in hardware that
would cycle like this. I remember having an AM radio transmitter back in
my youth that would blow HV rectifiers every day at the same time, and we
traced it to an industrial plant pulling a breaker on the same line as
us. But this server is on a UPS, and the time keeps creeping by 3
minutes.

Really strange. I will try crash dumps.

-Jim