From: Mike Tancsa
To: David Scheidt
cc: freebsd-hackers@freebsd.org
Date: Sat, 02 Oct 2004 18:42:02 -0400
Subject: Re: Sudden Reboots

On Fri, 1 Oct 2004 21:50:26 -0500, in sentex.lists.freebsd.hackers you wrote:

>
>On Oct 1, 2004, at 7:23 PM, Jim Durham wrote:
>> These are very rare.... except they seem to happen about once a day for a
>> while and then stop... very strange..
>>
>>> and usually caused by hardware problems (e.g. faulty power supply,
>>> overheating CPU, bad RAM).
>>
>> Possible, but if so, the hardware fixed itself on the first two boxes I
>> mentioned.
>
>All of this can be bad, or not quite bad -- just not healthy --
>hardware.
>Say a power supply that can't supply reliable +5, when the
>line voltage drops a tad while all the disks are being hammered. It
>can be a nightmare to figure out. Setup crash dumps, but also make
>sure that the UPS the box is attached to isn't having problems. If
>it's not on conditioned power, fix that.

Also, a lot of older UPSes do not have any AVR (automatic voltage
regulation). This, in conjunction with a marginal power supply, can
cause problems like the ones you describe. One of our POPs is in an
area that has seen tremendous residential and industrial growth,
putting a strain on the local power grid. Prior to some major upgrades
by the local utility company, we would see street power drop below
100V during peak usage, and our APCs that have "smart boost" would all
kick in to compensate. The UPS itself can also simply go "bad" over
time.

As others have said, it's pretty rare for a reboot caused by a
software issue not to leave a crash dump behind. At the very least,
enable crash dumps on the machines in question; see the dumpon(8) man
page. That way you can at least narrow down whether the problem points
to hardware or software.

---Mike
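Once dumps are enabled (for example by setting dumpdev in /etc/rc.conf to a swap partition, as described in dumpon(8) and rc.conf(5)), savecore(8) copies a panic dump into the dump directory at the next boot. The Python sketch below is a hypothetical helper, not part of FreeBSD, that applies the rule of thumb above: a vmcore left behind suggests a software panic, while no dump suggests hardware or power. It assumes the dump directory is /var/crash and that dumps are named vmcore.N; the function names are illustrative only, and the heuristic is only meaningful if dumps were already enabled before the reboot.

```python
import os
import re
from typing import List

# Assumption: savecore(8) writes dumps as vmcore.N (alongside info.N)
# into the dump directory; /var/crash is assumed as the default here.
VMCORE_PATTERN = re.compile(r"^vmcore\.(\d+)$")


def find_crash_dumps(dump_dir: str = "/var/crash") -> List[str]:
    """Return paths of vmcore.N files in dump_dir, ordered by N."""
    try:
        names = os.listdir(dump_dir)
    except FileNotFoundError:
        return []
    dumps = []
    for name in names:
        match = VMCORE_PATTERN.match(name)
        if match:
            dumps.append((int(match.group(1)), os.path.join(dump_dir, name)))
    # Sort numerically so vmcore.10 comes after vmcore.2.
    return [path for _, path in sorted(dumps)]


def triage_reboot(dump_dir: str = "/var/crash") -> str:
    """Rough first guess at the cause of an unexpected reboot.

    Hypothetical heuristic: assumes crash dumps were enabled before the
    reboot. A panic normally leaves a vmcore behind; no dump points more
    toward hardware (power supply, UPS, heat, RAM). Not a diagnosis.
    """
    dumps = find_crash_dumps(dump_dir)
    if dumps:
        return "software (panic): inspect " + dumps[-1]
    return "no dump found: suspect hardware or power"


if __name__ == "__main__":
    print(triage_reboot())
```

Run after an unexplained reboot; the newest vmcore, if any, is the one to open with the kernel debugger.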