Date: Thu, 12 Nov 2009 04:59:03 -0800
From: David Wolfskill <david@catwhisker.org>
To: Peter Jeremy <peter@vk2pj.dyndns.org>
Cc: hardware@freebsd.org
Subject: Re: 7.2-STABLE i386 box crashing -- clues?
Message-ID: <20091112125903.GA1631@albert.catwhisker.org>
In-Reply-To: <20091112062708.GA16648@server.vk2pj.dyndns.org>
References: <20091111173747.GA1150@albert.catwhisker.org> <20091112062708.GA16648@server.vk2pj.dyndns.org>

On Thu, Nov 12, 2009 at 05:27:09PM +1100, Peter Jeremy wrote:
> I can't offer any solutions but I have some more questions...

I appreciate the help!

> ...
> >Every once in a while, it just crashes -- hard. It loses video output
> >at that point; Ctl+Alt+Esc doesn't appear to change anything; entering
> >(say) "reset" blindly at that point has no apparent effect.
>
> Roughly how often?

For the current month:

albert(7.2-S)[8] last reboot shutdown
reboot    ~    Thu Nov 12 03:04
reboot    ~    Wed Nov 11 20:06
reboot    ~    Wed Nov 11 14:42
shutdown  ~    Wed Nov 11 14:40
reboot    ~    Wed Nov 11 14:35
reboot    ~    Wed Nov 11 10:05
reboot    ~    Wed Nov 11 09:09
reboot    ~    Wed Nov 11 04:25
reboot    ~    Tue Nov 10 12:49
reboot    ~    Mon Nov  9 14:52
reboot    ~    Sun Nov  8 17:42
reboot    ~    Sat Nov  7 04:22
reboot    ~    Fri Nov  6 21:43
reboot    ~    Fri Nov  6 19:00
reboot    ~    Fri Nov  6 16:20
shutdown  ~    Fri Nov  6 16:17
reboot    ~    Fri Nov  6 16:03
reboot    ~    Fri Nov  6 13:07
reboot    ~    Fri Nov  6 09:46
reboot    ~    Thu Nov  5 16:41
reboot    ~    Thu Nov  5 13:32
reboot    ~    Thu Nov  5 12:59
reboot    ~    Thu Nov  5 10:17
reboot    ~    Thu Nov  5 04:26
reboot    ~    Wed Nov  4 20:32
reboot    ~    Wed Nov  4 15:48
reboot    ~    Wed Nov  4 10:37
reboot    ~    Tue Nov  3 13:15
reboot    ~    Tue Nov  3 10:55
reboot    ~    Tue Nov  3 04:16
reboot    ~    Mon Nov  2 18:13
reboot    ~    Sun Nov  1 20:03
shutdown  ~    Sun Nov  1 20:01
reboot    ~    Sun Nov  1 17:10
reboot    ~    Sun Nov  1 13:51
shutdown  ~    Sun Nov  1 13:48

wtmp begins Sun Nov  1 05:08:18 PST 2009
albert(7.2-S)[9]

The "solo reboots" are crashes; those paired with "shutdown" entries
are controlled.

> Has anything unusual happened lately? Brownout, blackout, power surge,
> lightning, heatwave, ...

Nothing linked to the crashes. I pulled the UPS out of service some
weeks ago because it needs new batteries; I need to get those ordered.
But the crashes were happening before that, in any case.
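
For anyone who wants to tally that listing mechanically, the reading rule above (a "reboot" with no matching "shutdown" is a crash; a "reboot" paired with a "shutdown" is controlled) can be sketched in Python. This is only a minimal sketch: the function name classify_reboots is made up for illustration, and it assumes the listing is newest-first with "reboot"/"shutdown" in the first field and "~" in the second, as in the output quoted in this message.

```python
def classify_reboots(lines):
    """Label each reboot in `last reboot shutdown` output as crash or controlled.

    Assumes newest-first order, so the entry listed right after a reboot
    is the event that happened just before it in time.
    """
    entries = []
    for line in lines:
        fields = line.split()
        # Keep only "reboot ~ ..." and "shutdown ~ ..." records; this skips
        # the shell prompt line and the "wtmp begins" trailer.
        if len(fields) >= 2 and fields[0] in ("reboot", "shutdown") and fields[1] == "~":
            entries.append((fields[0], " ".join(fields[2:])))

    results = []
    for i, (kind, when) in enumerate(entries):
        if kind != "reboot":
            continue
        # A reboot immediately preceded (in time) by a shutdown is controlled.
        preceded_by_shutdown = i + 1 < len(entries) and entries[i + 1][0] == "shutdown"
        results.append((when, "controlled" if preceded_by_shutdown else "crash"))
    return results


if __name__ == "__main__":
    sample = [
        "reboot ~ Wed Nov 11 20:06",
        "reboot ~ Wed Nov 11 14:42",
        "shutdown ~ Wed Nov 11 14:40",
        "reboot ~ Wed Nov 11 14:35",
    ]
    for when, label in classify_reboots(sample):
        print(when, label)
```

Feeding it the captured output (for example, a saved copy of the listing read line by line) gives one label per reboot, which makes it easy to count crashes per day.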

> >accordingly, had attached a SCSI host adaptor via PCI riser card. Since
> >I had nothing actually connected to the card, I pulled it out of the
> >machine before bringing it back up.
>
> Did you also pull the riser card? Riser cards don't have a spectacularly
> high reputation.

That's actually what I pulled. The SCSI card itself is still physically
in the chassis, merely with an air gap between itself and the system
board (because the riser card is now in a closet).

> > (I also felt around for
> >excessively warm spots; nothing. All fans spin up, as well.)
>
> I don't suppose you also studied the capacitors on the motherboard.
> Are any showing any signs of bulges?

I'll take another look for those; I recall that electrolytics exhibit
that as a sign of failure -- thanks for the reminder.

> Have you tried reseating everything?

The memory, yeah (even before replacing it); also swapped the DIMMs.
The only other thing that can be re-seated (desktop system board, so
most everything is built-in) would be the CPU, and I'm not quite sure
how that heat sink works. I did re-seat some power connectors.

> >Flaky CPU? Flaky power supply? How might I tell?
>
> CPU shouldn't go flaky unless it's been overheated. In my experience,
> PSUs are the least reliable part of consumer-grade hardware but about
> the only way to check is to swap it.

:-}

> If you've got a DMM, you could check all the rails but there are
> lots of failure modes that won't show up that way.

Yeah, I kinda figured that. I do have a DMM (used to have a VTVM), but
figured the meter wouldn't show transient dips or whatever too well.

> Have you checked the voltage/temperature screen in the BIOS? Does
> anything look abnormal?

Did a couple of reality checks in that way as detours during some of
the reboots. Nothing interesting there at all. (And I have seen a case
in the past -- though with a 1U box -- where that test definitely showed
something wrong: CPU temp climbing about 1C every 30 seconds, IIRC.)

> Are you using a PS/2 or USB keyboard?

PS/2 via KVM. I don't have any USB keyboards. :-}

> Are you running X?

Yes; the machine is configured to start xdm on transition to
multi-user, as my spouse used to use it as a desktop. (She's gone back
to using its predecessor, a 4.11-STABLE machine, in frustration.)

> At this stage, my suggestion would be to try swapping the PSU.

Thanks. I'll discuss it with the "family CFO."

Peace,
david
-- 
David H. Wolfskill				david@catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.