Date: Sat, 2 Dec 95 11:59 WET
From: uhclem%nemesis@fw.ast.com (Frank Durda IV)
To: bugs@freebsd.org
Subject: Mission Impossible-style crashes on 2.1.0
Message-ID: <m0tLwDO-000C0aC@nemesis.lonestar.org>

I have several systems (6) that I recently upgraded to 2.1.0-RELEASE. These systems all ran 2.0.5 previously, and had no crashes in at least the previous 90 days (very nice).

However, in the seven days since upgrading to 2.1.0, the same hardware has experienced an average of 10 unexpected reboots, or just over one a day. The systems with more load (news vs no news) crash more often.

Usually these crashes happen when I am not around, so all I find is login: prompts and no clues to the cause. /var/log/messages simply shows that the system was running its normal stuff and then it was booting.

Finally it happened in front of me on one of the systems. The system was reasonably idle (tind was active and I was in vi writing and no X), when the console keyboard went dead. All the keyboard lights turned off (NUM LOCK is normally on) and disk activity stopped. I tried changing screens and hitting a few keys like NUM LOCK, but no action. I then unplugged and reconnected the keyboard. That didn't help either. Then around 15 seconds after the system went dead, the screen cleared and the system rebooted.

This has happened twice when I was at the console, and the above actions were taken during one of those two events. All residual clues suggest that this is the same failure that occurs when I am away. (All systems are protected by UPS power, so this isn't a power thing and they are in locations over a 30-mile area.)

Since there is no visible panic and nothing in the logs, I am looking for suggestions on how to investigate this problem. I don't think a triple-fault is occurring, because the system reset would occur instantly. The odd thing is the 15-second delay is consistent, like it is deliberate. (I wonder if a panic is occurring and not being displayed for some reason and the 15-second timer for press a key to avoid a reboot is running.)

(On one system the system froze as above, but stayed there until I reset it manually several hours later. Again, all keyboard lights were off and the logs had nothing useful. Since this only happened once on one machine, I'll treat this as a different issue for now.)

On two of the systems where I can't put up with this number of crashes, the 2.0.5 kernel is now being run. After two days, those systems no longer crash.

These systems are a mix of 486DX/SX-33/25/100 and Pentium 75/90, all with SCSI, some with IDE+SCSI, no-CD-ROMs at the moment, between 8 and 16Meg of RAM. All have WD/SMC Ether cards (usually 8013EW) and the drivers are active.

Anyway, if anyone has seen something like this, or has suggestions on how to get more useful information from the system when this happens, I would appreciate it. (I do have access to port 80 and port 300 debuggers if that will help.)

I really don't want to have to run the 2.0.5 kernel on 2.1.0 systems.

Thanks!

Frank Durda IV <uhclem@nemesis.lonestar.org>|"The Knights who say "LETNi"
or uhclem%nemesis@fw.ast.com (Fastest Route)| demand... A SEGMENT REGISTER!!!"
...letni!rwsys!nemesis!uhclem               |"A what?"
...decvax!fw.ast.com!nemesis!uhclem         |"LETNi! LETNi! LETNi!" - 1983