Date: Sat, 18 Mar 1995 23:31:35 -0700
From: nate@sneezy.sri.com (Nate Williams)
To: hackers@FreeBSD.org, core@FreeBSD.org
Subject: NMI Error success story
Message-ID: <199503190631.XAA00912@trout.sri.MT.net>

I wanted to make absolutely sure that I could say this w/out being wrong, but my box is now rock solid running FreeBSD 2.X and has had the snot kicked out of it for the last 3 weeks with non-stop make worlds (multiple make worlds in some cases).

First, some background. My box was the original 'FreeBSD' development box (back when it was called the interim release). However, under heavy disk loads the machine would get the occasional NMI, which would cause corruption of in-core binaries or, more often, disk corruption. Terry Lambert helped me diagnose and relieve some of the problems I was facing by turning the bus-on time of my Adaptec card way down, which killed performance but made for a more stable machine. However, after about 45 minutes of heavy disk load, or 2-3 minutes if I did a 'rm -rf' on a large tree, my box would get an NMI.

I finally got sick and tired of corrupted FS's one day when I needed the box to work heavily. I kept pushing it hard time and time again, and I lost about 10% of the files on my disk, which took about 3 weeks to recover from fully. Since I couldn't afford any new memory I changed the kernel to panic instead of reporting the NMI. This was much better, although I could predict when my computer would crash almost to the minute if I tried to do something disk intensive and the nightly cron job started up. I resolved to live with this until I could afford a new box.

However, while I was pricing out a new box I couldn't afford, Rod Grimes convinced me that the problem was motherboard related and sent me a new motherboard to try out. Suffice it to say that I somehow screwed up and I couldn't get the new motherboard to work with all of my old I/O cards. But I did notice that the static cache RAMs on the new motherboard were 15ns, and on my old (buggy) motherboard they were 25ns. Because I had nothing to lose and felt like I couldn't make the problem any worse, I swapped the cache chips on the motherboards.

Since I installed the new faster cache chips on my old motherboard, I have yet to see an NMI and I've been able to push the new motherboard in ways that would have made the other one cry. I was able to install 2.0R, which was completely out of the question on the old motherboard, and I've been running 2.X since. Kudos go to the kernel folks who have made it work, and work well, *given* sufficiently decent hardware.

The moral of the story is, if FreeBSD complains, don't be so quick to blame the software. Blaming the software is much easier than replacing broken hardware, but it won't necessarily fix the problem. And keep in mind, NMIs in your memory system aren't *necessarily* a problem with your SIMMs. It could be a problem with your motherboard and/or static cache sub-system. It certainly was with mine.

Nate