Date:      Sat, 18 Mar 1995 23:31:35 -0700
From:      nate@sneezy.sri.com (Nate Williams)
To:        hackers@FreeBSD.org, core@FreeBSD.org
Subject:   NMI Error success story
Message-ID:  <199503190631.XAA00912@trout.sri.MT.net>

I wanted to make absolutely sure that I could say this w/out being
wrong, but my box is now rock solid running FreeBSD 2.X and has had the
snot kicked out of it for the last 3 weeks with non-stop make worlds
(multiple make worlds in some cases).

First, some background.  My box was the original 'FreeBSD' development
box (back when it was called the interim release).  However, under heavy
disk loads the machine would get the occasional NMI, which would corrupt
in-core binaries or, more often, the disk.  Terry Lambert helped me
diagnose and relieve some of the problems I was facing by turning the
bus-on time of my Adaptec card way down, which killed performance but
made for a more stable machine.

However, after about 45 minutes of heavy disk load, or 2-3 minutes if I
did a 'rm -rf' on a large tree, my box would get an NMI.  I finally got
sick and tired of corrupted FS's one day when I needed the box to work
hard: I kept pushing it time and time again until I had lost about 10%
of the files on my disk, which took about 3 weeks to recover from
fully.

Since I couldn't afford any new memory, I changed the kernel to panic
instead of reporting the NMI.  This was much better, although I could
predict when my computer would crash almost to the minute if I tried to
do something disk intensive and the nightly cron job started up.

I resolved to live with this until I could afford a new box.  However,
while I was pricing out a new box I couldn't afford, Rod Grimes
convinced me that the problem was motherboard related and sent me a new
motherboard to try out.  Suffice it to say that I somehow screwed up and
I couldn't get the new motherboard to work with all of my old I/O cards.
But, I did notice that the static cache rams on the new motherboard were
15ns, and on my old (buggy) motherboard they were 25ns.  Because I had
nothing to lose and felt like I couldn't make the problem any worse, I
swapped the cache chips on the motherboards.

Since I installed the new faster cache chips on my old motherboard, I
have yet to see an NMI, and I've been able to push the machine in ways
that would have made it cry before.  I was able to install 2.0R, which
was completely out of the question with the old cache chips, and I've
been running 2.X since.

Kudos go to the kernel folks who have made it work, and work well,
*given* sufficiently decent hardware.

The moral of the story is, if FreeBSD complains, don't be so quick to
blame the software.  Blaming the software is much easier than replacing
broken hardware, but it won't necessarily fix the problem.

And keep in mind, NMIs in your memory system aren't *necessarily* a
problem with your SIMMs.  It could be a problem with your motherboard
and/or static cache sub-system.  It certainly was with mine.


Nate


