Date:      Tue, 10 Oct 2000 15:35:26 -0500 (CDT)
From:      Chris Dillon <cdillon@wolves.k12.mo.us>
To:        matthew zeier <matthew@intelenet.net>
Cc:        freebsd-questions@FreeBSD.ORG
Subject:   Re: panic: NMI indicates hardware failure
Message-ID:  <Pine.BSF.4.21.0010101519560.81765-100000@mail.wolves.k12.mo.us>
In-Reply-To: <20001010080027.A24721@intelenet.net>

On Tue, 10 Oct 2000, matthew zeier wrote:

> I'm not sure how to understand this error: 
> 
>         panic: NMI indicates hardware failure
> 
> 
> It causes the box to reboot.  What is NMI?  How can I tell what
> hardware has failed?  How can I fix this?
> 
> I'm running 4.1.1 with GENERIC + IPFilter on an Intel ISP 2150
> with a Mylex 150 RAID controller.

An NMI is a Non-Maskable Interrupt.  Several things can cause them,
always hardware (AFAIK).  What has most likely happened is that you
are using either EC (error checking) or ECC (error checking and
correcting) memory.  If just EC, an NMI will occur on any single-bit
(the most common) or worse memory error.  If ECC is enabled, depending
on how thorough the ECC is, it will typically correct single-bit
errors transparently and cause an NMI on two-bit errors or larger.  
Even better ECC memory systems can correct even more bit errors
transparently, but I have never seen these in use on PC-type servers
or workstations.
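
To make the "correct single-bit, NMI on two-bit" behavior concrete, here
is a toy sketch of a SECDED (single-error-correct, double-error-detect)
code in Python.  It is only an illustration: it protects 4 data bits with
an extended Hamming code, whereas ECC DIMMs typically protect 64 data bits
with 8 check bits, and the function names (encode, decode) are made up for
this example rather than taken from any chipset or OS.  The
"uncorrectable" result stands in for the case where the hardware would
raise an NMI.

```python
def encode(nibble):
    """Encode a 4-bit value as an 8-bit extended Hamming codeword (list of bits)."""
    data = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8  # index 0 = overall parity, indexes 1..7 = Hamming(7,4)
    code[3], code[5], code[6], code[7] = data
    # Each check bit covers the positions whose index has that bit set.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # The overall parity bit makes the total number of 1 bits even.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    # The syndrome is the XOR of the positions of all set bits; it is 0 for a
    # valid codeword and equals the position of a single flipped bit.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    parity_error = sum(code) % 2 == 1

    if syndrome == 0 and not parity_error:
        status = "ok"
    elif parity_error:
        # Odd number of flips: treat as a single-bit error and fix it.
        # (syndrome 0 here means the overall parity bit itself flipped.)
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome but even overall parity: two bits flipped.
        # Real hardware would signal this, e.g. with an NMI.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)
    word[5] ^= 1                      # one flipped bit
    print(decode(word))
    word[2] ^= 1                      # a second flipped bit
    print(decode(word))
```

Plain parity ("EC") memory corresponds to keeping only the overall parity
bit: it can tell that a bit flipped, but not which one, so it has nothing
to correct with and can only report the error.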

I would have the memory checked by a professional with a hardware
memory tester (software will NOT do) and replaced if need be.  As a
purely stop-gap measure, if the memory is ECC, make sure that ECC is
enabled in the BIOS and that NMIs are not generated for single-bit
error corrections.  Some BIOSes let you control NMI behavior and some
do not.  Do not disable NMIs altogether, however, as a halted system
is better than random memory (and thus possibly on-disk data, etc.)
corruption, IMHO.

Another possible stop-gap measure is to compile the POWERFAIL_NMI
option into the FreeBSD kernel, which is typically used with some
laptops that generate NMIs on low-battery or power-supply
disconnection, and _may_ help to "mask" the NMI you are seeing.  
Again, it should only be a stop-gap measure since if you are getting
an NMI often, you definitely have a hardware problem.


-- Chris Dillon - cdillon@wolves.k12.mo.us - cdillon@inter-linc.net
   FreeBSD: The fastest and most stable server OS on the planet.
   For Intel x86 and Alpha architectures. ( http://www.freebsd.org )



