Date: Tue, 10 Oct 2000 15:35:26 -0500 (CDT)
From: Chris Dillon
To: matthew zeier
Cc: freebsd-questions@FreeBSD.ORG
Subject: Re: panic: NMI indicates hardware failure

On Tue, 10 Oct 2000, matthew zeier wrote:

> I'm not sure how to understand this error:
>
>     panic: NMI indicates hardware failure
>
> It causes the box to reboot.  What is NMI?  How can I tell what
> hardware has failed?  How can I fix this?
>
> I'm running 4.1.1 with GENERIC + IPFilter on an Intel ISP 2150
> with a Mylex 150 RAID controller.

An NMI is a Non-Maskable Interrupt.  Several things can raise one, and
as far as I know the cause is always hardware.  The most likely
explanation is a memory error on a system using either EC (error
checking) or ECC (error checking and correcting) memory.  With EC only,
an NMI is raised on any single-bit error (the most common kind) or
worse.  With ECC enabled, depending on how thorough the ECC is,
single-bit errors are typically corrected transparently and an NMI is
raised only on errors of two bits or more.  Better ECC memory systems
can transparently correct even more bit errors, but I have never seen
these in use on PC-type servers or workstations.

I would have the memory checked by a professional with a hardware
memory tester (software will NOT do) and replaced if need be.
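To make the EC vs. ECC difference concrete, here is a toy sketch in
Python.  It is only an illustration under my own assumptions, not how
any particular chipset does it: the "EC" case is modeled as a single
even-parity bit, and the "ECC" case as an extended Hamming(8,4) code
protecting 4 data bits (real ECC DIMMs typically protect 64 data bits
with 8 check bits).  All function names are made up for the example.

```python
def parity_bit(bits):
    """Even parity over a list of 0/1 values."""
    p = 0
    for b in bits:
        p ^= b
    return p


def parity_check(data, stored_parity):
    """Parity-only memory: True if no error is detected.

    Nothing can be corrected, so a False result is the point where
    real hardware would raise an NMI.
    """
    return parity_bit(data) == stored_parity


def secded_encode(data):
    """Encode 4 data bits as an 8-bit extended Hamming word.

    Index 0 holds an overall parity bit; indices 1..7 hold a
    Hamming(7,4) codeword with check bits at positions 1, 2 and 4.
    """
    word = [0] * 8
    word[3], word[5], word[6], word[7] = data
    word[1] = word[3] ^ word[5] ^ word[7]
    word[2] = word[3] ^ word[6] ^ word[7]
    word[4] = word[5] ^ word[6] ^ word[7]
    word[0] = parity_bit(word[1:])
    return word


def secded_decode(word):
    """Return (status, data) for an 8-bit word.

    status is "ok", "corrected" (one bit was flipped and repaired)
    or "uncorrectable" (two bits flipped; the NMI case).
    """
    w = list(word)
    # The syndrome is the XOR of the positions of all set bits; it is
    # zero for a valid codeword and names the bad position otherwise.
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos
    overall_ok = parity_bit(w) == 0

    if syndrome == 0 and overall_ok:
        status = "ok"
    elif not overall_ok:
        # Odd number of flips: treat as a single-bit error and fix it.
        # A syndrome of 0 here means the overall parity bit itself.
        w[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with good overall parity: two bits flipped.
        return "uncorrectable", None
    return status, [w[3], w[5], w[6], w[7]]


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    word = secded_encode(data)

    one_flip = list(word)
    one_flip[5] ^= 1
    two_flips = list(one_flip)
    two_flips[6] ^= 1

    print(secded_decode(word))
    print(secded_decode(one_flip))
    print(secded_decode(two_flips))
```

Note that a lone parity bit only notices an odd number of flipped
bits, so in this model a two-bit error passes the parity check
unnoticed, while the SECDED code reports it as uncorrectable.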
As a purely stop-gap measure, if the memory is ECC, make sure that ECC
is enabled in the BIOS and that NMIs are not generated for corrected
single-bit errors.  Some BIOSes let you control NMI behavior and some
do not.  Do not disable NMIs altogether, however, as a halted system is
better than random memory (and thus possibly on-disk data, etc.)
corruption, IMHO.

Another possible stop-gap measure is to compile the POWERFAIL_NMI
option into the FreeBSD kernel.  It is typically used with some laptops
that generate NMIs on low battery or power-supply disconnection, and
_may_ help to "mask" the NMI you are seeing.  Again, it should only be
a stop-gap measure: if you are getting NMIs often, you definitely have
a hardware problem.

-- 
Chris Dillon - cdillon@wolves.k12.mo.us - cdillon@inter-linc.net
FreeBSD: The fastest and most stable server OS on the planet.
For Intel x86 and Alpha architectures. ( http://www.freebsd.org )