Date:      Tue, 27 Mar 2007 23:02:54 +0100 (BST)
From:      Gavin Atkinson <gavin.atkinson@ury.york.ac.uk>
To:        Eirik Øverby <ltning@anduin.net>
Cc:        stable@freebsd.org
Subject:   Re: Weird messages output
Message-ID:  <20070327224208.E64587@ury.york.ac.uk>
In-Reply-To: <2B277B33-A56F-4E77-9E57-7F4777B22D2F@anduin.net>
References:  <1D425A90-3619-48B7-8171-13ECC9A31087@anduin.net> <1175002427.44767.41.camel@buffy.york.ac.uk> <2B277B33-A56F-4E77-9E57-7F4777B22D2F@anduin.net>

On Tue, 27 Mar 2007, Eirik Øverby wrote:
> On 27. mar. 2007, at 15.33, Gavin Atkinson wrote:
>> On Tue, 2007-03-27 at 15:00 +0200, Eirik Øverby wrote:
>>> Hi all,
>>>
>>> running 6.1-RELEASE on several HP DL385 servers (identically
>>> configured), one of them has recently spat the following out in the
>>> /var/log/messages file:
>>>
>>> ..........
>>> Mar 10 03:51:24 apphost02 ntpd[445]: kernel time sync enabled 2001
>>> Mar 10 05:02:01 apphost02 kernel: NMI ISA 30, EISA ff
>>> ..........
>>
>> I suspect you'll find your (ECC) memory has problems.
>
> You are absolutely correct. Further investigation using the ProLiant
> management tools for FreeBSD revealed serious RAM trouble. Two banks were
> degraded, so we have now had the modules replaced on-site.

Glad to be of help!

> Thanks for the tip!
> Do you happen to know if there are any "generic" tools/daemons available to
> decipher such NMIs? Perhaps be able to send SNMP traps or something?

I don't, to be honest.  There is some code in /usr/src/sys/i386/isa/nmi.c
that tries to detect the cause of an NMI, although I don't remember ever
seeing the messages when a parity error was detected.  I guess it's
possible that (to some chipset vendor at least) 0x20 and 0x30 indicate
parity error, but neither our code nor Linux's (see
http://fxr.watson.org/fxr/source/arch/i386/kernel/traps.c?v=linux-2.6#L743 )
knows those codes to mean parity error.
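
For illustration only, here is a rough Python sketch of how a log line
like the one above could be decoded. The bit meanings (0x80 for a RAM
parity error, 0x40 for an I/O channel check) are an assumption based on
the conventional PC/AT layout of the ISA status byte, so check them
against nmi.c before relying on them. The names decode_nmi and ISA_BITS
are made up for this example.

```python
import re

# ASSUMPTION: conventional PC/AT meaning of the high bits of the ISA
# status byte. These values are illustrative, not copied from nmi.c.
ISA_BITS = {
    0x80: "RAM parity error",
    0x40: "I/O channel check",
}

# Matches the kernel message format seen in /var/log/messages above.
NMI_LINE = re.compile(r"NMI ISA ([0-9a-fA-F]+), EISA ([0-9a-fA-F]+)")


def decode_nmi(line):
    """Return (isa, eisa, causes) for an NMI log line, or None if no match."""
    m = NMI_LINE.search(line)
    if m is None:
        return None
    isa = int(m.group(1), 16)   # values are logged in hex without a prefix
    eisa = int(m.group(2), 16)
    causes = [name for bit, name in ISA_BITS.items() if isa & bit]
    return isa, eisa, causes


if __name__ == "__main__":
    sample = "Mar 10 05:02:01 apphost02 kernel: NMI ISA 30, EISA ff"
    print(decode_nmi(sample))
```

With the line from the original report, neither assumed bit is set in
0x30, so the list of causes comes back empty, which fits the point that
these codes are not recognised as a parity error.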

Gavin


