Date:      Thu, 12 Nov 2009 07:57:40 -0800
From:      Mark Atkinson <atkin901@gmail.com>
To:        freebsd-current@freebsd.org
Subject:   Re: 8.0RC2 amd64 - kernel panic running make buildworld
Message-ID:  <hdhb9l$vqf$1@ger.gmane.org>
In-Reply-To: <4AFC14BE.7020106@icyb.net.ua>
References:  <1031257439203@webmail57.yandex.ru>	<20091105184925.16b55c43@ernst.jennejohn.org>	<31221257446063@webmail71.yandex.ru>	<20091106101943.5a763f43@ernst.jennejohn.org>	<41361257585651@webmail39.yandex.ru>	<20091107115256.3df62bc3@ernst.jennejohn.org>	<1257618758.1511.14.camel@RabbitsDen>	<6511257846119@webmail85.yandex.ru>	<20091110105856.1270038e@ernst.jennejohn.org>	<1257864452.46072.25.camel@RabbitsDen>	<20091110162205.48abcffe@ernst.jennejohn.org>	<4AF99D53.9030005@icyb.net.ua>	<hdc73v$4rt$1@ger.gmane.org>	<941257966918@webmail42.yandex.ru> <hdf5u4$qfr$1@ger.gmane.org> <4AFC14BE.7020106@icyb.net.ua>

Andriy Gapon wrote:
> on 11/11/2009 22:13 Mark Atkinson said the following:
>> Well, you're about at the point I am now with my HP dl385g5, only
>> turning off superpages would result in a successful buildworld.   Mine
>> would often machine check during gas compilation as well.
> 
> Mark,
> 
> you mentioning MCA was magic moment for me.
> I was debugging a problem which seemed to be quite different, but now I think that
> it converges to the problem discussed in this thread (if indeed it's the same
> problem for all reporters).

I'm not sure it's exactly the same.  I only know a couple of things:

- my memory tests good.
- turning off superpages allows this machine to function properly.

I suspect there's a problem with one of the following:

- the BIOS of my machine
- the on-die memory controller/instructions of the CPU
- the motherboard's electrical interface to memory or the bus, in some
shape or form.

> Or perhaps there is something like Event Log in BIOS.  Maybe it
> even gets something useful.
> Could you please check?

Yes.  When you receive an MCE on the HP machines, the BIOS notices and
prints a message on the next bootup, something like "an unhandled memory
error has occurred since last power on."

In my current job, which involves working with hardware, we'll
occasionally see MCEs during development.  It's easy to say the memory
is bad, and memory is the first thing we replace to test.  However, the
cause can also be the electrical interface to the hardware, which may or
may not be fixable or possible to work around in firmware.  I have also
seen the software that initializes or controls the hardware cause an
unhandled condition that triggers an MCE.

> About my problem - it seems that I was working from the opposite end.  I have been
> using head/CURRENT with pg_ps_enabled=1 for quite a while now.  And then I decided
> to try hw.mca.enabled=1 and after that I started having the same symptoms as
> described here.  Unfortunately, I never did get Machine Check trap, it's always
> something that looks like CPU halt and then reset by watchdog (if it is enabled).
> So, for me:
> superpages and no machine check - works
> machine check and no superpages - works
> machine check and superpages - problem

That's not quite the same, for sure.  Definitely try replacing the
memory first if you haven't already.


All the best,


Mark



