Date: Thu, 12 Nov 2009 07:57:40 -0800
From: Mark Atkinson <atkin901@gmail.com>
To: freebsd-current@freebsd.org
Subject: Re: 8.0RC2 amd64 - kernel panic running make buildworld
Message-ID: <hdhb9l$vqf$1@ger.gmane.org>
In-Reply-To: <4AFC14BE.7020106@icyb.net.ua>
References: <1031257439203@webmail57.yandex.ru>
 <20091105184925.16b55c43@ernst.jennejohn.org>
 <31221257446063@webmail71.yandex.ru>
 <20091106101943.5a763f43@ernst.jennejohn.org>
 <41361257585651@webmail39.yandex.ru>
 <20091107115256.3df62bc3@ernst.jennejohn.org>
 <1257618758.1511.14.camel@RabbitsDen>
 <6511257846119@webmail85.yandex.ru>
 <20091110105856.1270038e@ernst.jennejohn.org>
 <1257864452.46072.25.camel@RabbitsDen>
 <20091110162205.48abcffe@ernst.jennejohn.org>
 <4AF99D53.9030005@icyb.net.ua>
 <hdc73v$4rt$1@ger.gmane.org>
 <941257966918@webmail42.yandex.ru>
 <hdf5u4$qfr$1@ger.gmane.org>
 <4AFC14BE.7020106@icyb.net.ua>

Andriy Gapon wrote:
> on 11/11/2009 22:13 Mark Atkinson said the following:
>> Well, you're about at the point I am now with my HP dl385g5, only
>> turning off superpages would result in a successful buildworld. Mine
>> would often machine check during gas compilation as well.
>
> Mark,
>
> you mentioning MCA was magic moment for me.
> I was debugging a problem which seemed to be quite different, but now I think that
> it converges to the problem discussed in this thread (if indeed it's the same
> problem for all reporters).

I'm not sure it's exactly the same. I only know a couple of things:

- my memory tests good.
- turning off superpages allows this machine to function properly.

I suspect there's a problem with one of the following:

- the BIOS of my machine
- the on-die memory controller/instructions of the CPU
- the motherboard electrical interface to memory or bus in some shape or form.

> Or perhaps there is something like Event Log in BIOS. Maybe it
> even gets something useful.
> Could you please check?

Yes. When you receive an MCE on the HP machines, the BIOS notices and prints
a message on the next bootup, something like "an unhandled memory error has
occurred since last power on."

In my current job, which works with hardware, we'll occasionally see MCEs
during development. It's easy to say the memory is bad, and it is the first
thing we replace to test. However, it can also be the electrical interface to
the hardware, which may or may not be fixable or worked around in firmware.
I have also seen cases where the software initializing or controlling the
hardware results in an unhandled condition that triggers an MCE.

> About my problem - it seems that I was working from the opposite end. I have been
> using head/CURRENT with pg_ps_enabled=1 for quite a while now. And then I decided
> to try hw.mca.enabled=1 and after that I started having the same symptoms as
> described here. Unfortunately, I never did get Machine Check trap, it's always
> something that looks like CPU halt and then reset by watchdog (if it is enabled).
> So, for me:
> superpages and no machine check - works
> machine check and no superpages - works
> machine check and superpages - problem

That's not quite the same, for sure. Definitely try replacing the memory first
if you haven't already.

All the best,
Mark