Date: Fri, 13 Nov 2009 09:49:08 -0500
From: John Baldwin <jhb@freebsd.org>
To: freebsd-current@freebsd.org
Cc: Kai Gallasch <gallasch@free.de>
Subject: Re: 8.0RC2 amd64 - kernel panic running make buildworld
Message-ID: <200911130949.09190.jhb@freebsd.org>
In-Reply-To: <20091112195932.5875387e@orwell.free.de>
References: <1031257439203@webmail57.yandex.ru> <200911111504.14906.jhb@freebsd.org> <20091112195932.5875387e@orwell.free.de>

On Thursday 12 November 2009 1:59:32 pm Kai Gallasch wrote:
> On Wed, 11 Nov 2009 15:04:14 -0500
> John Baldwin <jhb@freebsd.org> wrote:
>
> > On Wednesday 11 November 2009 2:15:18 pm S.N.Grigoriev wrote:
> > >
> > > 10.11.09, 09:15, "Mark Atkinson" <atkin901@yahoo.com>
> > > wrote:
> > >
> > > > Andriy Gapon wrote:
> > > > > on 10/11/2009 17:22 gary.jennejohn@freenet.de said the
> > > > > following:
> > > > > > Not a trivial issue unless it is hardware indeed.
> > > > >
> > > > Also, you can try adding:
> > > > hw.mca.enabled="1" in /boot/loader.conf, reboot, and then see if
> > > > there is a machine check exception on the console during the
> > > > buildworld.
> > >
> > > Mark,
> > >
> > > I've added hw.mca.enabled="1" in /boot/loader.conf and got the
> > > following screen during the buildworld:
> > >
> > > .....
> > > -c /usr/src/gnu/usr.bin/binutils/as/../../../../contrib/binutils/gas/sb.c
> > >
> > > MCA: CPU3 UNCOR PCC OVER DTLIB L1 error
> > > MCA: Address 0x8015fb000
> >
> > Your hardware is broken and it is telling you so. You have had
> > multiple machine checks with the most severe one being an
> > uncorrectable error in your data TLB (i.e. in the CPU itself).
>
> John,
>
> I also set hw.mca.enabled="1" and vm.pmap.pg_ps_enabled="1"
> in /boot/loader.conf on my (under load) spontaneously rebooting
> opteron proliant server.
>
> Server was upgraded to FREEBSD-8.0-PRERELEASE today.
>
> This is what happened..
>
>
> ---- machine check trap, first run ----
>
> sonnenkraft:/usr/obj # MCA: CPU 5 UNCOR PCC OVER DTLB L1 error
> MCA: Address 0x80e5c8000

Hmm, normally I would suspect the CPU, but avg@ has been looking into a
possible interaction between the superpages code and the machine check
registers on AMD CPUs (either a CPU bug, or perhaps a superpages bug).
I would wait to see if he finds something.

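For anyone who wants to check the superpages connection in the meantime,
a minimal /boot/loader.conf sketch follows. It uses only the two tunables
already mentioned in this thread. Setting vm.pmap.pg_ps_enabled to "0" is
an assumption about how to turn superpages off for a comparison run;
nobody in the thread has reported trying it yet.

```
# /boot/loader.conf (sketch; takes effect after a reboot)
# Keep machine check reporting on so any MCA still shows up on the console.
hw.mca.enabled="1"
# Assumption: "0" turns superpages off for the comparison run.
vm.pmap.pg_ps_enabled="0"
```

If the buildworld then completes without the DTLB L1 machine check, that
would point toward the superpages interaction rather than a failing CPU.
If the MCA still appears, hardware remains the more likely cause.
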
An isolated MCA would most likely indicate a hardware error, but several
people are reporting this exact machine check, and only when superpages
is enabled, which suggests it might be something else.

-- 
John Baldwin