Date: Fri, 10 Sep 2010 17:16:33 +0930
From: "Daniel O'Connor" <doconnor@gsoft.com.au>
To: Andriy Gapon <avg@icyb.net.ua>
Cc: freebsd-stable Stable <freebsd-stable@freebsd.org>, "John H. Baldwin" <jhb@freebsd.org>
Subject: Re: Enabling MCA causes system hangs
Message-ID: <2D9E9363-3F81-4288-8788-8429CBAF6E53@gsoft.com.au>
In-Reply-To: <4C89DE32.9050402@icyb.net.ua>
References: <E0962551-398E-49C7-BFB6-496AB58B2779@gsoft.com.au> <4C89DE32.9050402@icyb.net.ua>
On 10/09/2010, at 16:58, Andriy Gapon wrote:
>> The motherboard is a Gigabyte GA-MA785GM-US2H with an Athlon II X2 240 CPU &
>> 4Gb of RAM.
>
> Do you also have superpages enabled (vm.pmap.pg_ps_enabled)?
> If so, please try to turn them off and report back if that helps.

Yes, they are - I will try without.

> If not, then it's a tougher situation.
> What you see looks like a consequence of HyperTransport sync flood, which is a
> way to handle certain errors detected by CPU. Essentially it means that all
> HyperTransport communications are frozen. A system just hangs.
>
> My impression is that consumer-type systems are often configured to produce sync
> flood to stop error propagation in situations where more 'serious' systems would
> report machine check exception (MCE), probably an uncorrectable one.

Ahh.. The system does seem to operate normally without MCA and I haven't
noticed any data corruption issues. FWIW I am using ZFS on this box and
haven't seen any complaints about corrupt files.

> NOTE: the following may hurt your system and your data!
> Please stop reading if you are unsure if you can handle that!

Woooh, sounds fun :)

> You may try to investigate the sync flood situation further by checking the
> following bit in CPU configuration:
>
> F3x180 Extended NB MCA Configuration Register
> 21 SyncFloodOnCpuLeakErr: sync flood on CPU leak error enable.
>
> You can examine current value with a command like the following:
> $ pciconf -r pci0:0:24:3 0x180
>
> Where pci0:0:24:3 is PCI handle that corresponds to the device reported as
> follows by pciconf -lv:
> '(Family 10h) Athlon64/Opteron/Sempron Miscellaneous Control'
>
> If the bit is set, you can try to flip it off (using pciconf -w) and see how
> your system behaves when the MCA condition strikes.
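Andriy's check comes down to testing one bit of the 32-bit word that `pciconf -r` prints. The POSIX `sh` sketch below does only that arithmetic. The function name `sync_flood_enabled` is hypothetical, and the sketch assumes, as in Daniel's output, that the word is printed as bare hex, possibly followed by whitespace. It does not touch the hardware; the `pciconf` call appears only as a commented example.

```sh
#!/bin/sh
# Sketch: is SyncFloodOnCpuLeakErr (bit 21 of F3x180) set in the hex word
# printed by `pciconf -r`? Function name is hypothetical; the bit number
# comes from the register description quoted above.

SYNC_FLOOD_BIT=21

# Returns 0 (true) if bit 21 is set in the hex value given as $1.
# Accepts the value with or without a leading "0x" and strips any
# whitespace that pciconf leaves after the number.
sync_flood_enabled() {
    v=$(printf '%s' "$1" | tr -d '[:space:]')
    v=${v#0x}
    [ $(( (0x$v >> $SYNC_FLOOD_BIT) & 1 )) -eq 1 ]
}

# Example (as root, on the affected machine; selector from pciconf -lv):
#   val=$(pciconf -r pci0:0:24:3 0x180)
#   if sync_flood_enabled "$val"; then echo "sync flood on CPU leak error: enabled"; fi
```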
It does look like it is set:

hostb4@pci0:0:24:3:     class=0x060000 card=0x00000000 chip=0x12031022 rev=0x00 hdr=0x00
    vendor     = 'Advanced Micro Devices (AMD)'
    device     = '(Family 10h) Athlon64/Opteron/Sempron Miscellaneous Control'
    class      = bridge
    subclass   = HOST-PCI

[midget 17:09] /usr/src/sys >sudo pciconf -r pci0:0:24:3 0x180
00f003e2

Which is..

0    0    f    0    0    3    e    2
0000 0000 1111 0000 0000 0011 1110 0010
|    |    |    |    |    |    |    |  |
31   27   23   19   15   11   7    3  0

> Be careful and cautious.

Thanks, I'll let you know how I go!

--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
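The nibble breakdown of the value Daniel read can be cross-checked with shell arithmetic. 0x00f003e2 has bits 20 to 23 set, so bit 21 is indeed on, and clearing only that bit gives 0x00d003e2. The sketch below computes the value to write back while preserving every other bit of F3x180. `clear_sync_flood_bit` is a hypothetical helper name. The `pciconf -w` line in the comment is an untested illustration of the step Andriy describes and carries the same warning about risking the system and its data.

```sh
#!/bin/sh
# Sketch: given an F3x180 value as printed by `pciconf -r`, print the word
# with bit 21 (SyncFloodOnCpuLeakErr) cleared and all other bits unchanged.
# Function name is hypothetical.

clear_sync_flood_bit() {
    v=$(printf '%s' "$1" | tr -d '[:space:]')   # drop trailing whitespace
    v=${v#0x}                                   # allow an optional 0x prefix
    printf '%08x\n' $(( 0x$v & ~(1 << 21) ))    # mask out bit 21 only
}

# For the value read in this thread:
#   clear_sync_flood_bit 00f003e2    -> 00d003e2
# The result might then be written back with something like
#   pciconf -w pci0:0:24:3 0x180 0x00d003e2
# (untested; see the warning earlier in the thread).
```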