Date:      Fri, 10 Sep 2010 10:28:50 +0300
From:      Andriy Gapon <avg@icyb.net.ua>
To:        "Daniel O'Connor" <doconnor@gsoft.com.au>
Cc:        freebsd-stable Stable <freebsd-stable@freebsd.org>, "John H. Baldwin" <jhb@freebsd.org>
Subject:   Re: Enabling MCA causes system hangs
Message-ID:  <4C89DE32.9050402@icyb.net.ua>
In-Reply-To: <E0962551-398E-49C7-BFB6-496AB58B2779@gsoft.com.au>
References:  <E0962551-398E-49C7-BFB6-496AB58B2779@gsoft.com.au>

on 10/09/2010 05:36 Daniel O'Connor said the following:
> Hi, I recently tried enabling MCA (i.e. hw.mca.enabled=1 in loader.conf) on an
> 8.0-STABLE system and found that it would cause the system to hang after a
> few minutes of uptime.
> 
> The screen would go black and the monitor would turn off (regardless of
> whether it was in X or not) and only a hard reset would bring it back.
> 
> Also I found that quite often I had to power cycle the whole PC or the BIOS
> wouldn't detect the hard disks on boot(!) after a hang.
> 
> uname is.. FreeBSD midget.dons.net.au 8.0-STABLE FreeBSD 8.0-STABLE #6
> r202903M: Sun Jan 24 13:45:11 CST 2010
> darius@midget.dons.net.au:/usr/obj/usr/src/sys/MIDGET  amd64
> 
> The motherboard is a Gigabyte GA-MA785GM-US2H with an Athlon II X2 240 CPU &
> 4GB of RAM.

Do you also have superpages enabled (vm.pmap.pg_ps_enabled)?
If so, please try to turn them off and report back if that helps.

If not, then it's a tougher situation.
What you see looks like a consequence of a HyperTransport sync flood, which is a
way of handling certain errors detected by the CPU.  Essentially, it means that
all HyperTransport communication is frozen, so the system just hangs.

My impression is that consumer-type systems are often configured to produce a
sync flood to stop error propagation in situations where more 'serious' systems
would report a machine check exception (MCE), probably an uncorrectable one.

NOTE: the following may hurt your system and your data!
Please stop reading if you are unsure if you can handle that!

You may try to investigate the sync flood situation further by checking the
following bit in the CPU configuration:

F3x180 Extended NB MCA Configuration Register
21 SyncFloodOnCpuLeakErr: sync flood on CPU leak error enable.

You can examine the current value with a command like the following:
$ pciconf -r pci0:0:24:3 0x180

Here pci0:0:24:3 is the PCI selector that corresponds to the device reported as
follows by pciconf -lv:
'(Family 10h) Athlon64/Opteron/Sempron Miscellaneous Control'

If the bit is set, you can try to clear it (using pciconf -w) and see how
your system behaves when the MCA condition strikes.
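
As a rough sketch of the bit arithmetic only: the helper names below are made
up for illustration, and I am assuming that pciconf -r prints the register as
plain hex (a leading 0x is tolerated too) and that pciconf -w accepts a
0x-prefixed value.  The helpers never touch hardware themselves; the commented
usage at the end is what you would run as root, at your own risk.

```shell
#!/bin/sh
# Bit 21 of F3x180 is SyncFloodOnCpuLeakErr (see the register description above).
SYNC_FLOOD_CPU_LEAK_MASK=$((1 << 21))   # 0x00200000

# Hypothetical helper: print "set" or "clear" for a hex register value,
# given with or without a leading 0x (e.g. 00a0007f).
sync_flood_bit_state() {
    val=$((0x${1#0x}))
    if [ $((val & SYNC_FLOOD_CPU_LEAK_MASK)) -ne 0 ]; then
        echo set
    else
        echo clear
    fi
}

# Hypothetical helper: print the same register value with bit 21 cleared,
# formatted as 0x followed by eight hex digits.
sync_flood_bit_cleared() {
    val=$((0x${1#0x}))
    printf '0x%08x\n' $((val & ~SYNC_FLOOD_CPU_LEAK_MASK))
}

# Intended usage (not run here; adjust the selector to your system):
#   cur=$(pciconf -r pci0:0:24:3 0x180)
#   sync_flood_bit_state "$cur"
#   pciconf -w pci0:0:24:3 0x180 "$(sync_flood_bit_cleared "$cur")"
```

Only bit 21 is changed; all other bits of the register are written back as
they were read.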

Be careful and cautious.

-- 
Andriy Gapon
