Date:      Fri, 10 Sep 2010 17:16:33 +0930
From:      "Daniel O'Connor" <doconnor@gsoft.com.au>
To:        Andriy Gapon <avg@icyb.net.ua>
Cc:        freebsd-stable Stable <freebsd-stable@freebsd.org>, "John H. Baldwin" <jhb@freebsd.org>
Subject:   Re: Enabling MCA causes system hangs
Message-ID:  <2D9E9363-3F81-4288-8788-8429CBAF6E53@gsoft.com.au>
In-Reply-To: <4C89DE32.9050402@icyb.net.ua>
References:  <E0962551-398E-49C7-BFB6-496AB58B2779@gsoft.com.au> <4C89DE32.9050402@icyb.net.ua>


On 10/09/2010, at 16:58, Andriy Gapon wrote:
>> The motherboard is a Gigabyte GA-MA785GM-US2H with an Athlon II X2 240 CPU &
>> 4Gb of RAM.
>
> Do you also have superpages enabled (vm.pmap.pg_ps_enabled)?
> If so, please try to turn them off and report back if that helps.

Yes, they are - I will try without.


> If not, then it's a tougher situation.
> What you see looks like a consequence of HyperTransport sync flood, which is a
> way to handle certain errors detected by CPU.  Essentially it means that all
> HyperTransport communications are frozen.  A system just hangs.
>
> My impression is that consumer-type systems are often configured to produce sync
> flood to stop error propagation in situations where more 'serious' systems would
> report machine check exception (MCE), probably an uncorrectable one.

Ahh..
The system does seem to operate normally without MCA and I haven't noticed
any data corruption issues. FWIW I am using ZFS on this box and haven't seen
any complaints about corrupt files.

> NOTE: the following may hurt your system and your data!
> Please stop reading if you are unsure if you can handle that!

Woooh, sounds fun :)

> You may try to investigate the sync flood situation further by checking the
> following bit in CPU configuration:
>
> F3x180 Extended NB MCA Configuration Register
> 21 SyncFloodOnCpuLeakErr: sync flood on CPU leak error enable.
>
> You can examine current value with a command like the following:
> $ pciconf -r pci0:0:24:3 0x180
>
> Where pci0:0:24:3 is PCI handle that corresponds to the device reported as
> follows by pciconf -lv:
> '(Family 10h) Athlon64/Opteron/Sempron Miscellaneous Control'
>
> If the bit is set, you can try to flip it off (using pciconf -w) and see how
> your system behaves when the MCA condition strikes.

It does look like it is set:

hostb4@pci0:0:24:3:     class=0x060000 card=0x00000000 chip=0x12031022 rev=0x00 hdr=0x00
    vendor     = 'Advanced Micro Devices (AMD)'
    device     = '(Family 10h) Athlon64/Opteron/Sempron Miscellaneous Control'
    class      = bridge
    subclass   = HOST-PCI

[midget 17:09] /usr/src/sys >sudo pciconf -r pci0:0:24:3 0x180
00f003e2

Which is..
   0    0    f    0    0    3    e    2
0000 0000 1111 0000 0000 0011 1110 0010
|    |    |    |    |    |    |    |  |
31   27   23   19   15   11   7    3  0
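
Bit 21 sits in the 23..20 nibble (f), so it is set. The arithmetic can be
double-checked with a minimal sh sketch like the one below. The helper names
bit_is_set and clear_bit are made up for illustration, and the register value
is the one read above. The script only prints the pciconf -w command that
would clear the bit; it does not run it, since the write is the risky step.

```shell
#!/bin/sh
# Value read from F3x180 in this message (pciconf -r pci0:0:24:3 0x180).
reg=0x00f003e2
bit=21   # SyncFloodOnCpuLeakErr, per the register description quoted above

# bit_is_set VALUE BIT: print 1 if BIT is set in VALUE, 0 otherwise.
# (Hypothetical helper, not part of pciconf.)
bit_is_set() {
    echo $(( ($1 >> $2) & 1 ))
}

# clear_bit VALUE BIT: print VALUE with BIT cleared, as 8 hex digits.
# (Hypothetical helper, not part of pciconf.)
clear_bit() {
    printf '0x%08x\n' $(( $1 & ~(1 << $2) ))
}

echo "bit $bit set: $(bit_is_set $reg $bit)"
new=$(clear_bit $reg $bit)
echo "value with bit $bit cleared: $new"

# Printed only, not executed. Review the value before writing it.
echo "pciconf -w pci0:0:24:3 0x180 $new"
```

For 00f003e2 this gives 0x00d003e2 as the value with bit 21 cleared; all other
bits are left as the BIOS set them.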

> Be careful and cautious.

Thanks, I'll let you know how I go!

--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C








