From: "Daniel O'Connor"
To: Andriy Gapon
Cc: freebsd-stable Stable, "John H. Baldwin"
Date: Fri, 10 Sep 2010 17:16:33 +0930
Subject: Re: Enabling MCA causes system hangs
In-Reply-To: <4C89DE32.9050402@icyb.net.ua>

On 10/09/2010, at 16:58, Andriy Gapon wrote:
>> The motherboard is a Gigabyte GA-MA785GM-US2H with an Athlon II X2 240 CPU &
>> 4Gb of RAM.
>
> Do you also have superpages enabled (vm.pmap.pg_ps_enabled)?
> If so, please try to turn them off and report back if that helps.

Yes, they are - I will try without.

> If not, then it's a tougher situation.
> What you see looks like a consequence of HyperTransport sync flood, which is a
> way to handle certain errors detected by CPU. Essentially it means that all
> HyperTransport communications are frozen. A system just hangs.
>
> My impression is that consumer-type systems are often configured to produce sync
> flood to stop error propagation in situations where more 'serious' systems would
> report machine check exception (MCE), probably an uncorrectable one.

Ahh. The system does seem to operate normally without MCA and I haven't
noticed any data corruption issues. FWIW I am using ZFS on this box and
haven't seen any complaints about corrupt files.

> NOTE: the following may hurt your system and your data!
> Please stop reading if you are unsure if you can handle that!

Woooh, sounds fun :)

> You may try to investigate the sync flood situation further by checking the
> following bit in CPU configuration:
>
> F3x180 Extended NB MCA Configuration Register
> 21 SyncFloodOnCpuLeakErr: sync flood on CPU leak error enable.
>
> You can examine current value with a command like the following:
> $ pciconf -r pci0:0:24:3 0x180
>
> Where pci0:0:24:3 is PCI handle that corresponds to the device reported as
> follows by pciconf -lv:
> '(Family 10h) Athlon64/Opteron/Sempron Miscellaneous Control'
>
> If the bit is set, you can try to flip it off (using pciconf -w) and see how
> your system behaves when the MCA condition strikes.

It does look like it is set:

hostb4@pci0:0:24:3: class=0x060000 card=0x00000000 chip=0x12031022 rev=0x00 hdr=0x00
    vendor     = 'Advanced Micro Devices (AMD)'
    device     = '(Family 10h) Athlon64/Opteron/Sempron Miscellaneous Control'
    class      = bridge
    subclass   = HOST-PCI

[midget 17:09] /usr/src/sys >sudo pciconf -r pci0:0:24:3 0x180
00f003e2

Which is:

0    0    f    0    0    3    e    2
0000 0000 1111 0000 0000 0011 1110 0010
|    |    |    |    |    |    |    |  |
31   27   23   19   15   11   7    3  0

So bit 21, which sits in the 'f' nibble (bits 23-20), is set.

> Be careful and cautious.

Thanks, I'll let you know how I go!
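The bit decoding done by hand for F3x180 can also be scripted. The following is a minimal POSIX sh sketch, assuming a shell with 64-bit arithmetic expansion (FreeBSD sh, dash and bash all qualify). It takes the hex value printed by pciconf -r, reports whether bit 21 (SyncFloodOnCpuLeakErr) is set, and computes the value with that bit cleared. The function names sync_flood_bit and cleared_value are hypothetical helpers, not part of pciconf, and the script only does arithmetic; it never reads or writes the hardware.

```shell
#!/bin/sh
# Sketch: inspect bit 21 (SyncFloodOnCpuLeakErr) of the AMD Family 10h
# F3x180 register, given the hex value printed by
# "pciconf -r pci0:0:24:3 0x180". Pure arithmetic; no hardware access.

BIT=21  # SyncFloodOnCpuLeakErr, per the register description in the thread

# Hypothetical helper: print 1 if bit $BIT is set in the hex value $1, else 0.
# Accepts the value with or without a leading "0x".
sync_flood_bit() {
    val=$(( 0x${1#0x} ))
    echo $(( ($val >> $BIT) & 1 ))
}

# Hypothetical helper: print the hex value $1 with bit $BIT cleared,
# zero-padded to 8 hex digits (one 32-bit register).
cleared_value() {
    val=$(( 0x${1#0x} ))
    printf '0x%08x\n' $(( $val & ~(1 << $BIT) ))
}
```

For the value reported in this message, "sync_flood_bit 00f003e2" prints 1 and "cleared_value 00f003e2" prints 0x00d003e2, since clearing bit 21 turns the 'f' nibble into 'd'. Writing that back would then be something like "pciconf -w pci0:0:24:3 0x180 0x00d003e2", assuming pciconf's default 32-bit access width; as warned in the thread, that step may hurt your system and your data.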
--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C