From: John Baldwin
To: Ireneusz Pluta
Cc: freebsd-hardware@freebsd.org
Date: Fri, 25 Jun 2010 13:03:41 -0400
Subject: Re: System hangs during heavy sequential write to mfi device
Message-Id: <201006251303.41457.jhb@freebsd.org>
In-Reply-To: <4C24D0AF.5000307@wp.pl>
References: <4C2499B5.3030404@wp.pl> <201006250842.22551.jhb@freebsd.org> <4C24D0AF.5000307@wp.pl>

On Friday 25 June 2010 11:52:15 am Ireneusz Pluta wrote:
> John Baldwin wrote:
> > On Friday 25 June 2010 7:57:41 am Ireneusz Pluta wrote:
> >
> >> Hello,
> >>
> >> I already posted this to freebsd-questions, with no response, so far.
> >> As this is rather a problem closer to hardware issues, so maybe here I
> >> have a better luck. Hope nobody blames me for crossposting.
> >>
> >> Jun 22 15:09:21 emu kernel: N NISAM NIM I II3SS0A,A N NE3MIMS0II3,
> >> N 0I ,ENSIMA NSI IEAMASI MI A IS3S A00 0,
> >> Jun 22 15:09:21 emu kernel: I
> >> Jun 22 15:09:21 emu kernel: A33
> >>
> >>
> > You are getting NMI's. Have you tried checking the system's event log (if it
> > has one) for messages about hardware errors?
> >
> >
>
> Thank you, John, for paying attention.
>
> The mb is intel S5520HC so it has event log.
> From your article
> http://www.bsdcan.org/2008/schedule/attachments/45_article.pdf I found
> that I yet needed ipmitool, so I installed it.
>
> `ipmitool sel elist`
>
> shows quite a lot of messages already stored like these:
>
> e05 | 06/24/2010 | 18:08:24 | Critical Interrupt PCIe Fat Sensor |
> e06 | 06/24/2010 | 18:09:28 | Critical Interrupt PCIe Fat Sensor |
> e07 | 06/25/2010 | 16:12:56 | Critical Interrupt PCIe Fat Sensor |
> e08 | 06/25/2010 | 17:34:16 | Critical Interrupt PCIe Fat Sensor |
> e09 | 06/25/2010 | 17:34:55 | Critical Interrupt PCIe Fat Sensor |
>
> They seem to appear exactly at the moments of system lockups - the last
> two were appended after I made another try.
>
> Intel document related to the motherboard
> http://download.intel.com/support/motherboards/server/sb/e68105004_msu_s5520hc_s5500hcv_july_09.pdf
> says something about these messages, but in the context completely
> unrelated to my case.
>
> Any thoughts?

Hmmm.  You might have a hardware issue.  OTOH, you can try seeing if you have a
BIOS option to disable PCIE error logging.
If so, turning it off may fix your hangs, since the act of writing the entry
to the system event log can take several hundred milliseconds on some
machines, causing a hang of sorts.  If you still get hangs with the option
turned off, then it seems you have some sort of hardware issue, possibly in
the mfi(4) adapter or mainboard.

-- 
John Baldwin
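
To check whether the event-log entries really line up with the hangs, the
`sel elist` output quoted above can be filtered by script.  The following is
only a minimal sketch: it assumes the pipe-separated layout shown in the
quoted lines (record id, date, time, description), and the function names
`parse_sel_line`, `pcie_events` and `near` are hypothetical helpers, not part
of any tool.  Output on other boards or tool versions may carry extra fields.

```python
from datetime import datetime, timedelta


def parse_sel_line(line):
    """Parse one pipe-separated event-log line as quoted in the thread.

    Returns (record_id, timestamp, description), or None when the line
    does not match the assumed layout.
    """
    # Drop mail quoting ("> ") before splitting on the column separator.
    fields = [f.strip() for f in line.lstrip("> ").split("|")]
    if len(fields) < 4:
        return None
    try:
        stamp = datetime.strptime(fields[1] + " " + fields[2],
                                  "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
    return fields[0], stamp, fields[3]


def pcie_events(lines):
    """Keep only the entries whose description mentions PCIe."""
    parsed = (parse_sel_line(line) for line in lines)
    return [e for e in parsed if e is not None and "PCIe" in e[2]]


def near(events, when, window_seconds=60):
    """Return the events logged within window_seconds of a known hang time."""
    window = timedelta(seconds=window_seconds)
    return [e for e in events if abs(e[1] - when) <= window]


# Sample input copied from the lines quoted in the thread.
SAMPLE = """\
e07 | 06/25/2010 | 16:12:56 | Critical Interrupt PCIe Fat Sensor |
e08 | 06/25/2010 | 17:34:16 | Critical Interrupt PCIe Fat Sensor |
e09 | 06/25/2010 | 17:34:55 | Critical Interrupt PCIe Fat Sensor |
""".splitlines()

if __name__ == "__main__":
    for rec_id, stamp, desc in pcie_events(SAMPLE):
        print(rec_id, stamp.isoformat(), desc)
```

Passing the time of an observed lockup to `near()` gives a quick way to see
whether an entry was written around that moment; the 60-second window is an
arbitrary choice for illustration.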