From owner-freebsd-stable@FreeBSD.ORG Tue Jul 5 15:41:31 2011
To: freebsd-stable@freebsd.org
From: Christian Baer
Date: Tue, 05 Jul 2011 17:41:11 +0200
References: <52F39CE0-EEC7-4180-8186-BF8696AF279D@lassitu.de> <20110618175215.GA18645@icarus.home.lan>
In-Reply-To: <20110618175215.GA18645@icarus.home.lan>
Subject: Re: Crashes with Promise controller
List-Id: Production branch of FreeBSD source code

On 18.06.2011 19:52, Jeremy Chadwick wrote:
> It may be that the kernel is panic'ing
> and auto-rebooting before he can
> see the message in question. I would advocate he put the following
> directives in his kernel configuration and rebuild/reinstall kernel and
> wait for it to happen again.

I have now changed the power setup slightly, and the problems have *reduced* and changed slightly in nature. Reproducing a panic is a lot harder, which I consider a good thing at the moment. Since I changed the power configuration, the system has been running for about 4 days and has had only two crashes (traps), despite quite heavy traffic on the drives.

Because the system rebooted very quickly before I set up the serial console, I only ever got to see one panic (not a trap) in the past, and it was gone too quickly for me to write anything down about it.

On a side note: I did find out during my testing (before changing the power) that two drives were actually causing the problems, and I could even make the system crash while only reading from one of those drives. Crashes while reading felt less frequent (no statistics collected, though) but happened just the same.

Because I had formatted the two drives in question with rather strange values (rather large block sizes), I have decided to copy everything off them, re-partition them with GPT, and recreate both the encryption layer and the file system on them. During this copying, I managed to crash the system twice.
The first time was yesterday, when I got this:

--- snip ---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x1f8
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc3d2120c
stack pointer           = 0x28:0xc3697bf4
frame pointer           = 0x28:0xc3697c4c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2 (g_event)
[thread pid 2 tid 100007 ]
Stopped at      g_eli_access+0x7c:      testl   $0x10008,0x1f8(%ebx)
--- snap ---

About 25 minutes ago, the system crashed again. This time, I had the "known" errors prior to the actual trap:

--- snip ---
ata6: SIGNATURE: ffffffff
ata6: timeout waiting to issue command
ata6: error issuing SETFEATURES SET TRANSFER MODE command
ata6: timeout waiting to issue command
ata6: error issuing SETFEATURES ENABLE RCACHE command
ata6: timeout waiting to issue command
ata6: error issuing SETFEATURES ENABLE WCACHE command
ata6: timeout waiting to issue command
ata6: error issuing SET_MULTI command
ad12: FAILURE - device detached
GEOM_ELI: g_eli_read_done() failed ad12d.eli[READ(offset=403810975744, length=32768)]
g_vfs_done():ad12d.eli[READ(offset=403810975744, length=32768)]error = 6

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x1f8
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc3d2420c
stack pointer           = 0x28:0xc3697bf4
frame pointer           = 0x28:0xc3697c4c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2 (g_event)
[thread pid 2 tid 100007 ]
Stopped at      g_eli_access+0x7c:      testl   $0x10008,0x1f8(%ebx)
--- snap ---

The strange thing is that I wasn't actually accessing ad12 at the time. I was running a "-t long" on it, but nothing more. That test had been running for over two hours at the time of the crash.
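
For what it is worth, the numbers in the second log can be decoded with a few lines of Python. This is only a rough sketch for reading the trap, not a diagnosis: the values are copied from the log above, the helper name `base_register_value` is made up for illustration, and the interpretation assumes the fault address is simply the register plus the displacement in the faulting instruction (the code segment base is 0x0 according to the log). Under that assumption, %ebx would have been zero, which looks like a NULL pointer being dereferenced in g_eli_access after the device went away. Error 6 is ENXIO.

```python
import errno
import re

# Values copied from the second log above.
FAULT_ADDRESS = 0x1F8
FAULTING_INSN = "testl $0x10008,0x1f8(%ebx)"
READ_OFFSET = 403810975744
READ_LENGTH = 32768
GVFS_ERROR = 6


def base_register_value(fault_address, instruction):
    """Return (register, value) implied by a disp(%reg) memory operand.

    Hypothetical helper: assumes the fault address equals
    register + displacement, with no index register involved.
    """
    match = re.search(r"(0x[0-9a-fA-F]+)\((%\w+)\)", instruction)
    if match is None:
        raise ValueError("no disp(%reg) operand found")
    displacement = int(match.group(1), 16)
    return match.group(2), fault_address - displacement


register, value = base_register_value(FAULT_ADDRESS, FAULTING_INSN)
print(f"{register} = {value:#x}")

# Symbolic name of the error number reported by g_vfs_done().
print("error", GVFS_ERROR, "=", errno.errorcode[GVFS_ERROR])

# Where on the provider the failed read was, and whether it was aligned.
print("read offset in GiB:", round(READ_OFFSET / 2**30, 2))
print("offset is a multiple of the read length:", READ_OFFSET % READ_LENGTH == 0)
```

If that reading is right, the trap itself would be a consequence of ad12 being detached, rather than a separate fault.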

Does this still somehow point to a power problem (since ad12 seems to get detached)? Or could it be something a bit more fundamental?

Best regards,
Chris