From owner-freebsd-current@FreeBSD.ORG  Sat Jan 31 21:48:57 2009
Date: Sat, 31 Jan 2009 16:48:55 -0500
From: Dylan Alex Simon <dylan@dylex.net>
To: Christoph Mallon <christoph.mallon@gmx.de>
Message-ID: <20090131214855.GA9123@datura.dylex.net>
References: <8cb6106e0901200641x4b0bda9ag31e6f059f13035a7@mail.gmail.com>
	<200901201829.n0KITE8V072323@lurza.secnetix.de>
	<20090131010855.GA7991@datura.dylex.net> <49844264.7000300@gmx.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <49844264.7000300@gmx.de>
Jabber-ID: dylan@dylex.net
Cc: freebsd-current@FreeBSD.ORG
Subject: Re: SATA DMA errors on second ICH10 bus

> I suspect I see the same problem with some nvidia SATA controller. If  
> there is high load on both channels of one controller, there are exactly  
> the errors you showed.
> Your kernel does not use INVARIANTS, is this correct? Otherwise you  
> should see a very specific panic caused by a KASSERT(). I analysed the  
> problem a bit. You can see my findings in the thread "Question about  
> panic in brelse()".
> I suspect a hardware bug plus incorrect error handling in the driver in  
> FreeBSD. As a workaround, I suggest you connect each disk to a separate  
> controller - if you have not more disks than controllers.

When I do turn INVARIANTS on, I get a number of different failures, depending
on what sort of operation I'm doing.  I think I've seen the brelse() panic you
mentioned, but not recently.  Here's one from today while doing a cp on UFS:

ad0: FAILURE - load data
ad0: setting up DMA failed
g_vfs_done():ad0s1e[READ(offset=1843986432, length=65536)]error = 5
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 819 (cp)
kernel trap 9 with interrupts disabled

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x8:0xffffffff802ae9fe
stack pointer           = 0x10:0xfffffffeb61bfae0
frame pointer           = 0x10:0xfffffffeb61bfb00
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 12 (irq14: ata0)
lock order reversal: (Giant after non-sleepable)
 1st 0xffffffff80628750 bio queue (bio queue) @ /usr/src/sys/geom/geom_io.c:68
 2nd 0xffffffff8062b8c0 Giant (Giant) @ /usr/src/sys/dev/kbdmux/kbdmux.c:1044
KDB: stack backtrace:
panic: mutex Giant not owned at /usr/src/sys/kern/tty_ttydisc.c:1127
cpuid = 0
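
For reading the g_vfs_done() line in the trace above: error = 5 is EIO, and
the offset and length are byte counts relative to the start of ad0s1e.  A
minimal Python sketch of that arithmetic follows.  The decode() helper and its
regular expression are hypothetical illustrations, not part of FreeBSD, and
the 512-byte sector size is an assumption about this disk that the log itself
does not state.

```python
import errno
import re

# The log line copied verbatim from the trace above.
LINE = "g_vfs_done():ad0s1e[READ(offset=1843986432, length=65536)]error = 5"

# Hypothetical pattern for this one message format (an assumption based on
# the single line shown here, not on any documented grammar).
PATTERN = re.compile(
    r"g_vfs_done\(\):(?P<dev>\w+)"
    r"\[(?P<op>\w+)\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]"
    r"error = (?P<err>\d+)"
)

SECTOR_SIZE = 512  # assumed sector size in bytes


def decode(line, sector_size=SECTOR_SIZE):
    """Split a g_vfs_done() message into device, sector range and errno name."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a g_vfs_done() line: %r" % line)
    offset = int(match.group("offset"))
    length = int(match.group("length"))
    err = int(match.group("err"))
    return {
        "device": match.group("dev"),
        "operation": match.group("op"),
        # Sector numbers are relative to the start of the named device.
        "first_sector": offset // sector_size,
        "sector_count": length // sector_size,
        "errno_name": errno.errorcode.get(err, "unknown"),
    }


if __name__ == "__main__":
    print(decode(LINE))
```

Under that sector-size assumption, the failed request is a 64 KiB read, which
matches the DMA setup failure reported on the lines just before it.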

I certainly agree that there are some problems in error handling, but I'm more
concerned about the underlying problem causing the errors.

:-Dylan