From owner-freebsd-stable@FreeBSD.ORG  Tue Sep 11 13:44:43 2012
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 502ED1065674
	for <freebsd-stable@freebsd.org>; Tue, 11 Sep 2012 13:44:43 +0000 (UTC)
	(envelope-from niobos@dest-unreach.be)
Received: from serv02.imset.org (hackerspace.be
	[IPv6:2001:41d0:2:1959:fedc:ba98:7654:3210])
	by mx1.freebsd.org (Postfix) with ESMTP id E4C4D8FC0C
	for <freebsd-stable@freebsd.org>; Tue, 11 Sep 2012 13:44:42 +0000 (UTC)
Received: from raptor.rto.be (225.72-136-217.adsl-dyn.isp.belgacom.be
	[217.136.72.225])
	by serv02.imset.org (Postfix) with ESMTPSA id 287DECA049
	for <freebsd-stable@freebsd.org>; Tue, 11 Sep 2012 15:44:42 +0200 (CEST)
Message-ID: <504F4049.9080801@dest-unreach.be>
Date: Tue, 11 Sep 2012 15:44:41 +0200
From: Niobos <niobos@dest-unreach.be>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6;
	rv:15.0) Gecko/20120824 Thunderbird/15.0
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Kernel panic with geom_multipath + ZFS
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 11 Sep 2012 13:44:43 -0000

Hi,

I'm under the illusion that I've found a bug in the FreeBSD kernel, but
since I'm new to FreeBSD, a quiet voice tells me it's probably a case of
"you're doing it wrong".

Also, I'm not sure this is the right list for this kind of report, so
feel free to redirect me.

I'll start with some context:

* FreeBSD storage.[...] 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan  3
07:46:30 UTC 2012
root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

* There are 5 expansion units attached via SAS, daisy-chained. Each unit
has 12 disks, for a total of 60 disks. To provide path redundancy, the
units are connected HBA-1-2-3-4-5 and HBA-5-4-3-2-1.

* I've configured a ZFS pool on top, with 6 RAID-Z2 vdevs of 8+2 disks
each.
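
For illustration, a layout like this could be created along the
following lines. This is only a sketch under assumptions, not the exact
commands used on this machine: the pool name "tank" and the labels
multipath/disk00 through multipath/disk59 are hypothetical, and the
script only prints the zpool command instead of running it.

```sh
#!/bin/sh
# Dry run: print (do not execute) a zpool command for 6 RAID-Z2 vdevs
# of 10 disks each (8 data + 2 parity), 60 disks in total.
#
# Assumptions (hypothetical, not taken from the real system):
#   - the pool is called "tank"
#   - every disk was already labeled with gmultipath(8), e.g.
#       gmultipath label -v disk00 <path-A-device> <path-B-device>
#     so that it appears as /dev/multipath/disk00 ... disk59

build_zpool_cmd() {
    cmd="zpool create tank"
    i=0
    while [ "$i" -lt 60 ]; do
        # Start a new RAID-Z2 vdev every 10 disks.
        if [ $((i % 10)) -eq 0 ]; then
            cmd="$cmd raidz2"
        fi
        cmd="$cmd multipath/$(printf 'disk%02d' "$i")"
        i=$((i + 1))
    done
    printf '%s\n' "$cmd"
}

build_zpool_cmd
```

Building the pool on the multipath/ providers rather than on the raw
da devices is what lets either SAS path serve a given disk.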

This setup should be able to survive a disk failure. However, manually
ejecting one of the disks causes a kernel panic. I've transcribed the
panic message by hand below. The panic is not triggered by the ejection
itself: the ejection shows up in the kernel log a few seconds after I
pull the disk, so the kernel is still running at that point. I think
the panic is triggered by a later access to the (now ejected) disk.

>     fault code            = supervisor read data, page not present
>     instruction pointer   = 0x20:0xffffffff807ced68
>     stack pointer         = 0x28:0xffffff80002ecb70
>     frame pointer         = 0x28:0xffffff80002ecbc0
>     code segment          = base 0x0, limit 0xfffff, type 0x1b
>                           = DPL 0, pres 1, long 1, def32 0, gran 1
>     processor eflags      = interrupt enabled, resume, IOPL = 0
>     current process       = 13 (g_down)
>     trap number           = 12
>     panic: page fault
>     cpuid = 0
>     KDB: stack backtrace:
>     #0 0xffffffff808680fe at kdb_backtrace+0x5e
>     #1 0xffffffff80832cb7 at panic+0x184
>     #2 0xffffffff80b18400 at trap_fatal+0x290
>     #3 0xffffffff80b18749 at trap_pfault+0x1f9
>     #4 0xffffffff80b18c0f at trap+0x3df
>     #5 0xffffffff80b0313f at calltrap+0x8
>     #6 0xffffffff80g3f874 at g_io_schedule_down+0x1d4
>     #7 0xffffffff807cfb7c at g_down_procbody+0x5c
>     #8 0xffffffff8080682f at fork_exit+0x11f
>     #9 0xffffffff80b0366e at fork_trampoline+0xe
>     Uptime: 7m16s
>     Automatic reboot in 15 seconds - press a key on the console to abort

So the question is either "what am I doing wrong?" or "can anyone
confirm this is a bug?"

thanks in advance,
Niels


PS: I'm trying to post via email and read via nntp://gmane, and I'm not
sure how well this will work.