Date:      Tue, 11 Sep 2012 15:44:41 +0200
From:      Niobos <niobos@dest-unreach.be>
To:        freebsd-stable@freebsd.org
Subject:   Kernel panic with geom_multipath + ZFS
Message-ID:  <504F4049.9080801@dest-unreach.be>

Hi,

I'm under the illusion that I've found a bug in the FreeBSD kernel, but
since I'm new to FreeBSD, a quiet voice tells me it's probably a case of
"you're doing it wrong".

Also, I'm not sure if this is the right place to complain. So feel free
to redirect me.

I'll start with some context:

* FreeBSD storage.[...] 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan  3
07:46:30 UTC 2012
root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

* There are 5 expansion units attached via SAS, daisy-chained. Each unit
has 12 disks, for a total of 60 disks. To provide path redundancy, the
units are connected HBA-1-2-3-4-5 and HBA-5-4-3-2-1.

* I've configured a ZFS pool on top, with 6 RAID-Z2 vdevs of 8+2 disks each.
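
As a sanity check on the numbers above, here is a small Python sketch of
the layout arithmetic. It only models disk counts and RAID-Z2's
two-disk parity per group; the constant and function names are
illustrative and do not correspond to any ZFS tool or API.

```python
# Layout figures taken from the description above; names are illustrative.
UNITS = 5
DISKS_PER_UNIT = 12
GROUPS = 6                  # RAID-Z2 groups in the pool
DATA_DISKS_PER_GROUP = 8
PARITY_DISKS_PER_GROUP = 2  # RAID-Z2 tolerates two failed disks per group

total_disks = UNITS * DISKS_PER_UNIT
disks_in_pool = GROUPS * (DATA_DISKS_PER_GROUP + PARITY_DISKS_PER_GROUP)


def pool_survives(failed_per_group):
    """True if no group has lost more disks than it has parity disks."""
    return all(f <= PARITY_DISKS_PER_GROUP for f in failed_per_group)


print(total_disks, disks_in_pool)
print(pool_survives([1, 0, 0, 0, 0, 0]))
```

By this model every disk is in use (no spares), and losing a single disk
leaves the affected group with one parity disk to spare.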

This setup should be able to survive a disk failure. However, manually
ejecting one of the disks causes a kernel panic, which I've transcribed
by hand below. The panic is not triggered by the ejection itself: the
ejection shows up in the kernel log a few seconds afterwards, so the
system is still running at that point. I think the panic is triggered
by access to the (now ejected) disk.

>     fault code            = supervisor read data, page not present
>     instruction pointer   = 0x20:0xffffffff807ced68
>     stack pointer         = 0x28:0xffffff80002ecb70
>     frame pointer         = 0x28:0xffffff80002ecbc0
>     code segment          = base 0x0, limit 0xfffff, type 0x1b
>                           = DPL 0, pres 1, long 1, def32 0, gran 1
>     processor eflags      = interrupt enabled, resume, IOPL = 0
>     current process       = 13 (g_down)
>     trap number           = 12
>     panic: page fault
>     cpuid = 0
>     KDB: stack backtrace:
>     #0 0xffffffff808680fe at kdb_backtrace+0x5e
>     #1 0xffffffff80832cb7 at panic+0x184
>     #2 0xffffffff80b18400 at trap_fatal+0x290
>     #3 0xffffffff80b18749 at trap_pfault+0x1f9
>     #4 0xffffffff80b18c0f at trap+0x3df
>     #5 0xffffffff80b0313f at calltrap+0x8
>     #6 0xffffffff80g3f874 at g_io_schedule_down+0x1d4
>     #7 0xffffffff807cfb7c at g_down_procbody+0x5c
>     #8 0xffffffff8080682f at fork_exit+0x11f
>     #9 0xffffffff80b0366e at fork_trampoline+0xe
>     Uptime: 7m16s
>     Automatic reboot in 15 seconds - press a key on the console to abort
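
Since the trace was copied by hand, it may be worth checking it
mechanically. The sketch below is a minimal Python example, assuming the
frame format shown above; the helper name `parse_frames` is
hypothetical. It subtracts each offset from its address to get the
function's start address, and reports any address that is not valid
hexadecimal instead of guessing at it. As transcribed, the address in
frame #6 contains a "g", so it is reported as unparseable.

```python
import re

# Frame lines exactly as transcribed above, with the "> " quoting removed.
BACKTRACE = """\
#0 0xffffffff808680fe at kdb_backtrace+0x5e
#1 0xffffffff80832cb7 at panic+0x184
#2 0xffffffff80b18400 at trap_fatal+0x290
#3 0xffffffff80b18749 at trap_pfault+0x1f9
#4 0xffffffff80b18c0f at trap+0x3df
#5 0xffffffff80b0313f at calltrap+0x8
#6 0xffffffff80g3f874 at g_io_schedule_down+0x1d4
#7 0xffffffff807cfb7c at g_down_procbody+0x5c
#8 0xffffffff8080682f at fork_exit+0x11f
#9 0xffffffff80b0366e at fork_trampoline+0xe
"""

# The address is matched loosely (\S+) so that a mistyped digit is
# detected during conversion rather than silently skipping the frame.
FRAME_RE = re.compile(r"#(\d+)\s+0x(\S+)\s+at\s+(\w+)\+0x([0-9a-fA-F]+)")


def parse_frames(text):
    """Return a list of (frame_number, symbol, start_address_or_None).

    start_address is the frame address minus the symbol offset; None
    means the address was not valid hexadecimal.
    """
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue
        number, address, symbol, offset = match.groups()
        try:
            start = int(address, 16) - int(offset, 16)
        except ValueError:
            start = None
        frames.append((int(number), symbol, start))
    return frames


for number, symbol, start in parse_frames(BACKTRACE):
    shown = hex(start) if start is not None else "unparseable address"
    print(f"#{number} {symbol}: {shown}")
```

The computed start addresses can then be compared against the symbol
table of the running kernel to confirm the rest of the transcription.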

So the question is either "what am I doing wrong?" or "can anyone
confirm this is a bug?"

thanks in advance,
Niels


PS: I'm trying to post via email and read via nntp://gmane, so I'm not
sure how well this works.


