Date: Wed, 11 Jan 2012 12:40:41 -0800
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: freebsd-fs@freebsd.org
Subject: Re: Unplugging disk under ZFS yield panic
Message-ID: <20120111204041.GA47175@icarus.home.lan>
In-Reply-To: <20120111210708.1168781e@fabiankeil.de>
References: <20120111154722.000036e4@unknown> <20120111210708.1168781e@fabiankeil.de>
On Wed, Jan 11, 2012 at 09:07:08PM +0100, Fabian Keil wrote:
> Gergely CZUCZY <phoemix@harmless.hu> wrote:
>
> > I'd like to ask, whether it is normal behaviour when we're unplugging a
> > disk under a ZFS system then on the first write a kernel panic happened.
>
> Sounds familiar. I currently have two PRs open for
> reproducible kernel panics after a vdev gets lost:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162010
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162036
>
> Note that the pool layouts are different, though.

Is this problem truly ZFS-specific?  I'd been tracking this problem for
years, and was told it was fixed:

http://wiki.freebsd.org/BugBusting/Commonly_reported_issues

* Panic occurs when a mounted device (USB, SATA, local image file, etc.) is removed
  Workaround: Be sure to umount all filesystems before removing the physical device
  Partial fix: Committed to CURRENT (8.0) on/prior to 2008/02/21
  There is ongoing work to fully fix this problem, ETA 2009/02

OP, please provide a kernel backtrace.

Otherwise, if needed, I can go yank one of the two mirrored disks out of
my FreeBSD box at home to try and reproduce the problem.
  pool: data
 state: ONLINE
  scan: scrub repaired 0 in 1h17m with 0 errors on Thu Dec 29 12:05:05 2011
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
        cache
          ada4      ONLINE       0     0     0

ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <WDC WD1002FAEX-00Z3A0 05.01D05> ATA-8 SATA 3.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <WDC WD1002FAEX-00Z3A0 05.01D05> ATA-8 SATA 3.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)

ahci0: <Intel ICH9 AHCI SATA controller> port 0x1c50-0x1c57,0x1c44-0x1c47,0x1c48-0x1c4f,0x1c40-0x1c43,0x18e0-0x18ff mem 0xdc000800-0xdc000fff irq 17 at device 31.2 on pci0
ahci0: [ITHREAD]
ahci0: AHCI v1.20 with 6 3Gbps ports, Port Multiplier supported
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich1: [ITHREAD]
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich3: [ITHREAD]

> > The hardware is a supermicro X8DTH-i/6/iF/6F board with 2x LSI 2008
> > fusion MPT SAS-2 controllers, over the mps(4) driver. The disks are
> > accessed over gmultipath, and the multipath'd devices are added to a
> > ZFS mirror:
> > DB
> >   mirror-0
> >     multipath/DB01
> >     multipath/DB02
> >   mirror-1
> >     multipath/DB03
> >     multipath/DB04
> >   logs
> >     mirror/host1p5
> >   cache
> >     multipath/SSD03p1
> >   spares
> >     multipath/DB05
> >
> > System is 9.0-RELEASE
> >
> > I've unplugged DB03 and on the first write we got a kernel panic.
> > Should this be normal behaviour or we're missing something here?
>
> Without a back trace or at least the panic reason one can only
> speculate what's going on, but I think it's rather unlikely
> that the panic is the intended behaviour and not a bug.
>
> Maybe you can gather some additional information and file a PR?
>
> > On a device removal we're expecting it to moving to the spare disk, or
> > using the available redundant disks.
>
> I agree that this behaviour would be preferable to a panic.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |