From owner-freebsd-fs@FreeBSD.ORG Wed Jan 11 20:40:43 2012
Date: Wed, 11 Jan 2012 12:40:41 -0800
From: Jeremy Chadwick
To: freebsd-fs@freebsd.org
Subject: Re: Unplugging disk under ZFS yield panic
Message-ID: <20120111204041.GA47175@icarus.home.lan>
References: <20120111154722.000036e4@unknown> <20120111210708.1168781e@fabiankeil.de>
In-Reply-To: <20120111210708.1168781e@fabiankeil.de>

On Wed, Jan 11, 2012 at 09:07:08PM +0100, Fabian Keil wrote:
> Gergely CZUCZY wrote:
>
> > I'd like to ask, whether it is normal behaviour when we're unplugging a
> > disk under a ZFS system then on the first write a kernel panic happened.
>
> Sounds familiar.
> I currently have two PRs open for
> reproducible kernel panics after a vdev gets lost:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162010
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162036
>
> Note that the pool layouts are different, though.

Is this problem truly ZFS-specific?  I'd been tracking this problem for
years, and was told it was fixed:

http://wiki.freebsd.org/BugBusting/Commonly_reported_issues

* Panic occurs when a mounted device (USB, SATA, local image file, etc.) is removed
  Workaround: Be sure to umount all filesystems before removing the physical device
  Partial fix: Committed to CURRENT (8.0) on/prior to 2008/02/21
  There is ongoing work to fully fix this problem, ETA 2009/02

OP, please provide a kernel backtrace.  Otherwise, if needed, I can go
yank one of the two mirrored disks out of my FreeBSD box at home to try
and reproduce the problem.

  pool: data
 state: ONLINE
  scan: scrub repaired 0 in 1h17m with 0 errors on Thu Dec 29 12:05:05 2011
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
        cache
          ada4      ONLINE       0     0     0

ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: ATA-8 SATA 3.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: ATA-8 SATA 3.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)

ahci0: port 0x1c50-0x1c57,0x1c44-0x1c47,0x1c48-0x1c4f,0x1c40-0x1c43,0x18e0-0x18ff mem 0xdc000800-0xdc000fff irq 17 at device 31.2 on pci0
ahci0: [ITHREAD]
ahci0: AHCI v1.20 with 6 3Gbps ports, Port Multiplier supported
ahcich1: at channel 1 on ahci0
ahcich1: [ITHREAD]
ahcich3: at channel 3 on ahci0
ahcich3: [ITHREAD]

> > The hardware is a supermicro X8DTH-i/6/iF/6F board with 2x LSI 2008
> > fusion MPT SAS-2 controllers, over
> > the mps(4) driver. The disks are
> > accessed over gmultipath, and the multipath'd devices are added to a
> > ZFS mirror:
> > DB
> >   mirror-0
> >     multipath/DB01
> >     multipath/DB02
> >   mirror-1
> >     multipath/DB03
> >     multipath/DB04
> > logs
> >   mirror/host1p5
> > cache
> >   multipath/SSD03p1
> > spares
> >   multipath/DB05
> >
> > System is 9.0-RELEASE
> >
> > I've unplugged DB03 and on the first write we got a kernel panic.
> > Should this be normal behaviour or we're missing something here?
>
> Without a back trace or at least the panic reason one can only
> speculate what's going on, but I think it's rather unlikely
> that the panic is the intended behaviour and not a bug.
>
> Maybe you can gather some additional information and file a PR?
>
> > On a device removal we're expecting it to moving to the spare disk, or
> > using the available redundant disks.
>
> I agree that this behaviour would be preferable to a panic.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |