From owner-freebsd-fs@FreeBSD.ORG  Wed Jan 11 20:40:43 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Date: Wed, 11 Jan 2012 12:40:41 -0800
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: freebsd-fs@freebsd.org
Message-ID: <20120111204041.GA47175@icarus.home.lan>
References: <20120111154722.000036e4@unknown>
	<20120111210708.1168781e@fabiankeil.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120111210708.1168781e@fabiankeil.de>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: 
Subject: Re: Unplugging disk under ZFS yield panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
X-List-Received-Date: Wed, 11 Jan 2012 20:40:43 -0000

On Wed, Jan 11, 2012 at 09:07:08PM +0100, Fabian Keil wrote:
> Gergely CZUCZY <phoemix@harmless.hu> wrote:
> 
> > I'd like to ask whether it is normal behaviour that, after unplugging a
> > disk under a ZFS system, the first write triggers a kernel panic.
> 
> Sounds familiar. I currently have two PRs open for
> reproducible kernel panics after a vdev gets lost:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162010
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162036
> 
> Note that the pool layouts are different, though.

Is this problem truly ZFS-specific?  I'd been tracking this problem for
years, and was told it was fixed:

http://wiki.freebsd.org/BugBusting/Commonly_reported_issues

* Panic occurs when a mounted device (USB, SATA, local image file,
  etc.) is removed

  Workaround: Be sure to umount all filesystems before removing the
  physical device
  Partial fix: Committed to CURRENT (8.0) on/prior to 2008/02/21

  There is ongoing work to fully fix this problem, ETA 2009/02 

OP, please provide a kernel backtrace.
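
For reference, a rough sketch of how a backtrace is usually obtained on
FreeBSD: with dumpdev="AUTO" in /etc/rc.conf, savecore(8) writes the
dump to /var/crash as vmcore.N on the next boot, and kgdb(1) can open
it against the kernel so that "bt" prints the trace. The helper below
only locates the newest dump and builds the kgdb command line. The
function names and the default paths are my assumptions, not anything
taken from the OP's setup.

```python
import re
from pathlib import Path


def latest_vmcore(crash_dir="/var/crash"):
    """Return the highest-numbered vmcore.N in crash_dir, or None."""
    directory = Path(crash_dir)
    if not directory.is_dir():
        return None
    best, best_n = None, -1
    for path in directory.glob("vmcore.*"):
        # Only accept numeric suffixes; this skips e.g. vmcore.last.
        match = re.fullmatch(r"vmcore\.(\d+)", path.name)
        if match and int(match.group(1)) > best_n:
            best, best_n = path, int(match.group(1))
    return best


def kgdb_command(vmcore, kernel="/boot/kernel/kernel"):
    """Build the argument list for opening a dump in kgdb (assumed default kernel path)."""
    return ["kgdb", kernel, str(vmcore)]


if __name__ == "__main__":
    core = latest_vmcore()
    if core is None:
        print('no vmcore found; is dumpdev="AUTO" set in /etc/rc.conf?')
    else:
        # At the (kgdb) prompt, "bt" prints the backtrace.
        print(" ".join(kgdb_command(core)))
```

If no dump shows up after the panic, the panic message and whatever the
console printed are still better than nothing.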

Otherwise, if needed, I can go yank one of the two mirrored disks out of
my FreeBSD box at home to try to reproduce the problem.  Its pool and
hardware:

  pool: data
 state: ONLINE
 scan: scrub repaired 0 in 1h17m with 0 errors on Thu Dec 29 12:05:05 2011
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
        cache
          ada4      ONLINE       0     0     0
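
If I do end up pulling a disk, the thing to watch is the STATE column
above. Here is a minimal sketch for picking the unhealthy entries out
of saved "zpool status" output. The function name is made up, it
assumes the config table layout shown here (NAME STATE READ WRITE
CKSUM), and treating AVAIL spares as healthy is my own choice.

```python
OK_STATES = {"ONLINE", "AVAIL"}  # assumption: AVAIL spares count as healthy


def unhealthy_vdevs(status_text):
    """Return (name, state) pairs for config rows whose STATE is not in OK_STATES."""
    bad = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True  # table header; data rows follow
            continue
        if not in_config:
            continue
        if fields and fields[0] == "errors:":
            break  # trailing "errors: ..." line ends the table
        # Single-word rows such as "cache", "logs" or "spares" are
        # section headings without a state, so they are skipped.
        if len(fields) >= 2 and fields[1] not in OK_STATES:
            bad.append((fields[0], fields[1]))
    return bad
```

Feeding it the output above should give an empty list. After a disk is
removed, I would expect the pool, the mirror and the missing disk to
show up in it.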


ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <WDC WD1002FAEX-00Z3A0 05.01D05> ATA-8 SATA 3.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <WDC WD1002FAEX-00Z3A0 05.01D05> ATA-8 SATA 3.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)

ahci0: <Intel ICH9 AHCI SATA controller> port 0x1c50-0x1c57,0x1c44-0x1c47,0x1c48-0x1c4f,0x1c40-0x1c43,0x18e0-0x18ff mem 0xdc000800-0xdc000fff irq 17 at device 31.2 on pci0
ahci0: [ITHREAD]
ahci0: AHCI v1.20 with 6 3Gbps ports, Port Multiplier supported
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich1: [ITHREAD]
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich3: [ITHREAD]

> > The hardware is a Supermicro X8DTH-i/6/iF/6F board with 2x LSI 2008
> > Fusion-MPT SAS-2 controllers, driven by the mps(4) driver. The disks
> > are accessed over gmultipath, and the multipath'd devices are added to
> > ZFS mirrors:
> > DB
> >  mirror-0
> >   multipath/DB01
> >   multipath/DB02
> >  mirror-1
> >   multipath/DB03
> >   multipath/DB04
> >  logs
> >   mirror/host1p5
> >  cache
> >   multipath/SSD03p1
> >  spares
> >   multipath/DB05
> > 
> > System is 9.0-RELEASE
> > 
> > I unplugged DB03, and on the first write we got a kernel panic.
> > Is this normal behaviour, or are we missing something here?
> 
> Without a backtrace or at least the panic reason one can only
> speculate what's going on, but I think it's rather unlikely
> that the panic is the intended behaviour and not a bug.
> 
> Maybe you can gather some additional information and file a PR?
> 
> > On device removal we expect it to fail over to the spare disk, or
> > to keep using the remaining redundant disks.
> 
> I agree that this behaviour would be preferable to a panic.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |