Date: Fri, 10 Oct 2014 04:02:29 +0000
From: Steve Wills
To: Steven Hartland
Cc: fs@freebsd.org, current@freebsd.org, Andriy Gapon
Subject: Re: zfs hang
On Fri, Oct 10, 2014 at 02:35:14AM +0100, Steven Hartland wrote:
>
> ----- Original Message -----
> From: "Steve Wills"
> To: "Andriy Gapon"
> Cc: ;
> Sent: Friday, October 10, 2014 2:27 AM
> Subject: Re: zfs hang
>
>
> > On Wed, Oct 08, 2014 at 08:55:26AM +0300, Andriy Gapon wrote:
> >> On 08/10/2014 03:40, Steve Wills wrote:
> >> > Hi,
> >> >
> >> > Not sure which thread this belongs to, but I have a zfs hang on one of my boxes
> >> > running r272152. Running procstat -kka looks like:
> >> >
> >> > http://pastebin.com/szZZP8Tf
> >> >
> >> > My zpool commands seem to be hung in spa_errlog_lock while others are hung in
> >> > zfs_lookup. Suggestions?
> >>
> >> There are several threads in zio_wait. If this is their permanent state then
> >> there is some problem with I/O somewhere below ZFS.
> >
> > Thanks for the feedback. It seems one of my disks is dying. I rebooted and it
> > came up OK, but today I got:
> >
> > panic: I/O to pool 'rpool' appears to be hung on vdev guid ..... at '/dev/ada0p3'
> >
> > I have screenshots and a backtrace if anyone is interested. Dying drives
> > shouldn't cause a panic, right?
>
> It's the deadman timer kicking in, so yes, that's expected.
>
> The following sysctls control this behaviour if you want to try and recover:
> vfs.zfs.deadman_synctime_ms: 1000000
> vfs.zfs.deadman_checktime_ms: 5000
> vfs.zfs.deadman_enabled: 1

Ah, OK. This pool has two disks, mirrored. I think one of them is dying: the
BIOS reports a SMART error on startup, but the system still uses the disk fine.
From what I've read of the ZFS deadman design, it's meant for cases where the
controller is acting up, so I'm confused. Maybe this means both disks are dying?

Steve
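To put the quoted sysctl values in perspective: vfs.zfs.deadman_synctime_ms = 1000000 is 1000 s (about 16.7 minutes), and vfs.zfs.deadman_checktime_ms = 5000 means a check every 5 s. The following is a simplified Python model of such a timer, not the ZFS source. The LeafVdev class, the helper names, and the second mirror member /dev/ada1p3 are hypothetical, and the model assumes the check looks at the oldest outstanding I/O on each leaf device separately, so in this sketch one stuck disk in a mirror is enough to trip it even if its partner is healthy.

```python
from dataclasses import dataclass, field

# Values quoted in the thread (milliseconds).
DEADMAN_SYNCTIME_MS = 1_000_000  # vfs.zfs.deadman_synctime_ms (~16.7 min)
DEADMAN_CHECKTIME_MS = 5_000     # vfs.zfs.deadman_checktime_ms (every 5 s)
DEADMAN_ENABLED = 1              # vfs.zfs.deadman_enabled


@dataclass
class LeafVdev:
    """Hypothetical model of a leaf vdev: a device path plus the
    submission times (ms) of its outstanding I/Os."""
    path: str
    pending_since_ms: list = field(default_factory=list)


def deadman_check(vdevs, now_ms, synctime_ms=DEADMAN_SYNCTIME_MS,
                  enabled=DEADMAN_ENABLED):
    """Return the path of the first leaf whose oldest pending I/O has been
    outstanding longer than synctime_ms, else None.
    (Assumption: the real kernel would panic here instead of returning.)"""
    if not enabled:
        return None
    for vdev in vdevs:
        if vdev.pending_since_ms:
            oldest = min(vdev.pending_since_ms)
            if now_ms - oldest > synctime_ms:
                return vdev.path
    return None


def first_hang_time(vdevs, horizon_ms, checktime_ms=DEADMAN_CHECKTIME_MS):
    """Run the check every checktime_ms up to horizon_ms and return
    (time_ms, path) of the first detection, or None if nothing trips."""
    t = 0
    while t <= horizon_ms:
        hung = deadman_check(vdevs, t)
        if hung is not None:
            return t, hung
        t += checktime_ms
    return None


if __name__ == "__main__":
    # A two-way mirror: ada0p3 has an I/O stuck since t=0, the other side is idle.
    mirror = [LeafVdev("/dev/ada0p3", [0]), LeafVdev("/dev/ada1p3", [])]
    print(first_hang_time(mirror, 2_000_000))
```

In this model the stuck I/O is first flagged at the first 5 s check after the 1000 s threshold has passed, i.e. at 1005000 ms, on /dev/ada0p3, even though the other mirror member has nothing pending. Setting `enabled=0` in the model makes `deadman_check` always return None, which mirrors what vfs.zfs.deadman_enabled=0 is described as controlling in the thread.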