Date:      Fri, 10 Oct 2014 04:02:29 +0000
From:      Steve Wills <swills@freebsd.org>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        fs@freebsd.org, current@freebsd.org, Andriy Gapon <avg@freebsd.org>
Subject:   Re: zfs hang
Message-ID:  <20141010040228.GI79158@mouf.net>
In-Reply-To: <F93FC06BE5854556BF1F4318690C728C@multiplay.co.uk>
References:  <20141008004045.GA24762@mouf.net> <5434D1CE.8010801@FreeBSD.org> <20141010012724.GD79158@mouf.net> <F93FC06BE5854556BF1F4318690C728C@multiplay.co.uk>


On Fri, Oct 10, 2014 at 02:35:14AM +0100, Steven Hartland wrote:
> 
> ----- Original Message ----- 
> From: "Steve Wills" <swills@freebsd.org>
> To: "Andriy Gapon" <avg@freebsd.org>
> Cc: <current@freebsd.org>; <fs@freebsd.org>
> Sent: Friday, October 10, 2014 2:27 AM
> Subject: Re: zfs hang
> 
> 
> > On Wed, Oct 08, 2014 at 08:55:26AM +0300, Andriy Gapon wrote:
> >> On 08/10/2014 03:40, Steve Wills wrote:
> >> > Hi,
> >> > 
> >> > Not sure which thread this belongs to, but I have a zfs hang on one of my boxes
> >> > running r272152. Running procstat -kka looks like:
> >> > 
> >> > http://pastebin.com/szZZP8Tf
> >> > 
> >> > My zpool commands seem to be hung in spa_errlog_lock while others are hung in
> >> > zfs_lookup. Suggestions?
> >> 
> >> There are several threads in zio_wait.  If this is their permanent state then
> >> there is some problem with I/O somewhere below ZFS.
> > 
> > Thanks for the feedback. It seems one of my disks is dying; I rebooted and it
> > came up OK, but today I got:
> > 
> >  panic: I/O to pool 'rpool' appears to be hung on vdev guid ..... at '/dev/ada0p3'
> > 
> > I have screenshots and a backtrace if anyone is interested. Dying drives
> > shouldn't cause a panic, right?
> 
> It's the deadman timer kicking in, so yes, that's expected.
> 
> The following sysctls control this behaviour if you want to try and recover:
> vfs.zfs.deadman_synctime_ms: 1000000
> vfs.zfs.deadman_checktime_ms: 5000
> vfs.zfs.deadman_enabled: 1
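
A minimal sketch of reading and flipping those knobs from C with
sysctlbyname(3): the sysctl names are the ones listed above, but the value
types and everything else here are assumptions, and the write needs root.

/* deadman_sysctl.c -- illustrative only */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
    uint64_t synctime_ms, checktime_ms;     /* assumed 64-bit counters */
    int enabled;                            /* assumed plain int */
    size_t len;

    len = sizeof(synctime_ms);
    if (sysctlbyname("vfs.zfs.deadman_synctime_ms", &synctime_ms, &len,
        NULL, 0) == 0)
        printf("deadman_synctime_ms:  %ju\n", (uintmax_t)synctime_ms);

    len = sizeof(checktime_ms);
    if (sysctlbyname("vfs.zfs.deadman_checktime_ms", &checktime_ms, &len,
        NULL, 0) == 0)
        printf("deadman_checktime_ms: %ju\n", (uintmax_t)checktime_ms);

    len = sizeof(enabled);
    if (sysctlbyname("vfs.zfs.deadman_enabled", &enabled, &len,
        NULL, 0) == 0)
        printf("deadman_enabled:      %d\n", enabled);

    /* Disable the deadman (requires root); drop this block to only read. */
    enabled = 0;
    if (sysctlbyname("vfs.zfs.deadman_enabled", NULL, NULL,
        &enabled, sizeof(enabled)) != 0)
        fprintf(stderr, "sysctl write failed: %s\n", strerror(errno));

    return (0);
}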

Ah, OK. This pool has two disks, mirrored. I think one of them is dying: the
BIOS reports a SMART error on startup, but the system still uses the disk fine.
From what I read of the ZFS deadman design, it's meant for when the controller
is acting up, so I'm confused. Maybe this means both disks are dying?
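
For context, a rough sketch of the shape of such a deadman check, not the
actual ZFS code: every deadman_checktime_ms it looks at the oldest I/O still
outstanding on each vdev, and once any of them has been pending longer than
deadman_synctime_ms it declares the pool hung on that vdev and panics. The
check is per outstanding I/O on a single vdev, not per pool redundancy, so one
unresponsive disk in an otherwise healthy mirror is enough to trip it. All
names below are made up for illustration.

/* deadman_sketch.c -- illustrative only, not ZFS source */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct vdev {
    const char *path;
    uint64_t    guid;
    uint64_t    oldest_io_start_ms;     /* 0 means no I/O outstanding */
};

static void
deadman_check(struct vdev *vd, int nvdevs, uint64_t now_ms,
    uint64_t synctime_ms, int enabled)
{
    for (int i = 0; i < nvdevs; i++) {
        if (vd[i].oldest_io_start_ms == 0)
            continue;
        if (now_ms - vd[i].oldest_io_start_ms <= synctime_ms)
            continue;
        fprintf(stderr, "panic: I/O to pool appears to be hung on "
            "vdev guid %ju at '%s'\n",
            (uintmax_t)vd[i].guid, vd[i].path);
        if (enabled)
            abort();                    /* stand-in for panic(9) */
    }
}

int
main(void)
{
    struct vdev mirror[] = {
        { "/dev/ada0p3", 0x1111, 500 }, /* I/O stuck since t=500ms */
        { "/dev/ada1p3", 0x2222, 0 },   /* healthy, nothing pending */
    };

    /* Simulate a check well past the default 1000000 ms synctime. */
    deadman_check(mirror, 2, 500 + 1000001, 1000000, 1);
    return (0);
}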

Steve


