Date: Sun, 21 Jun 2015 16:32:12 -0400
From: Quartz <quartz@sneakertech.com>
To: Willem Jan Withagen <wjw@digiware.nl>
Cc: freebsd-fs@freebsd.org
Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS
Message-ID: <55871F4C.5010103@sneakertech.com>
In-Reply-To: <5586C396.9010100@digiware.nl>
References: <5585767B.4000206@digiware.nl> <558590BD.40603@isletech.net> <5586C396.9010100@digiware.nl>
> Or do I have to high hopes of ZFS?
> And is a hung disk a 'catastrophic pool failure'?

Yes to both.

I encountered this exact same issue a couple of years ago (and complained about it to this list as well, although I didn't get a complete answer at the time). I can provide links to the conversation if you're interested.

Basically, the heart of the issue is the way the kernel, the drivers, and ZFS deal with IO and DMA. There's currently no way to tell what's going on with the disks and which outstanding IO to the pool can be dropped or ignored. As currently designed, there's no safe way to just kick out the pool and keep going, so the only options are to wait, panic, or wait and then panic. Fixing this would require a major rewrite of a lot of code, which isn't going to happen any time soon. The failmode setting and deadman timer were implemented as a bandage to prevent the system from hanging forever. See this page for more info:

http://comments.gmane.org/gmane.os.illumos.zfs/61

> All failmode settings result in a seriously handicapped system...

Yes. Again, this is a design issue/flaw with how DMA works. There's no real way to continue on gracefully when a pool completely dies due to hung IO. We're all pretty much stuck with this problem, at least for quite a while.

> Is waiting only meant to wait a limited time? And then panic anyways?

By default, yes. However, if you know that on your system the issue will eventually resolve itself given several hours (and you want to wait that long), you can change the deadman timeout or disable it completely. Look at "vfs.zfs.deadman_enabled" and "vfs.zfs.deadman_synctime".
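
To see what the deadman settings currently are before changing anything, something like the following might help. It is only a minimal sketch in Python: it assumes a FreeBSD host with ZFS loaded, where sysctl(8) prints one "name: value" pair per line, and the helper names (parse_sysctl_output, deadman_settings, read_zfs_sysctls) are made up for illustration. The exact sysctl names can differ between releases, so the sketch filters on "deadman" rather than hard-coding the two names above.

```python
import subprocess


def parse_sysctl_output(text):
    """Parse 'name: value' lines, as printed by sysctl(8), into a dict."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:  # skip lines that are not 'name: value' pairs
            settings[name.strip()] = value.strip()
    return settings


def deadman_settings(text):
    """Keep only the entries whose name mentions the deadman timer."""
    parsed = parse_sysctl_output(text)
    return {name: value for name, value in parsed.items() if "deadman" in name}


def read_zfs_sysctls():
    """Run 'sysctl vfs.zfs' and return its output (assumes FreeBSD with ZFS)."""
    result = subprocess.run(
        ["sysctl", "vfs.zfs"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a suitable host, deadman_settings(read_zfs_sysctls()) should return whatever deadman-related values that system actually exposes. Whether a given value can be changed at runtime or only at boot depends on the release, so check that before relying on it.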