Date:        Sun, 21 Jun 2015 16:00:54 +0200
From:        Willem Jan Withagen <wjw@digiware.nl>
To:          Daryl Richards <daryl@isletech.net>, freebsd-fs@freebsd.org
Subject:     Re: This diskfailure should not panic a system, but just disconnect disk from ZFS
Message-ID:  <5586C396.9010100@digiware.nl>
In-Reply-To: <558590BD.40603@isletech.net>
References:  <5585767B.4000206@digiware.nl> <558590BD.40603@isletech.net>
On 20/06/2015 18:11, Daryl Richards wrote:
> Check the failmode setting on your pool. From man zpool:
>
>     failmode=wait | continue | panic
>
>         Controls the system behavior in the event of catastrophic
>         pool failure. This condition is typically a result of a
>         loss of connectivity to the underlying storage device(s) or
>         a failure of all devices within the pool. The behavior of
>         such an event is determined as follows:
>
>         wait      Blocks all I/O access until the device
>                   connectivity is recovered and the errors are
>                   cleared. This is the default behavior.
>
>         continue  Returns EIO to any new write I/O requests but
>                   allows reads to any of the remaining healthy
>                   devices. Any write requests that have yet to be
>                   committed to disk would be blocked.
>
>         panic     Prints out a message to the console and generates
>                   a system crash dump.

Hmmm, I did not know about this setting. Nice one, but alas my current
settings are (see the zpool commands sketched at the bottom of this
mail):

    zfsboot  failmode  wait  default
    zfsraid  failmode  wait  default

So either the setting is not working, or something else is up? Is
'wait' only meant to wait for a limited time, and then panic anyway?

But even in the 'continue' case I still wonder why ZFS ends up in a
state where the filesystem cannot simply disconnect the disk and carry
on with its standard functioning (read and write)? All failmode
settings seem to result in a seriously handicapped system... On a
raidz2 system I would perhaps have expected this only once a second
disk vanishes into thin air as well??

The other question is: the man page talks about 'Controls the system
behavior in the event of catastrophic pool failure'. Is a single hung
disk really a 'catastrophic pool failure'?

Still very puzzled,
--WjW

>
>
> On 2015-06-20 10:19 AM, Willem Jan Withagen wrote:
>> Hi,
>>
>> Found my system rebooted this morning:
>>
>> Jun 20 05:28:33 zfs kernel: sonewconn: pcb 0xfffff8011b6da498: Listen
>> queue overflow: 8 already in queue awaiting acceptance (48 occurrences)
>> Jun 20 05:28:33 zfs kernel: panic: I/O to pool 'zfsraid' appears to be
>> hung on vdev guid 18180224580327100979 at '/dev/da0'.
>> Jun 20 05:28:33 zfs kernel: cpuid = 0
>> Jun 20 05:28:33 zfs kernel: Uptime: 8d9h7m9s
>> Jun 20 05:28:33 zfs kernel: Dumping 6445 out of 8174
>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>>
>> Which leads me to believe that /dev/da0 went out on vacation, leaving
>> ZFS in trouble.... But the array is:
>> ----
>> NAME               SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
>> zfsraid           32.5T  13.3T  19.2T         -     7%    41%  1.00x  ONLINE  -
>>   raidz2          16.2T  6.67T  9.58T         -     8%    41%
>>     da0               -      -      -         -      -      -
>>     da1               -      -      -         -      -      -
>>     da2               -      -      -         -      -      -
>>     da3               -      -      -         -      -      -
>>     da4               -      -      -         -      -      -
>>     da5               -      -      -         -      -      -
>>   raidz2          16.2T  6.67T  9.58T         -     7%    41%
>>     da6               -      -      -         -      -      -
>>     da7               -      -      -         -      -      -
>>     ada4              -      -      -         -      -      -
>>     ada5              -      -      -         -      -      -
>>     ada6              -      -      -         -      -      -
>>     ada7              -      -      -         -      -      -
>>   mirror           504M  1.73M   502M         -    39%     0%
>>     gpt/log0          -      -      -         -      -      -
>>     gpt/log1          -      -      -         -      -      -
>> cache                 -      -      -         -      -      -
>>   gpt/raidcache0   109G  1.34G   107G         -     0%     1%
>>   gpt/raidcache1   109G   787M   108G         -     0%     0%
>> ----
>>
>> And thus I would have expected ZFS to disconnect /dev/da0, switch the
>> pool to DEGRADED and continue, letting the operator fix the broken
>> disk. Instead it chooses to panic, which is not a nice thing to do. :)
>>
>> Or do I have too high hopes of ZFS?
>>
>> Next question to answer is why this WD RED on:
>>
>> arcmsr0@pci0:7:14:0: class=0x010400 card=0x112017d3 chip=0x112017d3
>> rev=0x00 hdr=0x00
>>     vendor     = 'Areca Technology Corp.'
>>     device     = 'ARC-1120 8-Port PCI-X to SATA RAID Controller'
>>     class      = mass storage
>>     subclass   = RAID
>>
>> got hung, and nothing about it shows up in SMART....
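PS: For reference, the failmode values I quoted above come from zpool
get, and flipping the property is a one-liner. Roughly as follows;
double-check the exact syntax against zpool(8) on your version:

----
# show the current failmode of all pools
zpool get failmode

# switch one pool to 'continue' instead of 'wait'
zpool set failmode=continue zfsraid
----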
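PPS: If I read the panic string right ("appears to be hung on vdev
guid ..."), it looks like it comes from the ZFS deadman watchdog rather
than from the failmode logic, which would explain why failmode=wait
still ends in a panic. Assuming that is indeed what fired here, these
sysctls should control it on a recent FreeBSD (names worth verifying on
your version):

----
# is the deadman watchdog armed at all?
sysctl vfs.zfs.deadman_enabled

# how long an I/O may be outstanding before the watchdog trips (ms)
sysctl vfs.zfs.deadman_synctime_ms

# disable the watchdog at runtime while debugging the hung disk
sysctl vfs.zfs.deadman_enabled=0
----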
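PPPS: On the SMART front: with the disks hanging off the Areca
ARC-1120, plain smartctl against the da devices may not show
everything. smartmontools can talk to Areca controllers directly;
something like the following should work (the slot number is just an
example):

----
# query SMART data of the disk in slot 1 behind the first Areca HBA
smartctl -a -d areca,1 /dev/arcmsr0
----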