Date: Sat, 20 Jun 2015 22:14:32 +0000
From: Steve Wills <swills@FreeBSD.org>
To: Willem Jan Withagen <wjw@digiware.nl>
Cc: fs@freebsd.org
Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS
Message-ID: <20150620221431.GB26416@mouf.net>
In-Reply-To: <5585767B.4000206@digiware.nl>
References: <5585767B.4000206@digiware.nl>

On Sat, Jun 20, 2015 at 04:19:39PM +0200, Willem Jan Withagen wrote:
> Hi,
>
> Found my system rebooted this morning:
>
> Jun 20 05:28:33 zfs kernel: sonewconn: pcb 0xfffff8011b6da498: Listen
> queue overflow: 8 already in queue awaiting acceptance (48 occurrences)
> Jun 20 05:28:33 zfs kernel: panic: I/O to pool 'zfsraid' appears to be
> hung on vdev guid 18180224580327100979 at '/dev/da0'.
> Jun 20 05:28:33 zfs kernel: cpuid = 0
> Jun 20 05:28:33 zfs kernel: Uptime: 8d9h7m9s
> Jun 20 05:28:33 zfs kernel: Dumping 6445 out of 8174
> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>
> Which leads me to believe that /dev/da0 went out on vacation, leaving
> ZFS into trouble.... But the array is:
> ----
> NAME               SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP
> zfsraid           32.5T  13.3T  19.2T         -     7%    41%  1.00x
> ONLINE  -
>   raidz2          16.2T  6.67T  9.58T         -     8%    41%
>     da0               -      -      -         -      -      -
>     da1               -      -      -         -      -      -
>     da2               -      -      -         -      -      -
>     da3               -      -      -         -      -      -
>     da4               -      -      -         -      -      -
>     da5               -      -      -         -      -      -
>   raidz2          16.2T  6.67T  9.58T         -     7%    41%
>     da6               -      -      -         -      -      -
>     da7               -      -      -         -      -      -
>     ada4              -      -      -         -      -      -
>     ada5              -      -      -         -      -      -
>     ada6              -      -      -         -      -      -
>     ada7              -      -      -         -      -      -
>   mirror           504M  1.73M   502M         -    39%     0%
>     gpt/log0          -      -      -         -      -      -
>     gpt/log1          -      -      -         -      -      -
> cache                 -      -      -         -      -      -
>   gpt/raidcache0   109G  1.34G   107G         -     0%     1%
>   gpt/raidcache1   109G   787M   108G         -     0%     0%
> ----
>
> And thus I'd would have expected that ZFS would disconnect /dev/da0 and
> then switch to DEGRADED state and continue, letting the operator fix the
> broken disk.
> Instead it chooses to panic, which is not a nice thing to do. :)
>
> Or do I have to high hopes of ZFS?
>
> Next question to answer is why this WD RED on:
>
> arcmsr0@pci0:7:14:0: class=0x010400 card=0x112017d3 chip=0x112017d3
> rev=0x00 hdr=0x00
>     vendor     = 'Areca Technology Corp.'
>     device     = 'ARC-1120 8-Port PCI-X to SATA RAID Controller'
>     class      = mass storage
>     subclass   = RAID
>
> got hung, and nothing for this shows in SMART....
>

You may be hitting the zfs deadman panic, which is triggered when the
controller hangs. This can in some cases be caused by disks that die in
unusual ways.

>
> (If needed vmcore available)
>

The backtrace might confirm or dispute my theory.

Steve