From owner-freebsd-fs@FreeBSD.ORG Sat Jun 20 22:14:48 2015
Date: Sat, 20 Jun 2015 22:14:32 +0000
From: Steve Wills
To: Willem Jan Withagen
Cc: fs@freebsd.org
Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS
Message-ID: <20150620221431.GB26416@mouf.net>
In-Reply-To: <5585767B.4000206@digiware.nl>
References: <5585767B.4000206@digiware.nl>

On Sat, Jun 20, 2015 at 04:19:39PM +0200, Willem Jan Withagen wrote:
> Hi,
>
> Found my system rebooted this morning:
>
> Jun 20 05:28:33 zfs kernel: sonewconn: pcb 0xfffff8011b6da498: Listen
> queue overflow: 8 already in queue awaiting acceptance (48 occurrences)
> Jun 20 05:28:33 zfs kernel: panic: I/O to pool 'zfsraid' appears to be
> hung on vdev guid 18180224580327100979 at '/dev/da0'.
> Jun 20 05:28:33 zfs kernel: cpuid = 0
> Jun 20 05:28:33 zfs kernel: Uptime: 8d9h7m9s
> Jun 20 05:28:33 zfs kernel: Dumping 6445 out of 8174
> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>
> Which leads me to believe that /dev/da0 went out on vacation, leaving
> ZFS in trouble....
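(Interjecting an aside before your pool layout below: if you want to
double-check which device that vdev guid really maps to, reading the
ZFS labels directly should show it. A minimal check, assuming da0 is
still attached and readable:

  # zdb -l /dev/da0 | grep -w guid

Each label carries a "guid:" line you can match against the number in
the panic message.)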
> But the array is:
> ----
> NAME                SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
> zfsraid            32.5T  13.3T  19.2T         -     7%    41%  1.00x  ONLINE  -
>   raidz2           16.2T  6.67T  9.58T         -     8%    41%
>     da0                -      -      -         -      -      -
>     da1                -      -      -         -      -      -
>     da2                -      -      -         -      -      -
>     da3                -      -      -         -      -      -
>     da4                -      -      -         -      -      -
>     da5                -      -      -         -      -      -
>   raidz2           16.2T  6.67T  9.58T         -     7%    41%
>     da6                -      -      -         -      -      -
>     da7                -      -      -         -      -      -
>     ada4               -      -      -         -      -      -
>     ada5               -      -      -         -      -      -
>     ada6               -      -      -         -      -      -
>     ada7               -      -      -         -      -      -
>   mirror            504M  1.73M   502M         -    39%     0%
>     gpt/log0           -      -      -         -      -      -
>     gpt/log1           -      -      -         -      -      -
> cache                  -      -      -         -      -      -
>   gpt/raidcache0    109G  1.34G   107G         -     0%     1%
>   gpt/raidcache1    109G   787M   108G         -     0%     0%
> ----
>
> And thus I'd have expected that ZFS would disconnect /dev/da0, switch
> to DEGRADED state, and continue, letting the operator fix the broken
> disk.
> Instead it chooses to panic, which is not a nice thing to do. :)
>
> Or do I have too high hopes of ZFS?
>
> Next question to answer is why this WD RED on:
>
> arcmsr0@pci0:7:14:0: class=0x010400 card=0x112017d3 chip=0x112017d3
> rev=0x00 hdr=0x00
>     vendor    = 'Areca Technology Corp.'
>     device    = 'ARC-1120 8-Port PCI-X to SATA RAID Controller'
>     class     = mass storage
>     subclass  = RAID
>
> got hung, and nothing about it shows up in SMART....
>

You may be hitting the ZFS deadman panic, which is triggered when
outstanding I/O to a vdev appears hung, e.g. because the controller
hangs. This can in some cases be caused by disks that die in unusual
ways.

> (If needed, vmcore available)
>

The backtrace might confirm or dispute my theory.

Steve
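P.S. A few things that might help while you dig. First, pulling the
backtrace yourself: assuming savecore put the dump in the default
/var/crash, something like

  # kgdb /boot/kernel/kernel /var/crash/vmcore.0
  (kgdb) bt

should print where the panic was raised (substitute whatever number
savecore assigned; kgdb should find kernel.symbols on its own if they
are installed alongside the kernel).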
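Second, the deadman itself is tunable, in case you would rather risk a
wedged pool than another panic while you investigate. From memory the
10.x knobs are (verify the names on your box):

  # sysctl vfs.zfs.deadman_enabled=0        # don't panic on hung I/O
  # sysctl -d vfs.zfs.deadman_synctime_ms   # how long an I/O may be
                                            # outstanding before it
                                            # counts as hung

The synctime one may be a boot-time tunable (/boot/loader.conf) rather
than settable at runtime.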
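Third, on SMART: if you were querying the daN pass-through devices, it
may be worth asking the Areca firmware directly as well; smartmontools
can address the controller's ports, e.g. for port 1:

  # smartctl -a -d areca,1 /dev/arcmsr0

(ports are numbered from 1; point it at whichever slot holds the WD
RED). That path sometimes reports more than the pass-through device
does, and a disk can hang a controller without logging anything SMART
counts as an error anyway.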