From: Tom Curry <thomasrcurry@gmail.com>
To: Willem Jan Withagen
Cc: freebsd-fs@freebsd.org
Date: Sun, 21 Jun 2015 15:50:17 -0400
Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS
List-Id: Filesystems <freebsd-fs@freebsd.org>

Was there by chance a lot of disk activity going on when this occurred?

On Sun, Jun 21, 2015 at 10:00 AM, Willem Jan Withagen wrote:
> On 20/06/2015 18:11, Daryl Richards wrote:
> > Check the failmode setting on your pool. From man zpool:
> >
> >   failmode=wait | continue | panic
> >
> >     Controls the system behavior in the event of catastrophic pool
> >     failure. This condition is typically a result of a loss of
> >     connectivity to the underlying storage device(s) or a failure of
> >     all devices within the pool. The behavior of such an event is
> >     determined as follows:
> >
> >     wait      Blocks all I/O access until the device connectivity is
> >               recovered and the errors are cleared. This is the
> >               default behavior.
> >
> >     continue  Returns EIO to any new write I/O requests but allows
> >               reads to any of the remaining healthy devices. Any
> >               write requests that have yet to be committed to disk
> >               would be blocked.
> >
> >     panic     Prints out a message to the console and generates a
> >               system crash dump.
>
> 'mmm
> I did not know about this setting. Nice one, but alas my current
> setting is:
>
>   zfsboot  failmode  wait  default
>   zfsraid  failmode  wait  default
>
> So either the setting is not working, or something else is up?
> Is waiting only meant to wait for a limited time, and then panic anyway?
>
> But then I still wonder why, even in the 'continue' case, the ZFS
> system ends up in a state where the filesystem is not able to continue
> its standard functioning (read and write) and disconnects the disk.
>
> All failmode settings result in a seriously handicapped system...
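[For reference, the failmode property quoted above can be inspected and
changed at runtime. A minimal sketch, assuming a pool named 'zfsraid' as
in this thread; these commands need root and a live ZFS system:]

```shell
# Show the current failmode for every imported pool
# ('wait' with SOURCE 'default' matches the output quoted above)
zpool get failmode

# Show it for a single pool
zpool get failmode zfsraid

# Switch the pool to 'continue' so reads to healthy vdevs keep working
# while new writes get EIO instead of blocking
zpool set failmode=continue zfsraid
```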
> On a raidz2 system I would perhaps have expected this to occur when
> the second disk goes into thin space.
>
> The other question is: the man page talks about 'Controls the system
> behavior in the event of catastrophic pool failure'. Is a hung disk a
> 'catastrophic pool failure'?
>
> Still very puzzled?
>
> --WjW
>
> > On 2015-06-20 10:19 AM, Willem Jan Withagen wrote:
> >> Hi,
> >>
> >> Found my system rebooted this morning:
> >>
> >> Jun 20 05:28:33 zfs kernel: sonewconn: pcb 0xfffff8011b6da498: Listen
> >> queue overflow: 8 already in queue awaiting acceptance (48 occurrences)
> >> Jun 20 05:28:33 zfs kernel: panic: I/O to pool 'zfsraid' appears to be
> >> hung on vdev guid 18180224580327100979 at '/dev/da0'.
> >> Jun 20 05:28:33 zfs kernel: cpuid = 0
> >> Jun 20 05:28:33 zfs kernel: Uptime: 8d9h7m9s
> >> Jun 20 05:28:33 zfs kernel: Dumping 6445 out of 8174
> >> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> >>
> >> Which leads me to believe that /dev/da0 went out on vacation, leaving
> >> ZFS in trouble... But the array is:
> >> ----
> >> NAME                SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH
> >> zfsraid            32.5T  13.3T  19.2T         -     7%    41%  1.00x  ONLINE
> >>   raidz2           16.2T  6.67T  9.58T         -     8%    41%
> >>     da0                -      -      -         -      -      -
> >>     da1                -      -      -         -      -      -
> >>     da2                -      -      -         -      -      -
> >>     da3                -      -      -         -      -      -
> >>     da4                -      -      -         -      -      -
> >>     da5                -      -      -         -      -      -
> >>   raidz2           16.2T  6.67T  9.58T         -     7%    41%
> >>     da6                -      -      -         -      -      -
> >>     da7                -      -      -         -      -      -
> >>     ada4               -      -      -         -      -      -
> >>     ada5               -      -      -         -      -      -
> >>     ada6               -      -      -         -      -      -
> >>     ada7               -      -      -         -      -      -
> >>   mirror            504M  1.73M   502M         -    39%     0%
> >>     gpt/log0           -      -      -         -      -      -
> >>     gpt/log1           -      -      -         -      -      -
> >> cache                  -      -      -         -      -      -
> >>   gpt/raidcache0    109G  1.34G   107G         -     0%     1%
> >>   gpt/raidcache1    109G   787M   108G         -     0%     0%
> >> ----
> >>
> >> And thus I would have expected that ZFS would disconnect /dev/da0,
> >> switch to DEGRADED state, and continue, letting the operator fix the
> >> broken disk.
> >> Instead it chooses to panic, which is not a nice thing to do.
> >> :)
> >>
> >> Or do I have too high hopes of ZFS?
> >>
> >> Next question to answer is why this WD RED on:
> >>
> >> arcmsr0@pci0:7:14:0: class=0x010400 card=0x112017d3 chip=0x112017d3
> >>     rev=0x00 hdr=0x00
> >>     vendor   = 'Areca Technology Corp.'
> >>     device   = 'ARC-1120 8-Port PCI-X to SATA RAID Controller'
> >>     class    = mass storage
> >>     subclass = RAID
> >>
> >> got hung, while nothing for this shows up in SMART...
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
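[Worth noting: the quoted panic text, "I/O to pool ... appears to be
hung on vdev guid ...", is emitted by ZFS's "deadman" watchdog, which
fires when an outstanding I/O exceeds a timeout. It is separate from
the failmode property, which would explain why failmode=wait still
panicked. A sketch of the relevant FreeBSD tunables; verify the exact
sysctl names on your release before relying on them:]

```shell
# Inspect the deadman watchdog: whether it is armed, and its I/O
# timeout in milliseconds (the hung-I/O threshold)
sysctl vfs.zfs.deadman_enabled
sysctl vfs.zfs.deadman_synctime_ms

# Disable the hung-I/O panic on the running system (trades a panic for
# a potentially indefinite hang on a dead disk)
sysctl vfs.zfs.deadman_enabled=0
```

To make the change persistent, the same knob can be set at boot, e.g.
in /etc/sysctl.conf: vfs.zfs.deadman_enabled=0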