From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 00:43:27 2015
Message-ID: <55874927.80807@digiware.nl>
Date: Mon, 22 Jun 2015 01:30:47 +0200
From: Willem Jan Withagen
To: Quartz
CC: fs@freebsd.org
Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS
References: <5585767B.4000206@digiware.nl> <5587236A.6020404@sneakertech.com>
In-Reply-To: <5587236A.6020404@sneakertech.com>
List-Id: Filesystems

On 21/06/2015 22:49, Quartz wrote:
> Also:
>
>> And thus I'd would have expected that ZFS would disconnect /dev/da0 and
>> then switch to DEGRADED state and continue, letting the operator fix the
>> broken disk.
>
>> Next question to answer is why this WD RED on:
>
>> got hung, and nothing for this shows in SMART....
>
> You have a raidz2, which means THREE disks need to go down before the
> pool is unwritable. The problem is most likely your controller or power
> supply, not your disks.

But I would still expect the volume to become degraded if one of the
disks goes into the error state?

It is really nice that it is still effectively 'raidz1', but it does
need to get fixed...

> Also2: don't rely too much on SMART for determining drive health. Google
> released a paper a few years ago revealing that half of all drives die
> without reporting SMART errors.
>
> http://research.google.com/archive/disk_failures.pdf

This article is mainly about forecasting disk failure based on SMART
numbers.... The first "failures" in SMART do not require one to
immediately replace the disk.
The common idea is: if the numbers grow, expect the device to break.

I was just looking at the counters to see whether the disk had logged
any info, warning, or error that could have anything to do with the
problem I have.

Thanx,
--WjW
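
PS: To make the "numbers grow" idea concrete, here is a minimal sketch in Python. It is not taken from any tool; the helper name `growing_counters`, the choice of watched attributes, and the sample values are all assumptions for illustration. It only compares two snapshots of raw SMART counters that you have collected yourself and reports the ones that increased.

```python
# Hypothetical helper: compare two snapshots of SMART raw counters and
# report which of the watched ones have grown. The snapshots are plain
# dicts {attribute_name: raw_value}; how they are collected is up to you.

# Attribute names as commonly shown by smartctl; the selection is an assumption.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)


def growing_counters(before, after, watched=WATCHED):
    """Return {attribute: (old, new)} for watched counters that increased."""
    grown = {}
    for name in watched:
        old = before.get(name, 0)  # a missing counter is treated as 0
        new = after.get(name, 0)
        if new > old:
            grown[name] = (old, new)
    return grown


if __name__ == "__main__":
    # Made-up sample values, not readings from the disk in this thread.
    last_week = {"Reallocated_Sector_Ct": 0, "UDMA_CRC_Error_Count": 2}
    today = {"Reallocated_Sector_Ct": 8, "UDMA_CRC_Error_Count": 2}
    for name, (old, new) in growing_counters(last_week, today).items():
        print(f"{name}: {old} -> {new}")
```

A counter that is non-zero but stable would not be reported here, which matches the point that a first "failure" in SMART is not by itself a reason to replace the disk.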