Message-ID: <55877393.3040704@sneakertech.com>
Date: Sun, 21 Jun 2015 22:31:47 -0400
From: Quartz
To: Michelle Sullivan
CC: fs@freebsd.org
Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS
In-Reply-To: <558769B5.601@sorbs.net>

>> You have a raidz2, which means THREE disks need to go down before the
>> pool is unwritable. The problem is most likely your controller or
>> power supply, not your disks.
>>
> Never make such assumptions...
>
> I have worked in a professional environment where 9 of 12 disks failed
> within 24 hours of each other....

Right... but if that was his problem, there should be log entries showing
the other drives going down first, and ZFS would typically mark the pool
as degraded (at least, it did in my testing). The fact that ZFS didn't get
a chance to log anything, and that the pool came back up healthy, leads me
to believe the controller went south, taking several disks with it all at
once and totally borking all I/O. (Either that, or the ARC issue Tom Curry
mentioned, which I wasn't previously aware of.)

Of course, if the issue isn't repeatable, then who knows....
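To make the reasoning above concrete, here is a toy model. It is not ZFS code, and the names (`PoolState`, `raidz_state`, `state_transitions`) are made up for illustration. It assumes a pool with a single raidz vdev that has `parity` parity disks: the vdev keeps working, degraded, with up to `parity` disks missing, and one more loss makes it unwritable. Disks dying one at a time pass through a degraded stage where ZFS would normally log something. A controller that drops three disks at once jumps straight from healthy to unwritable.

```python
from __future__ import annotations

from enum import Enum


class PoolState(Enum):
    # Illustrative states only; not the exact strings `zpool status` prints.
    ONLINE = "online"
    DEGRADED = "degraded"
    UNWRITABLE = "unwritable"


def raidz_state(parity: int, failed: int) -> PoolState:
    """Toy model of a single raidz vdev: it survives up to `parity`
    missing disks; one more and it can no longer serve I/O."""
    if failed == 0:
        return PoolState.ONLINE
    if failed <= parity:
        return PoolState.DEGRADED
    return PoolState.UNWRITABLE


def state_transitions(parity: int, failure_batches: list[int]) -> list[PoolState]:
    """Return the pool state after each batch of simultaneous disk losses."""
    failed = 0
    states = []
    for batch in failure_batches:
        failed += batch
        states.append(raidz_state(parity, failed))
    return states


if __name__ == "__main__":
    # raidz2, disks dying one at a time: two degraded stages before the end.
    print(state_transitions(2, [1, 1, 1]))
    # raidz2, a controller dropping three disks at once: no degraded stage.
    print(state_transitions(2, [3]))
```

The only point of the model is the difference between the two runs: staggered failures leave a degraded window to show up in the logs, while a simultaneous loss does not.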