From: Thomas Backman
To: Freddie Cash
Cc: freebsd-current@freebsd.org
Date: Mon, 25 May 2009 18:12:05 +0200
Subject: Re: ZFS panic under extreme circumstances (2/3 disks corrupted)

On May 25, 2009, at 05:39 PM, Freddie Cash wrote:

> On Mon, May 25, 2009 at 2:13 AM, Thomas Backman wrote:
>> On May 24, 2009, at 09:02 PM, Thomas Backman wrote:
>>
>>> So, I was playing around with RAID-Z and self-healing...
>>
>> Yet another follow-up to this.
>> It appears that all traces of errors vanish after a reboot. So, say you
>> have a dying disk; ZFS repairs the data for you, and you don't notice
>> (unless you check zpool status). Then you reboot, and as far as I can
>> tell there is NO (easy?) way to find out that something is wrong with
>> your hardware!
>
> On our storage server, which was initially configured as one large
> 24-drive raidz2 vdev (don't do that, by the way), we had one drive go
> south. "zpool status" was full of errors, and the error counts survived
> reboots. Either that, or the drive was so bad that the counts started
> climbing again right after boot. After a week of fighting to get the
> replacement drive to resilver and join the vdev, we nuked the pool and
> re-created it with three raidz2 vdevs of eight drives each.
>
> (Un)fortunately, that was the only failure we've had so far, so we can't
> really confirm or deny the "error counts reset after reboot" behavior.

Was this on FreeBSD?

I have another unfortunate observation to add: after a reboot it is
impossible to tell *which* disk went bad, even if the pool is left
"uncleared" but otherwise "healed". zpool status simply says that a device
has failed, with no clue as to which one, since every device is listed as
"ONLINE"!

Regards,
Thomas
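P.S. For anyone else bitten by this, here is a minimal sketch of one way to
keep the evidence around before a reboot wipes it. The pool name "tank" and
the log path are only placeholders I made up, not something from this
thread; adjust them to your own setup.

    #!/bin/sh
    # Keep a copy of the per-device READ/WRITE/CKSUM counters so they
    # survive the reboot that would otherwise reset them.
    # "tank" and the log path below are placeholder names.
    LOG=/var/log/zpool-status-history.log
    date >> "$LOG"
    zpool status -v tank >> "$LOG"
    zpool status -x            # quick check: reports only unhealthy pools

    # After rebooting, "zpool scrub tank" re-reads all data in the pool,
    # so a disk that is still corrupting data should show fresh checksum
    # errors in a subsequent "zpool status".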