From: Freddie Cash <fjwcash@gmail.com>
To: freebsd-current@freebsd.org
Date: Mon, 25 May 2009 09:19:21 -0700
Subject: Re: ZFS panic under extreme circumstances (2/3 disks corrupted)
On Mon, May 25, 2009 at 9:12 AM, Thomas Backman wrote:
> On May 25, 2009, at 05:39 PM, Freddie Cash wrote:
>> On Mon, May 25, 2009 at 2:13 AM, Thomas Backman wrote:
>>> On May 24, 2009, at 09:02 PM, Thomas Backman wrote:
>>>
>>>> So, I was playing around with RAID-Z and self-healing...
>>>
>>> Yet another follow-up to this.
>>> It appears that all traces of errors vanish after a reboot. So, say you
>>> have a dying disk; ZFS repairs the data for you, and you don't notice
>>> (unless you check zpool status). Then you reboot, and there's NO (easy?)
>>> way that I can tell to find out that something is wrong with your
>>> hardware!
>>
>> On our storage server that was initially configured using 1 large
>> 24-drive raidz2 vdev (don't do that, by the way), we had 1 drive go
>> south.  "zpool status" was full of errors.  And the error counts
>> survived reboots.  Either that, or the drive was so bad that the error
>> counts started increasing right away after a boot.  After a week of
>> fighting with it to get the new drive to resilver and get added to the
>> vdev, we nuked it and re-created it using 3 raidz2 vdevs, each
>> comprised of 8 drives.
>>
>> (Un)fortunately, that was the only failure we've had so far, so we can't
>> really confirm/deny the "error counts reset after reboot".
>
> Was this on FreeBSD?

64-bit FreeBSD 7.1 using ZFS v6.  SATA drives connected to 3Ware RAID
controllers, but configured as "Single Drive" arrays, not using hardware
RAID in any way.

> I have another unfortunate thing to note regarding this: after a reboot,
> it's even impossible to tell *which disk* has gone bad, even if the pool
> is "uncleared" but otherwise "healed". It simply says that a device has
> failed, with no clue as to which one, since they're all "ONLINE"!

Even when using -v?
zpool status -v

-- 
Freddie Cash
fjwcash@gmail.com
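[For anyone reading this in the archives: the layout described above (three 8-drive raidz2 vdevs instead of one 24-drive vdev) and the commands being discussed would look roughly like the sketch below. The pool name and da* device names are made up for illustration; adjust for your own hardware.]

```shell
# Create one pool backed by three raidz2 vdevs of 8 disks each (24 disks
# total). Each "raidz2" keyword starts a new vdev, so redundancy and
# resilver times are per 8-disk group rather than per 24-disk group.
zpool create storage \
    raidz2 da0  da1  da2  da3  da4  da5  da6  da7  \
    raidz2 da8  da9  da10 da11 da12 da13 da14 da15 \
    raidz2 da16 da17 da18 da19 da20 da21 da22 da23

# Show per-device read/write/checksum error counters, plus (with -v) any
# files affected by unrecoverable errors:
zpool status -v storage

# If the error counters were cleared by a reboot, a scrub re-reads and
# verifies all data in the pool, so errors on a failing disk will be
# detected and counted again:
zpool scrub storage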