From: Freddie Cash <fjwcash@gmail.com>
To: freebsd-current@freebsd.org
Date: Mon, 25 May 2009 09:19:21 -0700
Subject: Re: ZFS panic under extreme circumstances (2/3 disks corrupted)
On Mon, May 25, 2009 at 9:12 AM, Thomas Backman wrote:
> On May 25, 2009, at 05:39 PM, Freddie Cash wrote:
>> On Mon, May 25, 2009 at 2:13 AM, Thomas Backman wrote:
>>> On May 24, 2009, at 09:02 PM, Thomas Backman wrote:
>>>
>>>> So, I was playing around with RAID-Z and self-healing...
>>>
>>> Yet another follow-up to this.
>>> It appears that all traces of errors vanish after a reboot. So, say you
>>> have a dying disk; ZFS repairs the data for you, and you don't notice
>>> (unless you check zpool status). Then you reboot, and there's NO (easy?)
>>> way that I can tell to find out that something is wrong with your
>>> hardware!
>>
>> On our storage server that was initially configured using 1 large
>> 24-drive raidz2 vdev (don't do that, by the way), we had 1 drive go
>> south.  "zpool status" was full of errors.  And the error counts
>> survived reboots.  Either that, or the drive was so bad that the error
>> counts started increasing right away after a boot.  After a week of
>> fighting with it to get the new drive to resilver and get added to the
>> vdev, we nuked it and re-created it using 3 raidz2 vdevs, each
>> comprised of 8 drives.
>>
>> (Un)fortunately, that was the only failure we've had so far, so we can't
>> really confirm/deny the "error counts reset after reboot".
>
> Was this on FreeBSD?

64-bit FreeBSD 7.1 using ZFS v6.  SATA drives connected to 3Ware RAID
controllers, but configured as "Single Drive" arrays, not using hardware
RAID in any way.

> I have another unfortunate thing to note regarding this: after a reboot,
> it's even impossible to tell *which disk* has gone bad, even if the pool
> is "uncleared" but otherwise "healed". It simply says that a device has
> failed, with no clue as to which one, since they're all "ONLINE"!

Even when using -v?
zpool status -v

-- 
Freddie Cash
fjwcash@gmail.com
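[For anyone reading this in the archives: the layout described above (three 8-drive raidz2 vdevs instead of one 24-drive vdev) and the commands being discussed would look roughly like the sketch below. The pool name and da* device names are made up for illustration; adjust for your own hardware.]

```shell
# Create one pool backed by three raidz2 vdevs of 8 disks each (24 disks
# total). Each "raidz2" keyword starts a new vdev, so redundancy and
# resilver times are per 8-disk group rather than per 24-disk group.
zpool create storage \
    raidz2 da0  da1  da2  da3  da4  da5  da6  da7  \
    raidz2 da8  da9  da10 da11 da12 da13 da14 da15 \
    raidz2 da16 da17 da18 da19 da20 da21 da22 da23

# Show per-device read/write/checksum error counters, plus (with -v) any
# files affected by unrecoverable errors:
zpool status -v storage

# If the error counters were cleared by a reboot, a scrub re-reads and
# verifies all data in the pool, so errors on a failing disk will be
# detected and counted again:
zpool scrub storage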