From: Thomas Backman
To: Kip Macy
Date: Sat, 13 Jun 2009 09:32:09 +0200
Cc: freebsd-fs@freebsd.org, FreeBSD Current
Subject: Re: ZFS: Silent/hidden errors, nothing logged anywhere

On Jun 12, 2009, at 11:01 PM, Kip Macy wrote:

> On Fri, Jun 12, 2009 at 10:32 AM, Thomas Backman wrote:
>> OK, so I filed a PR late May (kern/135050):
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=135050 .
>> I don't know if this is a "feature" or a bug, but it really should be
>> considered the latter. The data could be repaired in the background
>> without the user ever knowing - until the disk dies completely. I'd
>> prefer to have warning signs (i.e. checksum errors) so that I can buy
>> a replacement drive *before* that.
>>
>> Not only does this mean that errors can go unnoticed, but also that
>> it's impossible to figure out which disk is broken, if ZFS has
>> *temporarily* repaired the broken data! THAT is REALLY bad!
>> Is this something that we can expect to see changed before
>> 8.0-RELEASE?
>
>
> I'm fairly certain that we've discussed this already. Solaris uses FMA - I
> don't think that I'll get to a "real fix" any time soon. The time that I
> do have will go to addressing stability problems (memory
> over-allocation, NFS interaction, control directory mounts) all of
> which cause panics. Maintaining them persistently in the label doesn't
> make sense - when do you drop them? Would a simple log message about
> the number of checksum errors suffice?
>
> Cheers,
> Kip

Yes, I suppose a log message would be OK, especially if there's a
semi-simple way of mailing root automatically (either by the ZFS libs
themselves, or by a simple log analyzer daemon, of which I'm sure there
are plenty already).

I do think that storing them in the label makes sense, though, but if
Solaris doesn't do it, I suppose we shouldn't either. IF they are stored
that way, they should IMHO remain until a "zpool clear" is executed on
the device (a device that causes errors is a device that causes errors -
most of the time, this is a great way for the disk to say "hey, I'm
dying here!"). In practice, this clearing already happens on reboot
(although the relevant functions are, of course, never called).

Regards,
Thomas
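As a rough illustration of the "simple log analyzer" idea, here is a minimal Python sketch. It assumes the tabular NAME/STATE/READ/WRITE/CKSUM layout of `zpool status` output with plain integer counters (rows whose counts zpool abbreviates, such as "1.2K", are simply skipped). It only helps if ZFS actually reports the errors there, which is exactly what the PR is about. The names `checksum_errors`, `report` and `StickyCounts` are hypothetical. `StickyCounts` imitates the "keep until zpool clear" policy proposed above by remembering the highest count seen per device until it is explicitly cleared. It does not make ZFS itself persist anything.

```python
import re
from typing import Dict, Optional

# Hypothetical parser for a device row in the "config:" section of
# `zpool status`: NAME STATE READ WRITE CKSUM, counters as plain integers.
ROW = re.compile(r"^\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def checksum_errors(status_text: str) -> Dict[str, int]:
    """Return {device: cksum_count} for every row with a nonzero CKSUM value."""
    errors: Dict[str, int] = {}
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m and int(m.group(5)) > 0:
            errors[m.group(1)] = int(m.group(5))
    return errors


def report(status_text: str) -> str:
    """Build a message body to mail to root; empty string if no errors."""
    errs = checksum_errors(status_text)
    return "\n".join(f"{dev}: {n} checksum error(s)" for dev, n in sorted(errs.items()))


class StickyCounts:
    """Remember the highest CKSUM count seen per device until clear() is
    called, so a reset of the live counters (e.g. on reboot) does not hide
    a device that has been producing errors."""

    def __init__(self) -> None:
        self.seen: Dict[str, int] = {}

    def update(self, status_text: str) -> None:
        for dev, n in checksum_errors(status_text).items():
            self.seen[dev] = max(self.seen.get(dev, 0), n)

    def clear(self, dev: Optional[str] = None) -> None:
        # Analogous to "zpool clear": forget one device, or all of them.
        if dev is None:
            self.seen.clear()
        else:
            self.seen.pop(dev, None)
```

One way to use it (again an assumption, not an existing tool) would be to run it from a periodic cron job on the output of `zpool status` and pipe any non-empty report to mail(1) addressed to root.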