Date: Wed, 25 Mar 2009 22:12:47 +0100
From: Alexey Shuvaev
To: "army.of.root"
Cc: FreeBSD Current
Subject: Re: Apparently spurious ZFS CRC errors (was Re: ZFS data error without reasons)
Message-ID: <20090325211247.GA4659@wep4035.physik.uni-wuerzburg.de>
In-Reply-To: <49CA9823.9050605@googlemail.com>
References: <20090325105613.55624rkkgf2xkr6s@webmail.leidinger.net>
 <20090325103721.G67233@rust.salford.ac.uk>
 <20090325135528.21416hzpozpjst8g@webmail.leidinger.net>
 <20090325125930.U73916@rust.salford.ac.uk>
 <20090325152128.2389990h7v6a02co@webmail.leidinger.net>
 <20090325152940.GB16409@cicely7.cicely.de>
 <20090325180054.L87213@rust.salford.ac.uk>
 <20090325183831.GD16409@cicely7.cicely.de>
 <20090325203558.GA4533@wep4035.physik.uni-wuerzburg.de>
 <49CA9823.9050605@googlemail.com>
Organization: Universitaet Wuerzburg
List-Id: Discussions about the use of FreeBSD-current

On Wed, Mar 25, 2009 at 09:46:27PM +0100, army.of.root wrote:
> Alexey Shuvaev wrote:
>> On Wed, Mar 25, 2009 at 07:38:32PM +0100, Bernd Walter wrote:
>>> On Wed, Mar 25, 2009 at 06:04:08PM +0000, Mark Powell wrote:
>>>> On Wed, 25 Mar 2009, Bernd Walter wrote:
>>>>
>>>>> On Wed, Mar 25, 2009 at 03:21:28PM +0100, Alexander Leidinger wrote:
>>>>> I wouldn't be surprised if the problem is in the drive firmware.
>>>>> Preread and wc both have the potential to put a lot load to the drives
>>>>> and can trigger bugs that otherwise wouldn't matter.
>>>> I've emailed WD support for more info. Not expecting much though.
>>>> From reading other threads on these Green Power drives them seem
>>>> rather crap.
>>>> This is my model and firmware:
>>>>
>>>> http://www.datacent.com/datarecovery/hdd/western_digital/WD10EADS-00L5B1
>>>>
>>>> There's some head park problem too, but with 5s ZFS sync I don't
>>>> think it applies in this case:
>>>>
>>>> http://www.silentpcreview.com/forums/viewtopic.php?t=51401&postdays=0&postorder=asc&start=120&sid=a1caf68d80ef8fecc5d9e86defde4c19
>>>> http://kerneltrap.org/mailarchive/linux-kernel/2008/4/9/1386304
>>>>
>>>>> I also have a system running WD drives and ECC RAM which show CRC errors
>>>>> from time to time, while all other systems have no CRC problem at all.
>>>>
>>>> Interesting. Are those CRC problems with WC on or off?
>>> WC is on, prefetch is off, but only because it had bad performance with
>>> MySQL.
>>> Drives are Serial ATA II
>>> I don't know if it is with the drives, but other reasons are less
>>> likely in my opinion.
>>> The system is located in a data center and since I only get a few errors
>>> I decided to live with it and not to debug it further.
>>>
>> Hello!
>>
>> Me too...
>>
>> I don't use zfs, just ufs2 + soft updates, but I see sometimes rather
>> heavy data corruption (most often on / filesystem).
>> No kernel messages, I can shut down the system successfully just
>> to find the remnants of filesystems on the next boot.
>> It doesn't happen often, I think compiling ports in a jail + some
>> activity in the host increase the probability of a failure.
>>
>> The drive is:
>> ATA channel 3:
>> Master: ad6 SATA revision 2.x
>>
>> hw.ata.wc=1 (default)
>>
>> FWIW,
>> Alexey.
>
> Hi :)
>
> Damn f**k ! - I just bought WD harddrives for my Workstation...
>
> is there any way to detect silent data corruption without ZFS ?
>
Not that I'm aware of...
I vaguely remember a Lock Order Reversal appearing before the system
goes to hell (something with kn_list???), but it may be unrelated.
Anyway, if you see it, it is already too late...
The first symptom is applications crashing with signal 11 and
completely trashed backtraces.
Once I tried breaking into the debugger in such a state and rebooting
immediately, but it didn't help; the disk seems to have been synced
already by then...
Sometimes the system survives for a few weeks, sometimes only for a
few days.
So, backup, backup, backup...

Just my $0.02,
Alexey.
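
On the question of detecting silent corruption without ZFS, one generic approach is to keep a checksum manifest of files that are not supposed to change and re-verify it periodically. What follows is only a rough sketch under that assumption, not something anyone in this thread used: the names `file_digest`, `build_manifest` and `verify_manifest` and the manifest layout are made up for illustration. It cannot tell corruption from a legitimate modification, so it is only meaningful for static data (archives, backups, installed binaries).

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each regular file under root (by relative path) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                manifest[os.path.relpath(path, root)] = file_digest(path)
    return manifest


def verify_manifest(root, manifest):
    """Return the sorted relative paths that are missing or whose digest changed."""
    bad = []
    for rel, expected in manifest.items():
        try:
            actual = file_digest(os.path.join(root, rel))
        except OSError:
            actual = None  # unreadable or missing counts as a mismatch
        if actual != expected:
            bad.append(rel)
    return sorted(bad)
```

The manifest itself would have to live on a different disk to be of any use. Also note that a re-read shortly after writing may be served from the buffer cache rather than from the platters, so a check only says something about the drive when it runs after a reboot or once the data has been evicted from memory.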