Date: Tue, 7 Apr 2009 14:33:14 +0100 (BST)
From: "Mark Powell"
To: ticso@cicely.de
In-Reply-To: <20090326084726.N87213@rust.salford.ac.uk>
Message-ID: <20090407142423.L31650@rust.salford.ac.uk>
References: <49BE4EC1.90207@163.com>
 <20090320102824.W75873@rust.salford.ac.uk>
 <20090320152737.D641@rust.salford.ac.uk>
 <20090325105613.55624rkkgf2xkr6s@webmail.leidinger.net>
 <20090325103721.G67233@rust.salford.ac.uk>
 <20090325135528.21416hzpozpjst8g@webmail.leidinger.net>
 <20090325125930.U73916@rust.salford.ac.uk>
 <20090325152128.2389990h7v6a02co@webmail.leidinger.net>
 <20090325152940.GB16409@cicely7.cicely.de>
 <20090325180054.L87213@rust.salford.ac.uk>
 <20090325183831.GD16409@cicely7.cicely.de>
 <20090326084726.N87213@rust.salford.ac.uk>
Cc:
 dgerow@afflictions.org, Daniel Eriksson, FreeBSD Current, Mark Powell,
 kevin, Alexander Leidinger
Subject: Re: Apparently spurious ZFS CRC errors (was Re: ZFS data error without reasons)
List-Id: Discussions about the use of FreeBSD-current

On Thu, 26 Mar 2009, Mark Powell wrote:

> On Wed, 25 Mar 2009, Bernd Walter wrote:
>> I don't know if it is with the drives, but other reasons are less
>> likely in my opinion.
>> The system is located in a data center and since I only get a few errors
>> I decided to live with it and not to debug it further.
>
> I've decided to split my drives in two pools; 5x500GB RAIDZ1 of WD5000AAKS
> and the 6x1TB RAIDZ2 of WD10EADS. I'll see if they perform differently. I'm
> using the defaults of WC on, with all ZFS options enabled.

Ok. I've been running with this config for 13 days now. During that time
no CRC errors at all have been found on either pool. I have been scrubbing
both pools together at 2am, hoping the simultaneous IO would cause some
kind of hardware strain.
   There were again no CRC errors found in the scrub which occurred at 2am
today. However, a few hours later CRC errors appeared on both pools.
Curiously, the CRC errors on both pools appeared at the same time. I've
been running zpool status from cron every minute, and all these new CRC
errors occurred within two consecutive minutes:

-----
# zpool status
  pool: pool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 0h11m, 6.16% done, 2h53m to go
config:

        NAME             STATE     READ WRITE CKSUM
        pool             ONLINE       0     0     0
          raidz1         ONLINE       0     0     0
            stripe/str0  ONLINE       0     0     0
            ad8          ONLINE       0     0     0
            ad10         ONLINE       0     0     0
            ad12         ONLINE       0     0     0
            ad14         ONLINE       0     0     1

errors: No known data errors

  pool: pool2
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 0h11m, 2.82% done, 6h29m to go
config:

        NAME        STATE     READ WRITE CKSUM
        pool2       ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            ad18    ONLINE       0     0     0
            ad20    ONLINE       0     0     4
            ad22    ONLINE       0     0     2
            ad24    ONLINE       0     0     0
            ad26    ONLINE       0     0     0
            ad28    ONLINE       0     0     6

errors: No known data errors
-----

Is the opinion that this is still the drives?
   Cheers.

-- 
Mark Powell - UNIX System Administrator - The University of Salford
Information & Learning Services, Clifford Whitworth Building,
Salford University, Manchester, M5 4WT, UK.
Tel: +44 161 295 6843  Fax: +44 161 295 5888  www.pgp.com for PGP key
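
For anyone wanting to automate the per-minute check described above, here is a rough Python sketch that picks the non-zero CKSUM counters out of captured `zpool status` text. It is not the script used in this message. The function name `cksum_errors`, the regular expression, and the trimmed sample text are all illustrative assumptions, and it assumes plain integer counters as in the output shown here.

```python
import re

# One device row of `zpool status`: name, state, then READ/WRITE/CKSUM.
# Assumption: counters are plain integers, as in the output in this message.
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)

def cksum_errors(status_text):
    """Return {(pool, device): cksum} for every row with a non-zero CKSUM."""
    errors = {}
    pool = None
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("pool:"):
            # Remember which pool the following device rows belong to.
            pool = stripped.split(":", 1)[1].strip()
            continue
        match = ROW.match(line)
        if match and int(match.group(5)) > 0:
            errors[(pool, match.group(1))] = int(match.group(5))
    return errors

# Trimmed sample modelled on the pool2 output above.
SAMPLE = """\
  pool: pool2
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        pool2       ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            ad18    ONLINE       0     0     0
            ad20    ONLINE       0     0     4
            ad22    ONLINE       0     0     2
            ad28    ONLINE       0     0     6

errors: No known data errors
"""

if __name__ == "__main__":
    for (pool_name, device), count in cksum_errors(SAMPLE).items():
        print(f"{pool_name}/{device}: {count} checksum error(s)")
```

Run from cron against each saved snapshot, a check like this would show the first minute in which a counter changed, which is the timing information the question above turns on.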