Date: Tue, 7 Apr 2009 14:33:14 +0100 (BST)
From: "Mark Powell" <M.S.Powell@salford.ac.uk>
To: ticso@cicely.de
Cc: dgerow@afflictions.org, Daniel Eriksson <daniel@toomuchdata.com>,
    FreeBSD Current <freebsd-current@freebsd.org>,
    Mark Powell <M.S.Powell@salford.ac.uk>, kevin <kevinxlinuz@163.com>,
    Alexander Leidinger <Alexander@Leidinger.net>
Subject: Re: Apparently spurious ZFS CRC errors (was Re: ZFS data error without reasons)
Message-ID: <20090407142423.L31650@rust.salford.ac.uk>
In-Reply-To: <20090326084726.N87213@rust.salford.ac.uk>
References: <49BE4EC1.90207@163.com>
    <20090320102824.W75873@rust.salford.ac.uk>
    <20090320152737.D641@rust.salford.ac.uk>
    <20090325105613.55624rkkgf2xkr6s@webmail.leidinger.net>
    <20090325103721.G67233@rust.salford.ac.uk>
    <20090325135528.21416hzpozpjst8g@webmail.leidinger.net>
    <20090325125930.U73916@rust.salford.ac.uk>
    <20090325152128.2389990h7v6a02co@webmail.leidinger.net>
    <20090325152940.GB16409@cicely7.cicely.de>
    <20090325180054.L87213@rust.salford.ac.uk>
    <20090325183831.GD16409@cicely7.cicely.de>
    <20090326084726.N87213@rust.salford.ac.uk>

On Thu, 26 Mar 2009, Mark Powell wrote:

> On Wed, 25 Mar 2009, Bernd Walter wrote:
>> I don't know if it is with the drives, but other reasons are less
>> likely in my opinion.
>> The system is located in a data center and since I only get a few errors
>> I decided to live with it and not to debug it further.
>
> I've decided to split my drives in two pools; 5x500GB RAIDZ1 of WD5000AAKS
> and the 6x1TB RAIDZ2 of WD10EADS. I'll see if they perform differently. I'm
> using the defaults of WC on, with all ZFS options enabled.

Ok. I've been running with this config for 13 days now. During that time no
CRC errors at all have been found on either pool. I have been scrubbing both
pools together at 2am, hoping the simultaneous IO would cause some kind of
hardware strain.

There were again no CRC errors found in the scrub which occurred at 2am
today. However, a few hours later CRC errors appeared on both pools.
Curiously, the CRC errors on both pools appeared at the same time. I've been
running zpool status from cron every minute, and all these new CRC errors
occurred within two consecutive minutes:

-----
# zpool status
  pool: pool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 0h11m, 6.16% done, 2h53m to go
config:

        NAME             STATE     READ WRITE CKSUM
        pool             ONLINE       0     0     0
          raidz1         ONLINE       0     0     0
            stripe/str0  ONLINE       0     0     0
            ad8          ONLINE       0     0     0
            ad10         ONLINE       0     0     0
            ad12         ONLINE       0     0     0
            ad14         ONLINE       0     0     1

errors: No known data errors

  pool: pool2
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 0h11m, 2.82% done, 6h29m to go
config:

        NAME        STATE     READ WRITE CKSUM
        pool2       ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            ad18    ONLINE       0     0     0
            ad20    ONLINE       0     0     4
            ad22    ONLINE       0     0     2
            ad24    ONLINE       0     0     0
            ad26    ONLINE       0     0     0
            ad28    ONLINE       0     0     6

errors: No known data errors
-----

Is the opinion still that this is the drives?

Cheers.

-- 
Mark Powell - UNIX System Administrator - The University of Salford
Information & Learning Services, Clifford Whitworth Building,
Salford University, Manchester, M5 4WT, UK.
Tel: +44 161 295 6843  Fax: +44 161 295 5888  www.pgp.com for PGP key
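
For anyone wanting to watch for this kind of event in the per-minute zpool status logs, the following is a rough sketch (not the script used above) of how the CKSUM column could be pulled out of captured output. The function name cksum_errors and the SAMPLE text are illustrative assumptions; the parser only understands the plain "NAME STATE READ WRITE CKSUM" layout shown in this message and would need adjusting for other zpool versions or abbreviated counts.

```python
import re

# Trimmed sample in the same layout as the zpool status output in this message.
SAMPLE = """\
  pool: pool
 state: ONLINE
config:

        NAME             STATE     READ WRITE CKSUM
        pool             ONLINE       0     0     0
          raidz1         ONLINE       0     0     0
            stripe/str0  ONLINE       0     0     0
            ad14         ONLINE       0     0     1

errors: No known data errors

  pool: pool2
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        pool2       ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            ad20    ONLINE       0     0     4
            ad22    ONLINE       0     0     2
            ad24    ONLINE       0     0     0
            ad28    ONLINE       0     0     6

errors: No known data errors
"""


def cksum_errors(status_text):
    """Return {(pool, device): count} for every row with a non-zero CKSUM."""
    errors = {}
    pool = None
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        match = re.match(r"pool:\s+(\S+)", stripped)
        if match:
            # A new pool section starts here.
            pool = match.group(1)
            in_config = False
            continue
        if stripped.startswith("config:"):
            in_config = True
            continue
        if stripped.startswith("errors:"):
            in_config = False
            continue
        if not in_config:
            continue
        fields = stripped.split()
        # Device rows are: NAME STATE READ WRITE CKSUM (the header row is
        # skipped because its last field is not numeric).
        if len(fields) >= 5 and fields[4].isdigit() and int(fields[4]) > 0:
            errors[(pool, fields[0])] = int(fields[4])
    return errors


if __name__ == "__main__":
    for (pool, device), count in sorted(cksum_errors(SAMPLE).items()):
        print(f"{pool} {device} CKSUM={count}")
```

Comparing the result between two consecutive per-minute captures would show which devices picked up new errors in that minute, and whether both pools changed together.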