From owner-freebsd-current@FreeBSD.ORG Wed Mar 25 13:49:11 2009
Date: Wed, 25 Mar 2009 13:49:07 +0000 (GMT)
From: "Mark Powell" <M.S.Powell@salford.ac.uk>
To: Alexander Leidinger
Cc: kevin, FreeBSD Current, Daniel Eriksson
Subject: Re: Apparently spurious ZFS CRC errors (was Re: ZFS data error
 without reasons)
Message-ID: <20090325125930.U73916@rust.salford.ac.uk>
In-Reply-To: <20090325135528.21416hzpozpjst8g@webmail.leidinger.net>
References: <49BD117B.2080706@163.com>
 <4F9C9299A10AE74E89EA580D14AA10A635E68A@royal64.emp.zapto.org>
 <49BE4EC1.90207@163.com>
 <20090320102824.W75873@rust.salford.ac.uk>
 <20090320152737.D641@rust.salford.ac.uk>
 <20090325105613.55624rkkgf2xkr6s@webmail.leidinger.net>
 <20090325103721.G67233@rust.salford.ac.uk>
 <20090325135528.21416hzpozpjst8g@webmail.leidinger.net>
List-Id: Discussions about the use of FreeBSD-current

On Wed, 25 Mar 2009, Alexander Leidinger wrote:

>> Can prefetch really cause these problems? And if so, why?
>
> I don't think so. I missed the part where you explained this before. In
> this case it's really the write cache. The interesting question is
> whether this is because of the hard disks you use, or because of a bug
> in the software.
>
> You run a very recent current? 1-2 weeks ago there was a bug (not in
> ZFS) which caused CRC errors, but it was fixed shortly after it was
> noticed. If you haven't updated your system, it may be best to update it
> and try again. Please report back.

I'm running a recent current. I too saw that there were bugs causing CRC
errors, and hoped that the relevant fixes would help me out.
Unfortunately not. I most recently remade the whole array with current
from last Thursday, 19th March.
   I tried it with WC (write caching) disabled, but performance is awful.
I expected it to be a little worse, obviously, but not so much worse that
it would be noticeable without benchmarks. Restoring my 1st LTO2 200GB
tape should take 1h45m-2h; after 3h30m it was only about halfway through
the tape, so I gave up, hoping, possibly in vain, that it was a ZFS
option causing the issue.
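(For reference, the way I understand write caching is normally toggled on
ata(4) disks is via the hw.ata.wc loader tunable. The snippet below is
only an illustrative sketch from memory, not necessarily exactly what I
ran, so check ata(4) before relying on it.)

-----
# /boot/loader.conf -- disable write caching on ATA disks at next boot
# (0 = write cache off, 1 = write cache on; takes effect after reboot)
hw.ata.wc="0"

# after rebooting, confirm the setting took:
sysctl hw.ata.wc
-----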
The drives in question are:

ad24 Device Model: WDC WD10EADS-00L5B1
ad22 Device Model: WDC WD10EADS-00L5B1
ad20 Device Model: WDC WD10EADS-00L5B1
ad18 Device Model: WDC WD10EADS-00L5B1
ad16 Device Model: WDC WD10EADS-00L5B1
ad14 Device Model: WDC WD10EADS-00L5B1
ad10 Device Model: WDC WD5000AAKS-22TMA0
ad8  Device Model: WDC WD5000AAKS-65TMA0

The WD5000AAKS drives were used for around 18 months in the previous
9x500GB RAIDZ2 on FreeBSD 7, so I would expect them to be ok. I've had
the WD10EADS drives for about 2 months. However, I did check each of the
new 1TB drives by swapping them into the old 9x500GB RAIDZ2 and
resilvering them, one at a time, into the array, i.e. eventually I was
running 3x500GB + 6x1TB in the still logically 9x500GB RAIDZ2. Yes, this
only exercised the lower 500GB of each 1TB drive, but surely that's
enough of a test? AFAICT I had WC off under FreeBSD 7, though.

On my most recent failure I do see:

-----
# zpool status -v
  pool: pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress, 40.02% done, 3h54m to go
config:

        NAME             STATE     READ WRITE CKSUM
        pool             ONLINE       0     0    42
          raidz2         ONLINE       0     0    42
            stripe/str0  ONLINE       0     0     0
            ad14         ONLINE       0     0     4
            ad16         ONLINE       0     0     2
            ad18         ONLINE       0     0     3
            ad20         ONLINE       0     0     7
            ad22         ONLINE       0     0     4
            ad24         ONLINE       0     0     5
-----

i.e. no errors on the 2x500GB stripe. That would seem to suggest
firmware write caching bugs on the 1TB drives. However, my other error
report had:

-----
  pool: pool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are
        unaffected.
action: Determine if the device needs to be replaced, and clear the
        errors using 'zpool clear' or replace the device with
        'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h51m with 0 errors on Fri Mar 20 10:57:18 2009
config:

        NAME             STATE     READ WRITE CKSUM
        pool             ONLINE       0     0     0
          raidz2         ONLINE       0     0    23
            stripe/str0  ONLINE       0     0   489  12.3M repaired
            ad14         ONLINE       0     0   786  19.7M repaired
            ad16         ONLINE       0     0   804  20.1M repaired
            ad18         ONLINE       0     0   754  18.8M repaired
            ad20         ONLINE       0     0   771  19.3M repaired
            ad22         ONLINE       0     0   808  20.2M repaired
            ad24         ONLINE       0     0   848  21.2M repaired

errors: No known data errors
-----

i.e. there are errors on the stripe here, but the stripe's error count is
only just over half that of a single 1TB drive. If the errors were spread
evenly, one would expect roughly 2x the CRC errors on the stripe compared
to a single drive, since it spans two physical disks, not half.

>>> If you want to get more out of zfs, maybe vfs.zfs.vdev.max_pending
>>> could help if you are using SATA (as I read the zfs tuning guide, it
>>> makes sense to have a high value when you have command queueing,
>>> which we have with SCSI drives, but not yet with SATA drives and
>>> probably not at all with PATA drives).
>>
>> I'm running entirely SATA with NCQ-supporting drives. However, and
>> possibly as you say, NCQ is not really/properly supported in FreeBSD?
>
> NCQ is not supported yet in FreeBSD. Alexander Motin said he is
> interested in implementing it, but I don't know about the status of
> this.

Ok. So vfs.zfs.vdev.max_pending is irrelevant for SATA currently?
   Cheers.

-- 
Mark Powell - UNIX System Administrator - The University of Salford
Information & Learning Services, Clifford Whitworth Building,
Salford University, Manchester, M5 4WT, UK.
Tel: +44 161 295 6843  Fax: +44 161 295 5888  www.pgp.com for PGP key
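P.S. In case anyone wants to experiment with the tunable anyway, this is
roughly how I would expect it to be set. The value shown is only the
Solaris default as I remember it, so treat this as a sketch rather than a
recommendation, and note it should only matter once the disks/driver
actually do command queueing:

-----
# /boot/loader.conf -- per-vdev queue depth used by the ZFS I/O scheduler
vfs.zfs.vdev.max_pending="35"

# it is also visible as a sysctl; whether it can be changed at runtime
# may depend on the ZFS version in the tree
sysctl vfs.zfs.vdev.max_pending
-----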