From: Jeremy Chadwick
To: Mike Carlson
Cc: "freebsd-fs@freebsd.org", "pjd@freebsd.org"
Date: Mon, 8 Nov 2010 11:29:50 -0800
Subject: Re: 8.1-RELEASE: ZFS data errors

On Mon, Nov 08, 2010 at 11:11:31AM -0800, Mike Carlson wrote:
> On 11/08/2010 11:06 AM, Jeremy Chadwick wrote:
> >On Mon, Nov 08, 2010 at 10:32:56AM -0800, Mike Carlson wrote:
> >>I'm having a problem with striping 7 18TB RAID6 (hardware SAN)
> >>volumes together.
> >>
> >>Here is a quick rundown of the hardware:
> >>* HP DL180 G6 w/12GB RAM
> >>* QLogic FC HBA (Qlogic ISP 2532 PCI FC-AL Adapter)
> >>* Winchester Hardware SAN
> >>
> >>   da2 at isp0 bus 0 scbus2 target 0 lun 0
> >>   da2: Fixed Direct Access SCSI-5 device
> >>   da2: 800.000MB/s transfers
> >>   da2: Command Queueing enabled
> >>   da2: 19074680MB (39064944640 512 byte sectors: 255H 63S/T 2431680C)
> >>
> >>As soon as I create the volume and write data to it, it is reported
> >>as being corrupted:
> >>
> >>   write# zpool create filevol001 da2 da3 da4 da5 da6 da7 da8
> >>   write# zpool scrub filevol001
> >>   write# dd if=/dev/random of=/filevol001/random.dat.1 bs=1m count=1000
> >>   1000+0 records in
> >>   1000+0 records out
> >>   1048576000 bytes transferred in 16.472807 secs (63654968 bytes/sec)
> >>   write# cd /filevol001/
> >>   write# ls
> >>   random.dat.1
> >>   write# md5 *
> >>   MD5 (random.dat.1) = 629f8883d6394189a1658d24a5698bb3
> >>   write# cp random.dat.1 random.dat.2
> >>   cp: random.dat.1: Input/output error
> >>   write# zpool status
> >>     pool: filevol001
> >>    state: ONLINE
> >>    scrub: none requested
> >>   config:
> >>
> >>           NAME          STATE   READ WRITE CKSUM
> >>           filevol001    ONLINE     0     0     0
> >>             da2         ONLINE     0     0     0
> >>             da3         ONLINE     0     0     0
> >>             da4         ONLINE     0     0     0
> >>             da5         ONLINE     0     0     0
> >>             da6         ONLINE     0     0     0
> >>             da7         ONLINE     0     0     0
> >>             da8         ONLINE     0     0     0
> >>
> >>   errors: No known data errors
> >>   write# zpool scrub filevol001
> >>   write# zpool status
> >>     pool: filevol001
> >>    state: ONLINE
> >>   status: One or more devices has experienced an error resulting in
> >>           data corruption.  Applications may be affected.
> >>   action: Restore the file in question if possible.  Otherwise restore
> >>           the entire pool from backup.
> >>      see: http://www.sun.com/msg/ZFS-8000-8A
> >>    scrub: scrub completed after 0h0m with 2437 errors on Mon Nov 8
> >>           10:14:20 2010
> >>   config:
> >>
> >>           NAME          STATE   READ WRITE CKSUM
> >>           filevol001    ONLINE     0     0 2.38K
> >>             da2         ONLINE     0     0 1.24K  12K repaired
> >>             da3         ONLINE     0     0 1.12K
> >>             da4         ONLINE     0     0 1.13K
> >>             da5         ONLINE     0     0 1.27K
> >>             da6         ONLINE     0     0     0
> >>             da7         ONLINE     0     0     0
> >>             da8         ONLINE     0     0     0
> >>
> >>   errors: 2437 data errors, use '-v' for a list
> >>
> >>However, if I create a 'raidz' volume, no errors occur:
> >>
> >>   write# zpool destroy filevol001
> >>   write# zpool create filevol001 raidz da2 da3 da4 da5 da6 da7 da8
> >>   write# zpool status
> >>     pool: filevol001
> >>    state: ONLINE
> >>    scrub: none requested
> >>   config:
> >>
> >>           NAME          STATE   READ WRITE CKSUM
> >>           filevol001    ONLINE     0     0     0
> >>             raidz1      ONLINE     0     0     0
> >>               da2       ONLINE     0     0     0
> >>               da3       ONLINE     0     0     0
> >>               da4       ONLINE     0     0     0
> >>               da5       ONLINE     0     0     0
> >>               da6       ONLINE     0     0     0
> >>               da7       ONLINE     0     0     0
> >>               da8       ONLINE     0     0     0
> >>
> >>   errors: No known data errors
> >>   write# dd if=/dev/random of=/filevol001/random.dat.1 bs=1m count=1000
> >>   1000+0 records in
> >>   1000+0 records out
> >>   1048576000 bytes transferred in 17.135045 secs (61194821 bytes/sec)
> >>   write# zpool scrub filevol001
> >>   write# zpool status
> >>     pool: filevol001
> >>    state: ONLINE
> >>    scrub: scrub completed after 0h0m with 0 errors on Mon Nov 8
> >>           09:54:51 2010
> >>   config:
> >>
> >>           NAME          STATE   READ WRITE CKSUM
> >>           filevol001    ONLINE     0     0     0
> >>             raidz1      ONLINE     0     0     0
> >>               da2       ONLINE     0     0     0
> >>               da3       ONLINE     0     0     0
> >>               da4       ONLINE     0     0     0
> >>               da5       ONLINE     0     0     0
> >>               da6       ONLINE     0     0     0
> >>               da7       ONLINE     0     0     0
> >>               da8       ONLINE     0     0     0
> >>
> >>   errors: No known data errors
> >>   write# ls
> >>   random.dat.1
> >>   write# cp random.dat.1 random.dat.2
> >>   write# cp random.dat.1 random.dat.3
> >>   write# cp random.dat.1 random.dat.4
> >>   write# cp random.dat.1 random.dat.5
> >>   write# cp random.dat.1 random.dat.6
> >>   write# cp random.dat.1 random.dat.7
> >>   write# md5 *
> >>   MD5 (random.dat.1) = f5e3467f61a954bc2e0bcc35d49ac8b2
> >>   MD5 (random.dat.2) = f5e3467f61a954bc2e0bcc35d49ac8b2
> >>   MD5 (random.dat.3) = f5e3467f61a954bc2e0bcc35d49ac8b2
> >>   MD5 (random.dat.4) = f5e3467f61a954bc2e0bcc35d49ac8b2
> >>   MD5 (random.dat.5) = f5e3467f61a954bc2e0bcc35d49ac8b2
> >>   MD5 (random.dat.6) = f5e3467f61a954bc2e0bcc35d49ac8b2
> >>   MD5 (random.dat.7) = f5e3467f61a954bc2e0bcc35d49ac8b2
> >>
> >>What is also odd is that if I create 7 separate single-device ZFS
> >>pools, they do not report any data corruption:
> >>
> >>   write# zpool destroy filevol001
> >>   write# zpool create test01 da2
> >>   write# zpool create test02 da3
> >>   write# zpool create test03 da4
> >>   write# zpool create test04 da5
> >>   write# zpool create test05 da6
> >>   write# zpool create test06 da7
> >>   write# zpool create test07 da8
> >>   write# zpool status
> >>     pool: test01
> >>    state: ONLINE
> >>    scrub: none requested
> >>   config:
> >>
> >>           NAME      STATE   READ WRITE CKSUM
> >>           test01    ONLINE     0     0     0
> >>             da2     ONLINE     0     0     0
> >>
> >>   errors: No known data errors
> >>
> >>     pool: test02
> >>    state: ONLINE
> >>    scrub: none requested
> >>   config:
> >>
> >>           NAME      STATE   READ WRITE CKSUM
> >>           test02    ONLINE     0     0     0
> >>             da3     ONLINE     0     0     0
> >>
> >>   errors: No known data errors
> >>
> >>     pool: test03
> >>    state: ONLINE
> >>    scrub: none requested
> >>   config:
> >>
> >>           NAME      STATE   READ WRITE CKSUM
> >>           test03    ONLINE     0     0     0
> >>             da4     ONLINE     0     0     0
> >>
> >>   errors: No known data errors
> >>
> >>     pool: test04
> >>    state: ONLINE
> >>    scrub: none requested
> >>   config:
> >>
> >>           NAME      STATE   READ WRITE CKSUM
> >>           test04    ONLINE     0     0     0
> >>             da5     ONLINE     0     0     0
> >>
> >>   errors: No known data errors
> >>
> >>     pool: test05
> >>    state: ONLINE
> >>    scrub: none requested
> >>   config:
> >>
> >>           NAME      STATE   READ WRITE CKSUM
> >>           test05    ONLINE     0     0     0
> >>             da6     ONLINE     0     0     0
> >>
> >>   errors: No known data errors
> >>
> >>     pool: test06
> >>    state: ONLINE
> >>    scrub: none requested
> >>   config:
> >>
> >>           NAME      STATE   READ WRITE CKSUM
> >>           test06    ONLINE     0     0     0
> >>             da7     ONLINE     0     0     0
> >>
> >>   errors: No known data errors
> >>
> >>     pool: test07
> >>    state: ONLINE
> >>    scrub: none requested
> >>   config:
> >>
> >>           NAME      STATE   READ WRITE CKSUM
> >>           test07    ONLINE     0     0     0
> >>             da8     ONLINE     0     0     0
> >>
> >>   errors: No known data errors
> >>   write# dd if=/dev/random of=/tmp/random.dat.1 bs=1m count=1000
> >>   1000+0 records in
> >>   1000+0 records out
> >>   1048576000 bytes transferred in 19.286735 secs (54367730 bytes/sec)
> >>   write# cd /tmp/
> >>   write# md5 /tmp/random.dat.1
> >>   MD5 (/tmp/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> >>   write# cp random.dat.1 /test01 ; cp random.dat.1 /test02 ; cp
> >>   random.dat.1 /test03 ; cp random.dat.1 /test04 ; cp random.dat.1
> >>   /test05 ; cp random.dat.1 /test06 ; cp random.dat.1 /test07
> >>   write# md5 /test*/*
> >>   MD5 (/test01/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> >>   MD5 (/test02/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> >>   MD5 (/test03/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> >>   MD5 (/test04/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> >>   MD5 (/test05/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> >>   MD5 (/test06/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> >>   MD5 (/test07/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> >>   write# zpool scrub test01 ; zpool scrub test02 ; zpool scrub test03
> >>   ; zpool scrub test04 ; zpool scrub test05 ; zpool scrub test06 ;
> >>   zpool scrub test07
> >>   write# zpool status
> >>     pool: test01
> >>    state: ONLINE
> >>    scrub: scrub completed after 0h0m with 0 errors on Mon Nov 8
> >>           10:27:49 2010
> >>   config:
> >>
> >>           NAME      STATE   READ WRITE CKSUM
> >>           test01    ONLINE     0     0     0
> >>             da2     ONLINE     0     0     0
> >>
> >>   errors: No known data errors
> >>
> >>     pool: test02
> >>    state: ONLINE
> >>    scrub: scrub completed after 0h0m with 0 errors on Mon Nov 8
> >>           10:27:52 2010
> >>   config:
> >>
> >>           NAME      STATE   READ WRITE CKSUM
> >>           test02    ONLINE     0     0     0
> >>             da3     ONLINE     0     0     0
> >>
> >>   errors: No known data errors
> >>
> >>     pool: test03
> >>    state: ONLINE
> >>    scrub: scrub completed after 0h0m with 0 errors on Mon Nov 8
> >>           10:27:54 2010
> >>   config:
> >>
> >>           NAME      STATE   READ WRITE CKSUM
> >>           test03    ONLINE     0     0     0
> >>             da4     ONLINE     0     0     0
> >>
> >>   errors: No known data errors
> >>
> >>     pool: test04
> >>    state: ONLINE
> >>    scrub: scrub completed after 0h0m with 0 errors on Mon Nov 8
> >>           10:27:57 2010
> >>   config:
> >>
> >>           NAME      STATE   READ WRITE CKSUM
> >>           test04    ONLINE     0     0     0
> >>             da5     ONLINE     0     0     0
> >>
> >>   errors: No known data errors
> >>
> >>     pool: test05
> >>    state: ONLINE
> >>    scrub: scrub completed after 0h0m with 0 errors on Mon Nov 8
> >>           10:28:00 2010
> >>   config:
> >>
> >>           NAME      STATE   READ WRITE CKSUM
> >>           test05    ONLINE     0     0     0
> >>             da6     ONLINE     0     0     0
> >>
> >>   errors: No known data errors
> >>
> >>     pool: test06
> >>    state: ONLINE
> >>    scrub: scrub completed after 0h0m with 0 errors on Mon Nov 8
> >>           10:28:02 2010
> >>   config:
> >>
> >>           NAME      STATE   READ WRITE CKSUM
> >>           test06    ONLINE     0     0     0
> >>             da7     ONLINE     0     0     0
> >>
> >>   errors: No known data errors
> >>
> >>     pool: test07
> >>    state: ONLINE
> >>    scrub: scrub completed after 0h0m with 0 errors on Mon Nov 8
> >>           10:28:05 2010
> >>   config:
> >>
> >>           NAME      STATE   READ WRITE CKSUM
> >>           test07    ONLINE     0     0     0
> >>             da8     ONLINE     0     0     0
> >>
> >>   errors: No known data errors
> >>
> >>Based on these results, I've drawn the following conclusions:
> >>* ZFS single pool per device = OKAY
> >>* ZFS raidz of all devices = OKAY
> >>* ZFS stripe of all devices = NOT OKAY
> >>
> >>The results are immediate, and I know ZFS will self-heal, so is that
> >>what it is doing behind my back and just not reporting it?  Is this
> >>a ZFS bug with striping vs. raidz?
>
> >Can you reproduce this problem using RELENG_8?  Please try one of the
> >below snapshots.
> >
> >ftp://ftp4.freebsd.org/pub/FreeBSD/snapshots/201011/
>
> The server is in a data center with limited access control.  Do I
> have the option of using a particular CVS tag (checking out via csup)
> and then performing a make world/kernel?

Doing this is more painful than, say, downloading a livefs image and
seeing if you can reproduce the problem there (that way you won't be
modifying your existing OS installation), especially since I can't
guarantee that the problem you're seeing is fixed in RELENG_8 (hence
my request to begin with).  But if you can't boot livefs, then here
you go:

You'll need some form of console access (either serial or VGA) to do
the upgrade reliably.  "Rolling back" may also not be an option, since
RELENG_8 is newer than RELENG_8_1 and may have introduced new binaries
into the fray.  If you don't have console access to this machine and
things go awry, you may be SOL.  The vagueness of my statement is
intentional; I can't cover every situation that might come to light.

Please be sure to back up your kernel configuration file before doing
the following, and make sure that the supfile shown below has
tag=RELENG_8 in it (it should).  And yes, the rm commands below are
recommended; failure to use them could result in some oddities, given
that your /usr/src tree refers to RELENG_8_1 version numbers, which
differ from RELENG_8.  You *do not* have to do this for ports (since
for ports, tag=. is used by default).
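A quick way to sanity-check the RELENG_8 tag before you sync (from
memory, so the exact *default lines may differ slightly on your
system):

  # Confirm the supfile tracks RELENG_8 before syncing.
  grep '^\*default' /usr/share/examples/cvsup/stable-supfile
  # Expect a line like: *default release=cvs tag=RELENG_8

Note that the -h flag on the csup command below overrides the
*default host entry in the supfile, so its placeholder value doesn't
matter.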
  rm -fr /var/db/sup/src-all
  rm -fr /usr/src/*
  rm -fr /usr/obj/*
  csup -h cvsupserver -L 2 /usr/share/examples/cvsup/stable-supfile

At this point you can restore your kernel configuration file to the
appropriate place (/sys/i386/conf, /sys/amd64/conf, etc.) and build
world/kernel as per the instructions in /usr/src/Makefile (see lines
~51-62).
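From memory, the sequence in that Makefile boils down to roughly the
following; MYKERNEL is just a stand-in for your own kernel
configuration name, so treat the Makefile itself as authoritative:

  cd /usr/src
  make buildworld
  make buildkernel KERNCONF=MYKERNEL
  make installkernel KERNCONF=MYKERNEL
  shutdown -r now    # reboot, then come up in single user mode
  cd /usr/src
  mergemaster -p     # merge files installworld depends on
  make installworld
  mergemaster        # merge the rest of /etc
  shutdown -r now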
***Please do not skip any of the steps***.  Good luck.

--
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |