Date: Mon, 8 Nov 2010 11:29:50 -0800
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Mike Carlson <carlson39@llnl.gov>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>, "pjd@freebsd.org" <pjd@freebsd.org>
Subject: Re: 8.1-RELEASE: ZFS data errors
Message-ID: <20101108192950.GA15902@icarus.home.lan>
In-Reply-To: <4CD84B63.4030800@llnl.gov>
References: <4CD84258.6090404@llnl.gov> <20101108190640.GA15661@icarus.home.lan> <4CD84B63.4030800@llnl.gov>

On Mon, Nov 08, 2010 at 11:11:31AM -0800, Mike Carlson wrote:
> On 11/08/2010 11:06 AM, Jeremy Chadwick wrote:
> >On Mon, Nov 08, 2010 at 10:32:56AM -0800, Mike Carlson wrote:
> >>I'm having a problem with striping 7 18TB RAID6 (hardware SAN)
> >>volumes together.
> >>
> >>Here is a quick rundown of the hardware:
> >>* HP DL180 G6 w/12GB RAM
> >>* QLogic FC HBA (Qlogic ISP 2532 PCI FC-AL Adapter)
> >>* Winchester Hardware SAN:
> >>
> >>    da2 at isp0 bus 0 scbus2 target 0 lun 0
> >>    da2: <WINSYS SX2318R 373O> Fixed Direct Access SCSI-5 device
> >>    da2: 800.000MB/s transfers
> >>    da2: Command Queueing enabled
> >>    da2: 19074680MB (39064944640 512 byte sectors: 255H 63S/T 2431680C)
> >>
> >>As soon as I create the volume and write data to it, it is reported
> >>as being corrupted:
> >>
> >>    write# zpool create filevol001 da2 da3 da4 da5 da6 da7 da8
> >>    write# zpool scrub filevol001
> >>    write# dd if=/dev/random of=/filevol001/random.dat.1 bs=1m count=1000
> >>    1000+0 records in
> >>    1000+0 records out
> >>    1048576000 bytes transferred in 16.472807 secs (63654968 bytes/sec)
> >>    write# cd /filevol001/
> >>    write# ls
> >>    random.dat.1
> >>    write# md5 *
> >>    MD5 (random.dat.1) = 629f8883d6394189a1658d24a5698bb3
> >>    write# cp random.dat.1 random.dat.2
> >>    cp: random.dat.1: Input/output error
> >>    write# zpool status
> >>      pool: filevol001
> >>     state: ONLINE
> >>     scrub: none requested
> >>    config:
> >>
> >>            NAME        STATE     READ WRITE CKSUM
> >>            filevol001  ONLINE       0     0     0
> >>              da2       ONLINE       0     0     0
> >>              da3       ONLINE       0     0     0
> >>              da4       ONLINE       0     0     0
> >>              da5       ONLINE       0     0     0
> >>              da6       ONLINE       0     0     0
> >>              da7       ONLINE       0     0     0
> >>              da8       ONLINE       0     0     0
> >>
> >>    errors: No known data errors
> >>    write# zpool scrub filevol001
> >>    write# zpool status
> >>      pool: filevol001
> >>     state: ONLINE
> >>    status: One or more devices has experienced an error resulting in data
> >>            corruption.  Applications may be affected.
> >>    action: Restore the file in question if possible.  Otherwise restore the
> >>            entire pool from backup.
> >>       see: http://www.sun.com/msg/ZFS-8000-8A
> >>     scrub: scrub completed after 0h0m with 2437 errors on Mon Nov  8 10:14:20 2010
> >>    config:
> >>
> >>            NAME        STATE     READ WRITE CKSUM
> >>            filevol001  ONLINE       0     0 2.38K
> >>              da2       ONLINE       0     0 1.24K  12K repaired
> >>              da3       ONLINE       0     0 1.12K
> >>              da4       ONLINE       0     0 1.13K
> >>              da5       ONLINE       0     0 1.27K
> >>              da6       ONLINE       0     0     0
> >>              da7       ONLINE       0     0     0
> >>              da8       ONLINE       0     0     0
> >>
> >>    errors: 2437 data errors, use '-v' for a list
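A minimal follow-up to the output above, assuming only the pool name and
the md5 recorded before the scrub (both taken from the transcript); the
file list it prints will vary from run to run:

    # List the individual files affected by the checksum errors reported above.
    zpool status -v filevol001

    # Re-read the test file and compare against the md5 recorded before the
    # scrub (629f8883d6394189a1658d24a5698bb3) to see whether the file data
    # itself still reads back intact, or whether the read fails outright as
    # the cp did.
    md5 /filevol001/random.dat.1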
> >>However, if I create a 'raidz' volume, no errors occur:
> >>
> >>    write# zpool destroy filevol001
> >>    write# zpool create filevol001 raidz da2 da3 da4 da5 da6 da7 da8
> >>    write# zpool status
> >>      pool: filevol001
> >>     state: ONLINE
> >>     scrub: none requested
> >>    config:
> >>
> >>            NAME          STATE     READ WRITE CKSUM
> >>            filevol001    ONLINE       0     0     0
> >>              raidz1      ONLINE       0     0     0
> >>                da2       ONLINE       0     0     0
> >>                da3       ONLINE       0     0     0
> >>                da4       ONLINE       0     0     0
> >>                da5       ONLINE       0     0     0
> >>                da6       ONLINE       0     0     0
> >>                da7       ONLINE       0     0     0
> >>                da8       ONLINE       0     0     0
> >>
> >>    errors: No known data errors
> >>    write# dd if=/dev/random of=/filevol001/random.dat.1 bs=1m count=1000
> >>    1000+0 records in
> >>    1000+0 records out
> >>    1048576000 bytes transferred in 17.135045 secs (61194821 bytes/sec)
> >>    write# zpool scrub filevol001
> >>
> >>    dmesg output:
> >>    write# zpool status
> >>      pool: filevol001
> >>     state: ONLINE
> >>     scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 09:54:51 2010
> >>    config:
> >>
> >>            NAME          STATE     READ WRITE CKSUM
> >>            filevol001    ONLINE       0     0     0
> >>              raidz1      ONLINE       0     0     0
> >>                da2       ONLINE       0     0     0
> >>                da3       ONLINE       0     0     0
> >>                da4       ONLINE       0     0     0
> >>                da5       ONLINE       0     0     0
> >>                da6       ONLINE       0     0     0
> >>                da7       ONLINE       0     0     0
> >>                da8       ONLINE       0     0     0
> >>
> >>    errors: No known data errors
> >>    write# ls
> >>    random.dat.1
> >>    write# cp random.dat.1 random.dat.2
> >>    write# cp random.dat.1 random.dat.3
> >>    write# cp random.dat.1 random.dat.4
> >>    write# cp random.dat.1 random.dat.5
> >>    write# cp random.dat.1 random.dat.6
> >>    write# cp random.dat.1 random.dat.7
> >>    write# md5 *
> >>    MD5 (random.dat.1) = f5e3467f61a954bc2e0bcc35d49ac8b2
> >>    MD5 (random.dat.2) = f5e3467f61a954bc2e0bcc35d49ac8b2
> >>    MD5 (random.dat.3) = f5e3467f61a954bc2e0bcc35d49ac8b2
> >>    MD5 (random.dat.4) = f5e3467f61a954bc2e0bcc35d49ac8b2
> >>    MD5 (random.dat.5) = f5e3467f61a954bc2e0bcc35d49ac8b2
> >>    MD5 (random.dat.6) = f5e3467f61a954bc2e0bcc35d49ac8b2
> >>    MD5 (random.dat.7) = f5e3467f61a954bc2e0bcc35d49ac8b2
> >>
> >>What is also odd is that if I create 7 separate ZFS volumes, they do not
> >>report any data corruption:
> >>
> >>    write# zpool destroy filevol001
> >>    write# zpool create test01 da2
> >>    write# zpool create test02 da3
> >>    write# zpool create test03 da4
> >>    write# zpool create test04 da5
> >>    write# zpool create test05 da6
> >>    write# zpool create test06 da7
> >>    write# zpool create test07 da8
> >>    write# zpool status
> >>      pool: test01
> >>     state: ONLINE
> >>     scrub: none requested
> >>    config:
> >>
> >>            NAME        STATE     READ WRITE CKSUM
> >>            test01      ONLINE       0     0     0
> >>              da2       ONLINE       0     0     0
> >>
> >>    errors: No known data errors
> >>
> >>      pool: test02
> >>     state: ONLINE
> >>     scrub: none requested
> >>    config:
> >>
> >>            NAME        STATE     READ WRITE CKSUM
> >>            test02      ONLINE       0     0     0
> >>              da3       ONLINE       0     0     0
> >>
> >>    errors: No known data errors
> >>
> >>      pool: test03
> >>     state: ONLINE
> >>     scrub: none requested
> >>    config:
> >>
> >>            NAME        STATE     READ WRITE CKSUM
> >>            test03      ONLINE       0     0     0
> >>              da4       ONLINE       0     0     0
> >>
> >>    errors: No known data errors
> >>
> >>      pool: test04
> >>     state: ONLINE
> >>     scrub: none requested
> >>    config:
> >>
> >>            NAME        STATE     READ WRITE CKSUM
> >>            test04      ONLINE       0     0     0
> >>              da5       ONLINE       0     0     0
> >>
> >>    errors: No known data errors
> >>
> >>      pool: test05
> >>     state: ONLINE
> >>     scrub: none requested
> >>    config:
> >>
> >>            NAME        STATE     READ WRITE CKSUM
> >>            test05      ONLINE       0     0     0
> >>              da6       ONLINE       0     0     0
> >>
> >>    errors: No known data errors
> >>
> >>      pool: test06
> >>     state: ONLINE
> >>     scrub: none requested
> >>    config:
> >>
> >>            NAME        STATE     READ WRITE CKSUM
> >>            test06      ONLINE       0     0     0
> >>              da7       ONLINE       0     0     0
> >>
> >>    errors: No known data errors
> >>
> >>      pool: test07
> >>     state: ONLINE
> >>     scrub: none requested
> >>    config:
> >>
> >>            NAME        STATE     READ WRITE CKSUM
> >>            test07      ONLINE       0     0     0
> >>              da8       ONLINE       0     0     0
> >>
> >>    errors: No known data errors
> >>    write# dd if=/dev/random of=/tmp/random.dat.1 bs=1m count=1000
> >>    1000+0 records in
> >>    1000+0 records out
> >>    1048576000 bytes transferred in 19.286735 secs (54367730 bytes/sec)
> >>    write# cd /tmp/
> >>    write# md5 /tmp/random.dat.1
> >>    MD5 (/tmp/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> >>    write# cp random.dat.1 /test01 ; cp random.dat.1 /test02 ; cp random.dat.1 /test03 ;
> >>           cp random.dat.1 /test04 ; cp random.dat.1 /test05 ; cp random.dat.1 /test06 ;
> >>           cp random.dat.1 /test07
> >>    write# md5 /test*/*
> >>    MD5 (/test01/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> >>    MD5 (/test02/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> >>    MD5 (/test03/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> >>    MD5 (/test04/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> >>    MD5 (/test05/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> >>    MD5 (/test06/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> >>    MD5 (/test07/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> >>    write# zpool scrub test01 ; zpool scrub test02 ; zpool scrub test03 ; zpool scrub test04 ;
> >>           zpool scrub test05 ; zpool scrub test06 ; zpool scrub test07
> >>    write# zpool status
> >>      pool: test01
> >>     state: ONLINE
> >>     scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:27:49 2010
> >>    config:
> >>
> >>            NAME        STATE     READ WRITE CKSUM
> >>            test01      ONLINE       0     0     0
> >>              da2       ONLINE       0     0     0
> >>
> >>    errors: No known data errors
> >>
> >>      pool: test02
> >>     state: ONLINE
> >>     scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:27:52 2010
> >>    config:
> >>
> >>            NAME        STATE     READ WRITE CKSUM
> >>            test02      ONLINE       0     0     0
> >>              da3       ONLINE       0     0     0
> >>
> >>    errors: No known data errors
> >>
> >>      pool: test03
> >>     state: ONLINE
> >>     scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:27:54 2010
> >>    config:
> >>
> >>            NAME        STATE     READ WRITE CKSUM
> >>            test03      ONLINE       0     0     0
> >>              da4       ONLINE       0     0     0
> >>
> >>    errors: No known data errors
> >>
> >>      pool: test04
> >>     state: ONLINE
> >>     scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:27:57 2010
> >>    config:
> >>
> >>            NAME        STATE     READ WRITE CKSUM
> >>            test04      ONLINE       0     0     0
> >>              da5       ONLINE       0     0     0
> >>
> >>    errors: No known data errors
> >>
> >>      pool: test05
> >>     state: ONLINE
> >>     scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:28:00 2010
> >>    config:
> >>
> >>            NAME        STATE     READ WRITE CKSUM
> >>            test05      ONLINE       0     0     0
> >>              da6       ONLINE       0     0     0
> >>
> >>    errors: No known data errors
> >>
> >>      pool: test06
> >>     state: ONLINE
> >>     scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:28:02 2010
> >>    config:
> >>
> >>            NAME        STATE     READ WRITE CKSUM
> >>            test06      ONLINE       0     0     0
> >>              da7       ONLINE       0     0     0
> >>
> >>    errors: No known data errors
> >>
> >>      pool: test07
> >>     state: ONLINE
> >>     scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:28:05 2010
> >>    config:
> >>
> >>            NAME        STATE     READ WRITE CKSUM
> >>            test07      ONLINE       0     0     0
> >>              da8       ONLINE       0     0     0
> >>
> >>    errors: No known data errors
> >>
> >>Based on these results, I've drawn the following conclusions:
> >>* ZFS single pool per device = OKAY
> >>* ZFS raidz of all devices = OKAY
> >>* ZFS stripe of all devices = NOT OKAY
> >>
> >>The results are immediate, and I know ZFS will self-heal, so is that
> >>what it is doing behind my back and just not reporting it?  Is this a
> >>ZFS bug with striping vs. raidz?
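One way to narrow the stripe case down further, sketched here with the
same commands and device names used above (the pool name testpair is
made up for the example), would be to repeat the dd/md5/scrub test on
smaller stripes built from subsets of the LUNs, to see whether the
errors follow particular devices or only appear once several LUNs are
striped together:

    # Hypothetical narrowing test: a two-LUN stripe from the same SAN volumes.
    zpool create testpair da2 da3
    dd if=/dev/random of=/testpair/random.dat.1 bs=1m count=1000
    md5 /testpair/random.dat.1
    zpool scrub testpair
    zpool status -v testpair
    zpool destroy testpair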
> >Can you reproduce this problem using RELENG_8?  Please try one of the
> >below snapshots.
> >
> >ftp://ftp4.freebsd.org/pub/FreeBSD/snapshots/201011/
>
> The server is in a data center with limited access control, do I
> have the option of using a particular CVS tag (checking out via csup)
> and then performing a make world/kernel?

Doing this is more painful than, say, downloading a livefs image and
seeing if you can reproduce the problem (e.g. you won't be modifying
your existing OS installation), especially since I can't guarantee that
the problem you're seeing is fixed in RELENG_8 (hence my request to
begin with).  But if you can't boot livefs, then here you go:

You'll need some form of console access (either serial or VGA) to do
the upgrade reliably.  "Rolling back" may also not be an option, since
RELENG_8 is newer than RELENG_8_1 and may have introduced some new
binaries or executables into the fray.  If you don't have console
access to this machine and things go awry, you may be SOL.  The
vagueness of my statement is intentional; I can't cover every situation
that might come to light.

Please be sure to back up your kernel configuration file before doing
the following, and make sure that the supfile shown below has
tag=RELENG_8 in it (it should).  And yes, the rm commands below are
recommended; failure to use them could result in some oddities, given
that your /usr/src tree refers to RELENG_8_1 version numbers which
differ from RELENG_8.  You *do not* have to do this for ports (since
for ports, tag=. is used by default).

    rm -fr /var/db/sup/src-all
    rm -fr /usr/src/*
    rm -fr /usr/obj/*
    csup -h cvsupserver -L 2 /usr/share/examples/cvsup/stable-supfile

At this point you can restore your kernel configuration file to the
appropriate place (/sys/i386/conf, /sys/amd64/conf, etc.) and build
world/kernel as per the instructions in /usr/src/Makefile (see lines
~51-62).  ***Please do not skip any of the steps***.

Good luck.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
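For reference, the world/kernel rebuild referred to above follows the
usual 8.x sequence; the sketch below is an outline only, not a
substitute for the comments in /usr/src/Makefile, and MYKERNEL stands
in for your own kernel configuration name:

    cd /usr/src
    make buildworld
    make buildkernel KERNCONF=MYKERNEL
    make installkernel KERNCONF=MYKERNEL
    # Reboot into single-user mode, then return to /usr/src.
    mergemaster -p
    make installworld
    mergemaster
    # Reboot again into the new world.
    shutdown -r now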