Date:         Mon, 08 Nov 2010 11:32:04 -0800
From:         Mike Carlson <carlson39@llnl.gov>
To:           Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:           "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>, "pjd@freebsd.org" <pjd@freebsd.org>
Subject:      Re: 8.1-RELEASE: ZFS data errors
Message-ID:   <4CD85034.5000909@llnl.gov>
In-Reply-To:  <20101108192950.GA15902@icarus.home.lan>
References:   <4CD84258.6090404@llnl.gov> <20101108190640.GA15661@icarus.home.lan> <4CD84B63.4030800@llnl.gov> <20101108192950.GA15902@icarus.home.lan>
On 11/08/2010 11:29 AM, Jeremy Chadwick wrote:
> On Mon, Nov 08, 2010 at 11:11:31AM -0800, Mike Carlson wrote:
>> On 11/08/2010 11:06 AM, Jeremy Chadwick wrote:
>>> On Mon, Nov 08, 2010 at 10:32:56AM -0800, Mike Carlson wrote:
>>>> I'm having a problem with striping 7 18TB RAID6 (hardware SAN)
>>>> volumes together.
>>>>
>>>> Here is a quick rundown of the hardware:
>>>>   * HP DL180 G6 w/12GB ram
>>>>   * QLogic FC HBA (Qlogic ISP 2532 PCI FC-AL Adapter)
>>>>   * Winchester Hardware SAN
>>>>
>>>> da2 at isp0 bus 0 scbus2 target 0 lun 0
>>>> da2: <WINSYS SX2318R 373O> Fixed Direct Access SCSI-5 device
>>>> da2: 800.000MB/s transfers
>>>> da2: Command Queueing enabled
>>>> da2: 19074680MB (39064944640 512 byte sectors: 255H 63S/T 2431680C)
>>>>
>>>> As soon as I create the volume and write data to it, it is reported
>>>> as being corrupted:
>>>>
>>>> write# zpool create filevol001 da2 da3 da4 da5 da6 da7 da8
>>>> write# dd if=/dev/random of=/filevol001/random.dat.1 bs=1m count=1000
>>>> 1000+0 records in
>>>> 1000+0 records out
>>>> 1048576000 bytes transferred in 16.472807 secs (63654968 bytes/sec)
>>>> write# cd /filevol001/
>>>> write# ls
>>>> random.dat.1
>>>> write# md5 *
>>>> MD5 (random.dat.1) = 629f8883d6394189a1658d24a5698bb3
>>>> write# cp random.dat.1 random.dat.2
>>>> cp: random.dat.1: Input/output error
>>>> write# zpool status
>>>>   pool: filevol001
>>>>  state: ONLINE
>>>>  scrub: none requested
>>>> config:
>>>>
>>>>         NAME          STATE   READ WRITE CKSUM
>>>>         filevol001    ONLINE     0     0     0
>>>>           da2         ONLINE     0     0     0
>>>>           da3         ONLINE     0     0     0
>>>>           da4         ONLINE     0     0     0
>>>>           da5         ONLINE     0     0     0
>>>>           da6         ONLINE     0     0     0
>>>>           da7         ONLINE     0     0     0
>>>>           da8         ONLINE     0     0     0
>>>>
>>>> errors: No known data errors
>>>> write# zpool scrub filevol001
>>>> write# zpool status
>>>>   pool: filevol001
>>>>  state: ONLINE
>>>> status: One or more devices has experienced an error resulting in data
>>>>         corruption.  Applications may be affected.
>>>> action: Restore the file in question if possible.  Otherwise restore the
>>>>         entire pool from backup.
>>>>    see: http://www.sun.com/msg/ZFS-8000-8A
>>>>  scrub: scrub completed after 0h0m with 2437 errors on Mon Nov  8
>>>>         10:14:20 2010
>>>> config:
>>>>
>>>>         NAME          STATE   READ WRITE CKSUM
>>>>         filevol001    ONLINE     0     0 2.38K
>>>>           da2         ONLINE     0     0 1.24K  12K repaired
>>>>           da3         ONLINE     0     0 1.12K
>>>>           da4         ONLINE     0     0 1.13K
>>>>           da5         ONLINE     0     0 1.27K
>>>>           da6         ONLINE     0     0     0
>>>>           da7         ONLINE     0     0     0
>>>>           da8         ONLINE     0     0     0
>>>>
>>>> errors: 2437 data errors, use '-v' for a list
>>>>
>>>> However, if I create a 'raidz' volume, no errors occur:
>>>>
>>>> write# zpool destroy filevol001
>>>> write# zpool create filevol001 raidz da2 da3 da4 da5 da6 da7 da8
>>>> write# zpool status
>>>>   pool: filevol001
>>>>  state: ONLINE
>>>>  scrub: none requested
>>>> config:
>>>>
>>>>         NAME          STATE   READ WRITE CKSUM
>>>>         filevol001    ONLINE     0     0     0
>>>>           raidz1      ONLINE     0     0     0
>>>>             da2       ONLINE     0     0     0
>>>>             da3       ONLINE     0     0     0
>>>>             da4       ONLINE     0     0     0
>>>>             da5       ONLINE     0     0     0
>>>>             da6       ONLINE     0     0     0
>>>>             da7       ONLINE     0     0     0
>>>>             da8       ONLINE     0     0     0
>>>>
>>>> errors: No known data errors
>>>> write# dd if=/dev/random of=/filevol001/random.dat.1 bs=1m count=1000
>>>> 1000+0 records in
>>>> 1000+0 records out
>>>> 1048576000 bytes transferred in 17.135045 secs (61194821 bytes/sec)
>>>> write# zpool scrub filevol001
>>>>
>>>> dmesg output:
>>>> write# zpool status
>>>>   pool: filevol001
>>>>  state: ONLINE
>>>>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>>>         09:54:51 2010
>>>> config:
>>>>
>>>>         NAME          STATE   READ WRITE CKSUM
>>>>         filevol001    ONLINE     0     0     0
>>>>           raidz1      ONLINE     0     0     0
>>>>             da2       ONLINE     0     0     0
>>>>             da3       ONLINE     0     0     0
>>>>             da4       ONLINE     0     0     0
>>>>             da5       ONLINE     0     0     0
>>>>             da6       ONLINE     0     0     0
>>>>             da7       ONLINE     0     0     0
>>>>             da8       ONLINE     0     0     0
>>>>
>>>> errors: No known data errors
>>>> write# ls
>>>> random.dat.1
>>>> write# cp random.dat.1 random.dat.2
>>>> write# cp random.dat.1 random.dat.3
>>>> write# cp random.dat.1 random.dat.4
>>>> write# cp random.dat.1 random.dat.5
>>>> write# cp random.dat.1 random.dat.6
>>>> write# cp random.dat.1 random.dat.7
>>>> write# md5 *
>>>> MD5 (random.dat.1) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>>> MD5 (random.dat.2) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>>> MD5 (random.dat.3) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>>> MD5 (random.dat.4) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>>> MD5 (random.dat.5) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>>> MD5 (random.dat.6) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>>> MD5 (random.dat.7) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>>>
>>>> What is also odd is that if I create 7 separate ZFS volumes, they do
>>>> not report any data corruption:
>>>>
>>>> write# zpool destroy filevol001
>>>> write# zpool create test01 da2
>>>> write# zpool create test02 da3
>>>> write# zpool create test03 da4
>>>> write# zpool create test04 da5
>>>> write# zpool create test05 da6
>>>> write# zpool create test06 da7
>>>> write# zpool create test07 da8
>>>> write# zpool status
>>>>   pool: test01
>>>>  state: ONLINE
>>>>  scrub: none requested
>>>> config:
>>>>
>>>>         NAME        STATE   READ WRITE CKSUM
>>>>         test01      ONLINE     0     0     0
>>>>           da2       ONLINE     0     0     0
>>>>
>>>> errors: No known data errors
>>>>
>>>>   pool: test02
>>>>  state: ONLINE
>>>>  scrub: none requested
>>>> config:
>>>>
>>>>         NAME        STATE   READ WRITE CKSUM
>>>>         test02      ONLINE     0     0     0
>>>>           da3       ONLINE     0     0     0
>>>>
>>>> errors: No known data errors
>>>>
>>>>   pool: test03
>>>>  state: ONLINE
>>>>  scrub: none requested
>>>> config:
>>>>
>>>>         NAME        STATE   READ WRITE CKSUM
>>>>         test03      ONLINE     0     0     0
>>>>           da4       ONLINE     0     0     0
>>>>
>>>> errors: No known data errors
>>>>
>>>>   pool: test04
>>>>  state: ONLINE
>>>>  scrub: none requested
>>>> config:
>>>>
>>>>         NAME        STATE   READ WRITE CKSUM
>>>>         test04      ONLINE     0     0     0
>>>>           da5       ONLINE     0     0     0
>>>>
>>>> errors: No known data errors
>>>>
>>>>   pool: test05
>>>>  state: ONLINE
>>>>  scrub: none requested
>>>> config:
>>>>
>>>>         NAME        STATE   READ WRITE CKSUM
>>>>         test05      ONLINE     0     0     0
>>>>           da6       ONLINE     0     0     0
>>>>
>>>> errors: No known data errors
>>>>
>>>>   pool: test06
>>>>  state: ONLINE
>>>>  scrub: none requested
>>>> config:
>>>>
>>>>         NAME        STATE   READ WRITE CKSUM
>>>>         test06      ONLINE     0     0     0
>>>>           da7       ONLINE     0     0     0
>>>>
>>>> errors: No known data errors
>>>>
>>>>   pool: test07
>>>>  state: ONLINE
>>>>  scrub: none requested
>>>> config:
>>>>
>>>>         NAME        STATE   READ WRITE CKSUM
>>>>         test07      ONLINE     0     0     0
>>>>           da8       ONLINE     0     0     0
>>>>
>>>> errors: No known data errors
>>>> write# dd if=/dev/random of=/tmp/random.dat.1 bs=1m count=1000
>>>> 1000+0 records in
>>>> 1000+0 records out
>>>> 1048576000 bytes transferred in 19.286735 secs (54367730 bytes/sec)
>>>> write# cd /tmp/
>>>> write# md5 /tmp/random.dat.1
>>>> MD5 (/tmp/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>>> write# cp random.dat.1 /test01 ; cp random.dat.1 /test02 ; cp
>>>> random.dat.1 /test03 ; cp random.dat.1 /test04 ; cp random.dat.1
>>>> /test05 ; cp random.dat.1 /test06 ; cp random.dat.1 /test07
>>>> write# md5 /test*/*
>>>> MD5 (/test01/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>>> MD5 (/test02/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>>> MD5 (/test03/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>>> MD5 (/test04/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>>> MD5 (/test05/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>>> MD5 (/test06/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>>> MD5 (/test07/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>>> write# zpool scrub test01 ; zpool scrub test02 ; zpool scrub test03 ;
>>>> zpool scrub test04 ; zpool scrub test05 ; zpool scrub test06 ;
>>>> zpool scrub test07
>>>> write# zpool status
>>>>   pool: test01
>>>>  state: ONLINE
>>>>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>>>         10:27:49 2010
>>>> config:
>>>>
>>>>         NAME        STATE   READ WRITE CKSUM
>>>>         test01      ONLINE     0     0     0
>>>>           da2       ONLINE     0     0     0
>>>>
>>>> errors: No known data errors
>>>>
>>>>   pool: test02
>>>>  state: ONLINE
>>>>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>>>         10:27:52 2010
>>>> config:
>>>>
>>>>         NAME        STATE   READ WRITE CKSUM
>>>>         test02      ONLINE     0     0     0
>>>>           da3       ONLINE     0     0     0
>>>>
>>>> errors: No known data errors
>>>>
>>>>   pool: test03
>>>>  state: ONLINE
>>>>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>>>         10:27:54 2010
>>>> config:
>>>>
>>>>         NAME        STATE   READ WRITE CKSUM
>>>>         test03      ONLINE     0     0     0
>>>>           da4       ONLINE     0     0     0
>>>>
>>>> errors: No known data errors
>>>>
>>>>   pool: test04
>>>>  state: ONLINE
>>>>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>>>         10:27:57 2010
>>>> config:
>>>>
>>>>         NAME        STATE   READ WRITE CKSUM
>>>>         test04      ONLINE     0     0     0
>>>>           da5       ONLINE     0     0     0
>>>>
>>>> errors: No known data errors
>>>>
>>>>   pool: test05
>>>>  state: ONLINE
>>>>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>>>         10:28:00 2010
>>>> config:
>>>>
>>>>         NAME        STATE   READ WRITE CKSUM
>>>>         test05      ONLINE     0     0     0
>>>>           da6       ONLINE     0     0     0
>>>>
>>>> errors: No known data errors
>>>>
>>>>   pool: test06
>>>>  state: ONLINE
>>>>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>>>         10:28:02 2010
>>>> config:
>>>>
>>>>         NAME        STATE   READ WRITE CKSUM
>>>>         test06      ONLINE     0     0     0
>>>>           da7       ONLINE     0     0     0
>>>>
>>>> errors: No known data errors
>>>>
>>>>   pool: test07
>>>>  state: ONLINE
>>>>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>>>         10:28:05 2010
>>>> config:
>>>>
>>>>         NAME        STATE   READ WRITE CKSUM
>>>>         test07      ONLINE     0     0     0
>>>>           da8       ONLINE     0     0     0
>>>>
>>>> errors: No known data errors
>>>>
>>>> Based on these results, I've drawn the following conclusions:
>>>>   * ZFS single pool per device = OKAY
>>>>   * ZFS raidz of all devices   = OKAY
>>>>   * ZFS stripe of all devices  = NOT OKAY
>>>>
>>>> The results are immediate, and I know ZFS will self-heal, so is that
>>>> what it is doing behind my back and just not reporting it?  Is this a
>>>> ZFS bug with striping vs. raidz?
>>>
>>> Can you reproduce this problem using RELENG_8?  Please try one of the
>>> below snapshots.
>>>
>>> ftp://ftp4.freebsd.org/pub/FreeBSD/snapshots/201011/
>>>
>> The server is in a data center with limited access control, so do I
>> have the option of using a particular CVS tag (checking out via csup)
>> and then performing a make world/kernel?
>
> Doing this is more painful than, say, downloading a livefs image and
> seeing if you can reproduce the problem (e.g. you won't be modifying
> your existing OS installation), especially since I can't guarantee that
> the problem you're seeing is fixed in RELENG_8 (hence my request to
> begin with).  But if you can't boot livefs, then here you go:
>
> You'll need some form of console access (either serial or VGA) to do the
> upgrade reliably.  "Rolling back" may also not be an option since
> RELENG_8 is newer than RELENG_8_1 and may have introduced some new
> binaries or executables into the fray.  If you don't have console access
> to this machine and things go awry, you may be SOL.  The vagueness of my
> statement is intentional; I can't cover every situation that might come
> to light.
>
> Please be sure to back up your kernel configuration file before doing
> the following, and make sure that the supfile shown below has
> tag=RELENG_8 in it (it should).  And yes, the rm commands below are
> recommended; failure to use them could result in some oddities, given
> that your /usr/src tree refers to RELENG_8_1 version numbers which
> differ from RELENG_8.  You *do not* have to do this for ports (since for
> ports, tag=. is used by default).
>
> rm -fr /var/db/sup/src-all
> rm -fr /usr/src/*
> rm -fr /usr/obj/*
> csup -h cvsupserver -L 2 /usr/share/examples/cvsup/stable-supfile
>
> At this point you can restore your kernel configuration file to the
> appropriate place (/sys/i386/conf, /sys/amd64/conf, etc.) and build
> world/kernel as per the instructions in /usr/src/Makefile (see lines
> ~51-62).  ***Please do not skip any of the steps***.  Good luck.
>
> --
> | Jeremy Chadwick                                 jdc@parodius.com |
> | Parodius Networking                     http://www.parodius.com/ |
> | UNIX Systems Administrator               Mountain View, CA, USA  |
> | Making life hard for others since 1977.           PGP: 4BD6C0CB  |
>

Ahh, point taken :)

I think I'll take a trip to the datacenter and boot off of a thumb drive...

Thanks, Jeremy, I'll report back later!

Mike C
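For reference, here is a minimal sketch of the reproduction steps discussed
above, collected in one place.  It assumes the seven SAN LUNs still attach as
da2 through da8 and reuses the pool name filevol001 from the thread; adjust
both for the hardware at hand, and note that any data on those devices is
lost when the pools are created:

    # striped (concatenated top-level vdevs) pool -- the failing case
    zpool create filevol001 da2 da3 da4 da5 da6 da7 da8
    dd if=/dev/random of=/filevol001/random.dat.1 bs=1m count=1000
    md5 /filevol001/random.dat.1
    cp /filevol001/random.dat.1 /filevol001/random.dat.2
    zpool scrub filevol001
    zpool status -v filevol001

    # raidz pool on the same LUNs -- the working case
    zpool destroy filevol001
    zpool create filevol001 raidz da2 da3 da4 da5 da6 da7 da8
    dd if=/dev/random of=/filevol001/random.dat.1 bs=1m count=1000
    zpool scrub filevol001
    zpool status -v filevol001

The -v flag also lists the individual damaged files, which speaks to the
"is it healing behind my back" question: if ZFS were silently repairing
blocks, the CKSUM counters would still increment rather than stay at zero.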
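Similarly, a sketch of the RELENG_8 source-upgrade path Jeremy describes,
for anyone following along.  The csup host is still the placeholder from his
mail, MYKERNEL and its backup location stand in for whatever kernel config
the machine actually uses, and the build/install sequence is the standard
one documented in /usr/src/Makefile and /usr/src/UPDATING rather than
anything specific to this report:

    # refresh /usr/src to RELENG_8 (stable-supfile already carries tag=RELENG_8)
    rm -fr /var/db/sup/src-all /usr/src/* /usr/obj/*
    csup -h cvsupserver -L 2 /usr/share/examples/cvsup/stable-supfile

    # restore the saved kernel config (use sys/i386/conf on i386), then build
    cp /root/MYKERNEL /usr/src/sys/amd64/conf/MYKERNEL
    cd /usr/src
    make buildworld
    make buildkernel KERNCONF=MYKERNEL
    make installkernel KERNCONF=MYKERNEL
    shutdown -r now          # boot the new kernel, ideally into single-user mode

    # after rebooting on the new kernel
    cd /usr/src
    mergemaster -p
    make installworld
    mergemaster
    shutdown -r now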