Date:      Mon, 08 Nov 2010 11:32:04 -0800
From:      Mike Carlson <carlson39@llnl.gov>
To:        Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>, "pjd@freebsd.org" <pjd@freebsd.org>
Subject:   Re: 8.1-RELEASE: ZFS data errors
Message-ID:  <4CD85034.5000909@llnl.gov>
In-Reply-To: <20101108192950.GA15902@icarus.home.lan>
References:  <4CD84258.6090404@llnl.gov> <20101108190640.GA15661@icarus.home.lan> <4CD84B63.4030800@llnl.gov> <20101108192950.GA15902@icarus.home.lan>

On 11/08/2010 11:29 AM, Jeremy Chadwick wrote:
> On Mon, Nov 08, 2010 at 11:11:31AM -0800, Mike Carlson wrote:
>> On 11/08/2010 11:06 AM, Jeremy Chadwick wrote:
>>> On Mon, Nov 08, 2010 at 10:32:56AM -0800, Mike Carlson wrote:
>>>> I'm having a problem with striping 7 18TB RAID6 (hardware SAN)
>>>> volumes together.
>>>>
>>>> Here is a quick rundown of the hardware:
>>>> * HP DL180 G6 w/12GB ram
>>>> * QLogic FC HBA (Qlogic ISP 2532 PCI FC-AL Adapter)
>>>> * Winchester Hardware SAN
>>>>
>>>>     da2 at isp0 bus 0 scbus2 target 0 lun 0
>>>>     da2:<WINSYS SX2318R 373O>   Fixed Direct Access SCSI-5 device
>>>>     da2: 800.000MB/s transfers
>>>>     da2: Command Queueing enabled
>>>>     da2: 19074680MB (39064944640 512 byte sectors: 255H 63S/T 2431680C)
>>>>
>>>>
>>>> As soon as I create the volume and write data to it, it is reported
>>>> as being corrupted:
>>>>
>>>>     write# zpool create filevol001 da2 da3 da4 da5 da6 da7 da8
>>>>     write# dd if=/dev/random of=/filevol001/random.dat.1 bs=1m count=1000
>>>>     1000+0 records in
>>>>     1000+0 records out
>>>>     1048576000 bytes transferred in 16.472807 secs (63654968 bytes/sec)
>>>>     write# cd /filevol001/
>>>>     write# ls
>>>>     random.dat.1
>>>>     write# md5 *
>>>>     MD5 (random.dat.1) = 629f8883d6394189a1658d24a5698bb3
>>>>     write# cp random.dat.1 random.dat.2
>>>>     cp: random.dat.1: Input/output error
>>>>     write# zpool status
>>>>        pool: filevol001
>>>>       state: ONLINE
>>>>       scrub: none requested
>>>>     config:
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          filevol001  ONLINE       0     0     0
>>>>            da2       ONLINE       0     0     0
>>>>            da3       ONLINE       0     0     0
>>>>            da4       ONLINE       0     0     0
>>>>            da5       ONLINE       0     0     0
>>>>            da6       ONLINE       0     0     0
>>>>            da7       ONLINE       0     0     0
>>>>            da8       ONLINE       0     0     0
>>>>
>>>>     errors: No known data errors
>>>>     write# zpool scrub filevol001
>>>>     write# zpool status
>>>>        pool: filevol001
>>>>       state: ONLINE
>>>>     status: One or more devices has experienced an error resulting in data
>>>>          corruption.  Applications may be affected.
>>>>     action: Restore the file in question if possible.  Otherwise restore the
>>>>          entire pool from backup.
>>>>         see: http://www.sun.com/msg/ZFS-8000-8A
>>>>       scrub: scrub completed after 0h0m with 2437 errors on Mon Nov  8
>>>>     10:14:20 2010
>>>>     config:
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          filevol001  ONLINE       0     0 2.38K
>>>>            da2       ONLINE       0     0 1.24K  12K repaired
>>>>            da3       ONLINE       0     0 1.12K
>>>>            da4       ONLINE       0     0 1.13K
>>>>            da5       ONLINE       0     0 1.27K
>>>>            da6       ONLINE       0     0     0
>>>>            da7       ONLINE       0     0     0
>>>>            da8       ONLINE       0     0     0
>>>>
>>>>     errors: 2437 data errors, use '-v' for a list
>>>>
>>>> However, if I create a 'raidz' volume, no errors occur:
>>>>
>>>>     write# zpool destroy filevol001
>>>>     write# zpool create filevol001 raidz da2 da3 da4 da5 da6 da7 da8
>>>>     write# zpool status
>>>>        pool: filevol001
>>>>       state: ONLINE
>>>>       scrub: none requested
>>>>     config:
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          filevol001  ONLINE       0     0     0
>>>>            raidz1    ONLINE       0     0     0
>>>>              da2     ONLINE       0     0     0
>>>>              da3     ONLINE       0     0     0
>>>>              da4     ONLINE       0     0     0
>>>>              da5     ONLINE       0     0     0
>>>>              da6     ONLINE       0     0     0
>>>>              da7     ONLINE       0     0     0
>>>>              da8     ONLINE       0     0     0
>>>>
>>>>     errors: No known data errors
>>>>     write# dd if=/dev/random of=/filevol001/random.dat.1 bs=1m count=1000
>>>>     1000+0 records in
>>>>     1000+0 records out
>>>>     1048576000 bytes transferred in 17.135045 secs (61194821 bytes/sec)
>>>>     write# zpool scrub filevol001
>>>>
>>>>     write# zpool status
>>>>        pool: filevol001
>>>>       state: ONLINE
>>>>       scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>>>     09:54:51 2010
>>>>     config:
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          filevol001  ONLINE       0     0     0
>>>>            raidz1    ONLINE       0     0     0
>>>>              da2     ONLINE       0     0     0
>>>>              da3     ONLINE       0     0     0
>>>>              da4     ONLINE       0     0     0
>>>>              da5     ONLINE       0     0     0
>>>>              da6     ONLINE       0     0     0
>>>>              da7     ONLINE       0     0     0
>>>>              da8     ONLINE       0     0     0
>>>>
>>>>     errors: No known data errors
>>>>     write# ls
>>>>     random.dat.1
>>>>     write# cp random.dat.1 random.dat.2
>>>>     write# cp random.dat.1 random.dat.3
>>>>     write# cp random.dat.1 random.dat.4
>>>>     write# cp random.dat.1 random.dat.5
>>>>     write# cp random.dat.1 random.dat.6
>>>>     write# cp random.dat.1 random.dat.7
>>>>     write# md5 *
>>>>     MD5 (random.dat.1) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>>>     MD5 (random.dat.2) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>>>     MD5 (random.dat.3) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>>>     MD5 (random.dat.4) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>>>     MD5 (random.dat.5) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>>>     MD5 (random.dat.6) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>>>     MD5 (random.dat.7) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>>>
>>>> What is also odd, is if I create 7 separate ZFS volumes, they do not
>>>> report any data corruption:
>>>>
>>>>     write# zpool destroy filevol001
>>>>     write# zpool create test01 da2
>>>>     write# zpool create test02 da3
>>>>     write# zpool create test03 da4
>>>>     write# zpool create test04 da5
>>>>     write# zpool create test05 da6
>>>>     write# zpool create test06 da7
>>>>     write# zpool create test07 da8
>>>>     write# zpool status
>>>>        pool: test01
>>>>       state: ONLINE
>>>>       scrub: none requested
>>>>     config:
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          test01      ONLINE       0     0     0
>>>>            da2       ONLINE       0     0     0
>>>>
>>>>     errors: No known data errors
>>>>
>>>>        pool: test02
>>>>       state: ONLINE
>>>>       scrub: none requested
>>>>     config:
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          test02      ONLINE       0     0     0
>>>>            da3       ONLINE       0     0     0
>>>>
>>>>     errors: No known data errors
>>>>
>>>>        pool: test03
>>>>       state: ONLINE
>>>>       scrub: none requested
>>>>     config:
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          test03      ONLINE       0     0     0
>>>>            da4       ONLINE       0     0     0
>>>>
>>>>     errors: No known data errors
>>>>
>>>>        pool: test04
>>>>       state: ONLINE
>>>>       scrub: none requested
>>>>     config:
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          test04      ONLINE       0     0     0
>>>>            da5       ONLINE       0     0     0
>>>>
>>>>     errors: No known data errors
>>>>
>>>>        pool: test05
>>>>       state: ONLINE
>>>>       scrub: none requested
>>>>     config:
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          test05      ONLINE       0     0     0
>>>>            da6       ONLINE       0     0     0
>>>>
>>>>     errors: No known data errors
>>>>
>>>>        pool: test06
>>>>       state: ONLINE
>>>>       scrub: none requested
>>>>     config:
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          test06      ONLINE       0     0     0
>>>>            da7       ONLINE       0     0     0
>>>>
>>>>     errors: No known data errors
>>>>
>>>>        pool: test07
>>>>       state: ONLINE
>>>>       scrub: none requested
>>>>     config:
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          test07      ONLINE       0     0     0
>>>>            da8       ONLINE       0     0     0
>>>>
>>>>     errors: No known data errors
>>>>     write# dd if=/dev/random of=/tmp/random.dat.1 bs=1m count=1000
>>>>     1000+0 records in
>>>>     1000+0 records out
>>>>     1048576000 bytes transferred in 19.286735 secs (54367730 bytes/sec)
>>>>     write# cd /tmp/
>>>>     write# md5 /tmp/random.dat.1
>>>>     MD5 (/tmp/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>>>     write# cp random.dat.1 /test01 ; cp random.dat.1 /test02 ;cp
>>>>     random.dat.1 /test03 ; cp random.dat.1 /test04 ; cp random.dat.1
>>>>     /test05 ; cp random.dat.1 /test06 ; cp random.dat.1 /test07
>>>>     write# md5 /test*/*
>>>>     MD5 (/test01/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>>>     MD5 (/test02/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>>>     MD5 (/test03/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>>>     MD5 (/test04/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>>>     MD5 (/test05/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>>>     MD5 (/test06/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>>>     MD5 (/test07/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>>>     write# zpool scrub test01 ; zpool scrub test02 ;zpool scrub test03
>>>>     ;zpool scrub test04 ; zpool scrub test05 ; zpool scrub test06 ;
>>>>     zpool scrub test07
>>>>     write# zpool status
>>>>        pool: test01
>>>>       state: ONLINE
>>>>       scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>>>     10:27:49 2010
>>>>     config:
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          test01      ONLINE       0     0     0
>>>>            da2       ONLINE       0     0     0
>>>>
>>>>     errors: No known data errors
>>>>
>>>>        pool: test02
>>>>       state: ONLINE
>>>>       scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>>>     10:27:52 2010
>>>>     config:
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          test02      ONLINE       0     0     0
>>>>            da3       ONLINE       0     0     0
>>>>
>>>>     errors: No known data errors
>>>>
>>>>        pool: test03
>>>>       state: ONLINE
>>>>       scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>>>     10:27:54 2010
>>>>     config:
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          test03      ONLINE       0     0     0
>>>>            da4       ONLINE       0     0     0
>>>>
>>>>     errors: No known data errors
>>>>
>>>>        pool: test04
>>>>       state: ONLINE
>>>>       scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>>>     10:27:57 2010
>>>>     config:
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          test04      ONLINE       0     0     0
>>>>            da5       ONLINE       0     0     0
>>>>
>>>>     errors: No known data errors
>>>>
>>>>        pool: test05
>>>>       state: ONLINE
>>>>       scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>>>     10:28:00 2010
>>>>     config:
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          test05      ONLINE       0     0     0
>>>>            da6       ONLINE       0     0     0
>>>>
>>>>     errors: No known data errors
>>>>
>>>>        pool: test06
>>>>       state: ONLINE
>>>>       scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>>>     10:28:02 2010
>>>>     config:
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          test06      ONLINE       0     0     0
>>>>            da7       ONLINE       0     0     0
>>>>
>>>>     errors: No known data errors
>>>>
>>>>        pool: test07
>>>>       state: ONLINE
>>>>       scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>>>     10:28:05 2010
>>>>     config:
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          test07      ONLINE       0     0     0
>>>>            da8       ONLINE       0     0     0
>>>>
>>>>     errors: No known data errors
>>>>
>>>> Based on these results, I've drawn the following conclusion:
>>>> * ZFS single pool per device = OKAY
>>>> * ZFS raidz of all devices = OKAY
>>>> * ZFS stripe of all devices = NOT OKAY
>>>>
>>>> The results are immediate, and I know ZFS will self-heal, so is that
>>>> what it is doing behind my back and just not reporting it? Is this a
>>>> ZFS bug with striping vs. raidz?
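
The write/copy/checksum procedure above can be condensed into a small script for re-running against any mountpoint. This is a sketch: cksum(1) stands in for md5(1) purely for portability, and the script and function names are placeholders, not anything from the transcript. A stripe pool exhibiting the bug should fail at the cp step with the same Input/output error.

```shell
#!/bin/sh
# rw_check: write a random reference file into a directory, copy it a
# few times, and verify every copy reads back with the same checksum.
# Prints OK if all copies match, FAIL on any copy or checksum error.
rw_check() {
    dir=$1
    src="$dir/zfscheck.src"

    # Write ~10MB of random data as the reference file.
    dd if=/dev/urandom of="$src" bs=1048576 count=10 2>/dev/null || {
        echo FAIL
        return 1
    }
    ref=$(cksum < "$src")
    status=OK

    # Copy the file repeatedly and re-checksum each copy; the broken
    # stripe pool failed at exactly this step with an I/O error.
    for i in 1 2 3; do
        if ! cp "$src" "$src.$i" || [ "$(cksum < "$src.$i")" != "$ref" ]; then
            status=FAIL
            break
        fi
    done

    rm -f "$src" "$src".1 "$src".2 "$src".3
    echo "$status"
}

rw_check "${1:-/tmp}"
```

Run it as `sh rw_check.sh /filevol001` against the suspect pool's mountpoint.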
>>> Can you reproduce this problem using RELENG_8?  Please try one of the
>>> below snapshots.
>>>
>>> ftp://ftp4.freebsd.org/pub/FreeBSD/snapshots/201011/
>>>
>> The server is in a data center with limited access control. Do I
>> have the option of using a particular CVS tag (checking out via
>> csup) and then performing a make world/kernel?
> Doing this is more painful than, say, downloading a livefs image and
> seeing if you can reproduce the problem (i.e. you won't be modifying
> your existing OS installation), especially since I can't guarantee that
> the problem you're seeing is fixed in RELENG_8 (hence my request to
> begin with).  But if you can't boot livefs, then here you go:
>
> You'll need some form of console access (either serial or VGA) to do the
> upgrade reliably.  "Rolling back" may also not be an option since
> RELENG_8 is newer than RELENG_8_1 and may have introduced some new
> binaries or executables into the fray.  If you don't have console access
> to this machine, if things go awry you may be SOL.  The vagueness of my
> statement is intentional; I can't cover every situation that might come
> to light.
>
> Please be sure to back up your kernel configuration file before doing
> the following, and make sure that the supfile shown below has
> tag=RELENG_8 in it (it should).  And yes, the rm commands below are
> recommended; failure to use them could result in some oddities given
> that your /usr/src tree refers to RELENG_8_1 version numbers which
> differ from RELENG_8.  You *do not* have to do this for ports (since for
> ports, tag=. is used by default).
>
> rm -fr /var/db/sup/src-all
> rm -fr /usr/src/*
> rm -fr /usr/obj/*
> csup -h cvsupserver -L 2 /usr/share/examples/cvsup/stable-supfile
>
> At this point you can restore your kernel configuration file to the
> appropriate place (/sys/i386/conf, /sys/amd64/conf, etc.) and build
> world/kernel as per the instructions in /usr/src/Makefile (see lines
> ~51-62).  ***Please do not skip any of the steps***.  Good luck.
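
The rebuild Jeremy defers to /usr/src/Makefile for follows the standard FreeBSD source-upgrade sequence. Sketched below as a dry-run script that only prints the steps rather than running them; MYKERNEL is a placeholder for your own kernel configuration name, and the exact steps in your tree's Makefile take precedence over this summary.

```shell
#!/bin/sh
# Print the canonical buildworld/buildkernel sequence as a checklist.
# This deliberately does not execute anything: run the commands by hand,
# from console access, exactly as /usr/src/Makefile describes.
print_plan() {
cat <<'EOF'
cd /usr/src
make buildworld
make buildkernel KERNCONF=MYKERNEL
make installkernel KERNCONF=MYKERNEL
shutdown -r now          # reboot; come up in single-user mode
mergemaster -p
make installworld
mergemaster
shutdown -r now          # final reboot into the new world
EOF
}
print_plan
```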
>
> --
> | Jeremy Chadwick                                   jdc@parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.              PGP: 4BD6C0CB |
>
>
>
Ahh, point taken :) I think I'll take a trip to the datacenter and boot 
off of a thumb drive...

Thanks, Jeremy, I'll report back later!

Mike C


