Date: Mon, 8 Nov 2010 11:06:40 -0800
From: Jeremy Chadwick
To: Mike Carlson
Cc: freebsd-fs@freebsd.org, pjd@freebsd.org
Subject: Re: 8.1-RELEASE: ZFS data errors
Message-ID: <20101108190640.GA15661@icarus.home.lan>
In-Reply-To: <4CD84258.6090404@llnl.gov>

On Mon, Nov 08, 2010 at 10:32:56AM -0800, Mike Carlson wrote:
> I'm having a problem with striping 7 18TB RAID6 (hardware SAN)
> volumes together.
>
> Here is a quick rundown of the hardware:
>  * HP DL180 G6 w/12GB RAM
>  * QLogic FC HBA (Qlogic ISP 2532 PCI FC-AL Adapter)
>  * Winchester hardware SAN
>
> da2 at isp0 bus 0 scbus2 target 0 lun 0
> da2: Fixed Direct Access SCSI-5 device
> da2: 800.000MB/s transfers
> da2: Command Queueing enabled
> da2: 19074680MB (39064944640 512 byte sectors: 255H 63S/T 2431680C)
>
> As soon as I create the volume and write data to it, it is reported
> as being corrupted:
>
> write# zpool create filevol001 da2 da3 da4 da5 da6 da7 da8
> write# zpool scrub filevol001
> write# dd if=/dev/random of=/filevol001/random.dat.1 bs=1m count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes transferred in 16.472807 secs (63654968 bytes/sec)
> write# cd /filevol001/
> write# ls
> random.dat.1
> write# md5 *
> MD5 (random.dat.1) = 629f8883d6394189a1658d24a5698bb3
> write# cp random.dat.1 random.dat.2
> cp: random.dat.1: Input/output error
> write# zpool status
>   pool: filevol001
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         filevol001    ONLINE       0     0     0
>           da2         ONLINE       0     0     0
>           da3         ONLINE       0     0     0
>           da4         ONLINE       0     0     0
>           da5         ONLINE       0     0     0
>           da6         ONLINE       0     0     0
>           da7         ONLINE       0     0     0
>           da8         ONLINE       0     0     0
>
> errors: No known data errors
> write# zpool scrub filevol001
> write# zpool status
>   pool: filevol001
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub completed after 0h0m with 2437 errors on Mon Nov  8 10:14:20 2010
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         filevol001    ONLINE       0     0 2.38K
>           da2         ONLINE       0     0 1.24K  12K repaired
>           da3         ONLINE       0     0 1.12K
>           da4         ONLINE       0     0 1.13K
>           da5         ONLINE       0     0 1.27K
>           da6         ONLINE       0     0     0
>           da7         ONLINE       0     0     0
>           da8         ONLINE       0     0     0
>
> errors: 2437 data errors, use '-v' for a list
>
> However, if I create a 'raidz' volume, no errors occur:
>
> write# zpool destroy filevol001
> write# zpool create filevol001 raidz da2 da3 da4 da5 da6 da7 da8
> write# zpool status
>   pool: filevol001
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         filevol001    ONLINE       0     0     0
>           raidz1      ONLINE       0     0     0
>             da2       ONLINE       0     0     0
>             da3       ONLINE       0     0     0
>             da4       ONLINE       0     0     0
>             da5       ONLINE       0     0     0
>             da6       ONLINE       0     0     0
>             da7       ONLINE       0     0     0
>             da8       ONLINE       0     0     0
>
> errors: No known data errors
> write# dd if=/dev/random of=/filevol001/random.dat.1 bs=1m count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes transferred in 17.135045 secs (61194821 bytes/sec)
> write# zpool scrub filevol001
>
> dmesg output:
> write# zpool status
>   pool: filevol001
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 09:54:51 2010
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         filevol001    ONLINE       0     0     0
>           raidz1      ONLINE       0     0     0
>             da2       ONLINE       0     0     0
>             da3       ONLINE       0     0     0
>             da4       ONLINE       0     0     0
>             da5       ONLINE       0     0     0
>             da6       ONLINE       0     0     0
>             da7       ONLINE       0     0     0
>             da8       ONLINE       0     0     0
>
> errors: No known data errors
> write# ls
> random.dat.1
> write# cp random.dat.1 random.dat.2
> write# cp random.dat.1 random.dat.3
> write# cp random.dat.1 random.dat.4
> write# cp random.dat.1 random.dat.5
> write# cp random.dat.1 random.dat.6
> write# cp random.dat.1 random.dat.7
> write# md5 *
> MD5 (random.dat.1) = f5e3467f61a954bc2e0bcc35d49ac8b2
> MD5 (random.dat.2) = f5e3467f61a954bc2e0bcc35d49ac8b2
> MD5 (random.dat.3) = f5e3467f61a954bc2e0bcc35d49ac8b2
> MD5 (random.dat.4) = f5e3467f61a954bc2e0bcc35d49ac8b2
> MD5 (random.dat.5) = f5e3467f61a954bc2e0bcc35d49ac8b2
> MD5 (random.dat.6) = f5e3467f61a954bc2e0bcc35d49ac8b2
> MD5 (random.dat.7) = f5e3467f61a954bc2e0bcc35d49ac8b2
>
> What is also odd is that if I create 7 separate ZFS volumes, they do
> not report any data corruption:
>
> write# zpool destroy filevol001
> write# zpool create test01 da2
> write# zpool create test02 da3
> write# zpool create test03 da4
> write# zpool create test04 da5
> write# zpool create test05 da6
> write# zpool create test06 da7
> write# zpool create test07 da8
> write# zpool status
>   pool: test01
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test01        ONLINE       0     0     0
>           da2         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test02
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test02        ONLINE       0     0     0
>           da3         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test03
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test03        ONLINE       0     0     0
>           da4         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test04
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test04        ONLINE       0     0     0
>           da5         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test05
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test05        ONLINE       0     0     0
>           da6         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test06
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test06        ONLINE       0     0     0
>           da7         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test07
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test07        ONLINE       0     0     0
>           da8         ONLINE       0     0     0
>
> errors: No known data errors
> write# dd if=/dev/random of=/tmp/random.dat.1 bs=1m count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes transferred in 19.286735 secs (54367730 bytes/sec)
> write# cd /tmp/
> write# md5 /tmp/random.dat.1
> MD5 (/tmp/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> write# cp random.dat.1 /test01 ; cp random.dat.1 /test02 ; cp random.dat.1 /test03 ;
>        cp random.dat.1 /test04 ; cp random.dat.1 /test05 ; cp random.dat.1 /test06 ;
>        cp random.dat.1 /test07
> write# md5 /test*/*
> MD5 (/test01/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> MD5 (/test02/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> MD5 (/test03/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> MD5 (/test04/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> MD5 (/test05/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> MD5 (/test06/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> MD5 (/test07/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> write# zpool scrub test01 ; zpool scrub test02 ; zpool scrub test03 ; zpool scrub test04 ;
>        zpool scrub test05 ; zpool scrub test06 ; zpool scrub test07
> write# zpool status
>   pool: test01
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:27:49 2010
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test01        ONLINE       0     0     0
>           da2         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test02
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:27:52 2010
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test02        ONLINE       0     0     0
>           da3         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test03
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:27:54 2010
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test03        ONLINE       0     0     0
>           da4         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test04
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:27:57 2010
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test04        ONLINE       0     0     0
>           da5         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test05
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:28:00 2010
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test05        ONLINE       0     0     0
>           da6         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test06
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:28:02 2010
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test06        ONLINE       0     0     0
>           da7         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test07
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:28:05 2010
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test07        ONLINE       0     0     0
>           da8         ONLINE       0     0     0
>
> errors: No known data errors
>
> Based on these results, I've drawn the following conclusions:
>  * ZFS single pool per device = OKAY
>  * ZFS raidz of all devices   = OKAY
>  * ZFS stripe of all devices  = NOT OKAY
>
> The results are immediate, and I know ZFS will self-heal, so is that
> what it is doing behind my back and just not reporting it?  Is this a
> ZFS bug with striping vs. raidz?

Can you reproduce this problem using RELENG_8?  Please try one of the
below snapshots:

ftp://ftp4.freebsd.org/pub/FreeBSD/snapshots/201011/
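For reference, the striped-pool case from your transcript condenses into a
short test script you could re-run on the snapshot.  This is only a sketch:
it assumes the same da2-da8 LUNs and the pool name filevol001, and it
destroys any existing pool by that name first.

  #!/bin/sh
  # Recreate the striped pool across all seven LUNs, write some data,
  # then scrub and check for checksum errors.
  zpool destroy filevol001 2>/dev/null

  zpool create filevol001 da2 da3 da4 da5 da6 da7 da8

  dd if=/dev/random of=/filevol001/random.dat.1 bs=1m count=1000
  md5 /filevol001/random.dat.1
  cp /filevol001/random.dat.1 /filevol001/random.dat.2

  zpool scrub filevol001
  # "zpool scrub" returns immediately; give the scrub time to finish
  # (it completed in under a minute in your runs) before checking.
  sleep 60
  zpool status -v filevol001

The sleep is only a rough wait; check "zpool status" again if the scrub is
still in progress.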
-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |