Date: Mon, 8 Nov 2010 11:06:40 -0800
From: Jeremy Chadwick
To: Mike Carlson
Cc: freebsd-fs@freebsd.org, pjd@freebsd.org
Subject: Re: 8.1-RELEASE: ZFS data errors
Message-ID: <20101108190640.GA15661@icarus.home.lan>
In-Reply-To: <4CD84258.6090404@llnl.gov>

On Mon, Nov 08, 2010 at 10:32:56AM -0800, Mike Carlson wrote:
> I'm having a problem with striping 7 18TB RAID6 (hardware SAN)
> volumes together.
>
> Here is a quick rundown of the hardware:
>  * HP DL180 G6 w/12GB RAM
>  * QLogic FC HBA (Qlogic ISP 2532 PCI FC-AL Adapter)
>  * Winchester hardware SAN
>
> da2 at isp0 bus 0 scbus2 target 0 lun 0
> da2: Fixed Direct Access SCSI-5 device
> da2: 800.000MB/s transfers
> da2: Command Queueing enabled
> da2: 19074680MB (39064944640 512 byte sectors: 255H 63S/T 2431680C)
>
> As soon as I create the volume and write data to it, it is reported
> as being corrupted:
>
> write# zpool create filevol001 da2 da3 da4 da5 da6 da7 da8
> write# zpool scrub filevol001
> write# dd if=/dev/random of=/filevol001/random.dat.1 bs=1m count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes transferred in 16.472807 secs (63654968 bytes/sec)
> write# cd /filevol001/
> write# ls
> random.dat.1
> write# md5 *
> MD5 (random.dat.1) = 629f8883d6394189a1658d24a5698bb3
> write# cp random.dat.1 random.dat.2
> cp: random.dat.1: Input/output error
> write# zpool status
>   pool: filevol001
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         filevol001    ONLINE       0     0     0
>           da2         ONLINE       0     0     0
>           da3         ONLINE       0     0     0
>           da4         ONLINE       0     0     0
>           da5         ONLINE       0     0     0
>           da6         ONLINE       0     0     0
>           da7         ONLINE       0     0     0
>           da8         ONLINE       0     0     0
>
> errors: No known data errors
> write# zpool scrub filevol001
> write# zpool status
>   pool: filevol001
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub completed after 0h0m with 2437 errors on Mon Nov  8 10:14:20 2010
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         filevol001    ONLINE       0     0 2.38K
>           da2         ONLINE       0     0 1.24K  12K repaired
>           da3         ONLINE       0     0 1.12K
>           da4         ONLINE       0     0 1.13K
>           da5         ONLINE       0     0 1.27K
>           da6         ONLINE       0     0     0
>           da7         ONLINE       0     0     0
>           da8         ONLINE       0     0     0
>
> errors: 2437 data errors, use '-v' for a list
>
> However, if I create a 'raidz' volume, no errors occur:
>
> write# zpool destroy filevol001
> write# zpool create filevol001 raidz da2 da3 da4 da5 da6 da7 da8
> write# zpool status
>   pool: filevol001
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         filevol001    ONLINE       0     0     0
>           raidz1      ONLINE       0     0     0
>             da2       ONLINE       0     0     0
>             da3       ONLINE       0     0     0
>             da4       ONLINE       0     0     0
>             da5       ONLINE       0     0     0
>             da6       ONLINE       0     0     0
>             da7       ONLINE       0     0     0
>             da8       ONLINE       0     0     0
>
> errors: No known data errors
> write# dd if=/dev/random of=/filevol001/random.dat.1 bs=1m count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes transferred in 17.135045 secs (61194821 bytes/sec)
> write# zpool scrub filevol001
>
> dmesg output:
> write# zpool status
>   pool: filevol001
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 09:54:51 2010
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         filevol001    ONLINE       0     0     0
>           raidz1      ONLINE       0     0     0
>             da2       ONLINE       0     0     0
>             da3       ONLINE       0     0     0
>             da4       ONLINE       0     0     0
>             da5       ONLINE       0     0     0
>             da6       ONLINE       0     0     0
>             da7       ONLINE       0     0     0
>             da8       ONLINE       0     0     0
>
> errors: No known data errors
> write# ls
> random.dat.1
> write# cp random.dat.1 random.dat.2
> write# cp random.dat.1 random.dat.3
> write# cp random.dat.1 random.dat.4
> write# cp random.dat.1 random.dat.5
> write# cp random.dat.1 random.dat.6
> write# cp random.dat.1 random.dat.7
> write# md5 *
> MD5 (random.dat.1) = f5e3467f61a954bc2e0bcc35d49ac8b2
> MD5 (random.dat.2) = f5e3467f61a954bc2e0bcc35d49ac8b2
> MD5 (random.dat.3) = f5e3467f61a954bc2e0bcc35d49ac8b2
> MD5 (random.dat.4) = f5e3467f61a954bc2e0bcc35d49ac8b2
> MD5 (random.dat.5) = f5e3467f61a954bc2e0bcc35d49ac8b2
> MD5 (random.dat.6) = f5e3467f61a954bc2e0bcc35d49ac8b2
> MD5 (random.dat.7) = f5e3467f61a954bc2e0bcc35d49ac8b2
>
> What is also odd is that if I create 7 separate ZFS volumes, they do
> not report any data corruption:
>
> write# zpool destroy filevol001
> write# zpool create test01 da2
> write# zpool create test02 da3
> write# zpool create test03 da4
> write# zpool create test04 da5
> write# zpool create test05 da6
> write# zpool create test06 da7
> write# zpool create test07 da8
> write# zpool status
>   pool: test01
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test01        ONLINE       0     0     0
>           da2         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test02
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test02        ONLINE       0     0     0
>           da3         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test03
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test03        ONLINE       0     0     0
>           da4         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test04
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test04        ONLINE       0     0     0
>           da5         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test05
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test05        ONLINE       0     0     0
>           da6         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test06
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test06        ONLINE       0     0     0
>           da7         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test07
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test07        ONLINE       0     0     0
>           da8         ONLINE       0     0     0
>
> errors: No known data errors
> write# dd if=/dev/random of=/tmp/random.dat.1 bs=1m count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes transferred in 19.286735 secs (54367730 bytes/sec)
> write# cd /tmp/
> write# md5 /tmp/random.dat.1
> MD5 (/tmp/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> write# cp random.dat.1 /test01 ; cp random.dat.1 /test02 ; cp random.dat.1 /test03 ;
>        cp random.dat.1 /test04 ; cp random.dat.1 /test05 ; cp random.dat.1 /test06 ;
>        cp random.dat.1 /test07
> write# md5 /test*/*
> MD5 (/test01/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> MD5 (/test02/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> MD5 (/test03/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> MD5 (/test04/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> MD5 (/test05/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> MD5 (/test06/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> MD5 (/test07/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
> write# zpool scrub test01 ; zpool scrub test02 ; zpool scrub test03 ; zpool scrub test04 ;
>        zpool scrub test05 ; zpool scrub test06 ; zpool scrub test07
> write# zpool status
>   pool: test01
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:27:49 2010
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test01        ONLINE       0     0     0
>           da2         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test02
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:27:52 2010
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test02        ONLINE       0     0     0
>           da3         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test03
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:27:54 2010
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test03        ONLINE       0     0     0
>           da4         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test04
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:27:57 2010
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test04        ONLINE       0     0     0
>           da5         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test05
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:28:00 2010
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test05        ONLINE       0     0     0
>           da6         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test06
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:28:02 2010
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test06        ONLINE       0     0     0
>           da7         ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: test07
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 10:28:05 2010
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test07        ONLINE       0     0     0
>           da8         ONLINE       0     0     0
>
> errors: No known data errors
>
> Based on these results, I've drawn the following conclusions:
>  * ZFS single pool per device = OKAY
>  * ZFS raidz of all devices   = OKAY
>  * ZFS stripe of all devices  = NOT OKAY
>
> The results are immediate, and I know ZFS will self-heal, so is that
> what it is doing behind my back and just not reporting it?  Is this a
> ZFS bug with striping vs. raidz?

Can you reproduce this problem using RELENG_8?  Please try one of the
below snapshots:

ftp://ftp4.freebsd.org/pub/FreeBSD/snapshots/201011/
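For reference, the striped-pool case from your transcript condenses into a
short test script you could re-run on the snapshot.  This is only a sketch:
it assumes the same da2-da8 LUNs and the pool name filevol001, and it
destroys any existing pool by that name first.

  #!/bin/sh
  # Recreate the striped pool across all seven LUNs, write some data,
  # then scrub and check for checksum errors.
  zpool destroy filevol001 2>/dev/null

  zpool create filevol001 da2 da3 da4 da5 da6 da7 da8

  dd if=/dev/random of=/filevol001/random.dat.1 bs=1m count=1000
  md5 /filevol001/random.dat.1
  cp /filevol001/random.dat.1 /filevol001/random.dat.2

  zpool scrub filevol001
  # "zpool scrub" returns immediately; give the scrub time to finish
  # (it completed in under a minute in your runs) before checking.
  sleep 60
  zpool status -v filevol001

The sleep is only a rough wait; check "zpool status" again if the scrub is
still in progress.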
-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |