From owner-freebsd-fs@FreeBSD.ORG Tue Nov 9 01:05:15 2010
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5E1EA1065674;
	Tue, 9 Nov 2010 01:05:15 +0000 (UTC)
	(envelope-from carlson39@llnl.gov)
Received: from smtp.llnl.gov (nspiron-3.llnl.gov [128.115.41.83])
	by mx1.freebsd.org (Postfix) with ESMTP id 3E8D68FC37;
	Tue, 9 Nov 2010 01:05:15 +0000 (UTC)
X-Attachments: None
Received: from bagua.llnl.gov (HELO [134.9.197.135]) ([134.9.197.135])
	by smtp.llnl.gov with ESMTP; 08 Nov 2010 17:05:14 -0800
Message-ID: <4CD89E4A.6000902@llnl.gov>
Date: Mon, 08 Nov 2010 17:05:14 -0800
From: Mike Carlson
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.2.12)
	Gecko/20101027 Lightning/1.0b2 Thunderbird/3.1.6
MIME-Version: 1.0
To: Jeremy Chadwick
References: <4CD84258.6090404@llnl.gov>
	<20101108190640.GA15661@icarus.home.lan>
	<4CD84B63.4030800@llnl.gov>
	<20101108192950.GA15902@icarus.home.lan>
In-Reply-To: <20101108192950.GA15902@icarus.home.lan>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: "freebsd-fs@freebsd.org", "pjd@freebsd.org"
Subject: Re: 8.1-RELEASE: ZFS data errors
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
X-List-Received-Date: Tue, 09 Nov 2010 01:05:15 -0000

On 11/08/2010 11:29 AM, Jeremy Chadwick wrote:
> On Mon, Nov 08, 2010 at 11:11:31AM -0800, Mike Carlson wrote:
>> On 11/08/2010 11:06 AM, Jeremy Chadwick wrote:
>>> On Mon, Nov 08, 2010 at 10:32:56AM -0800, Mike Carlson wrote:
>>>> I'm having a problem with striping 7 18TB RAID6 (hardware SAN)
>>>> volumes together.
>>>>
>>>> Here is a quick rundown of the hardware:
>>>> * HP DL180 G6 w/12GB ram
>>>> * QLogic FC HBA (Qlogic ISP 2532 PCI FC-AL Adapter)
>>>> * Winchester Hardware SAN,
>>>>
>>>> da2 at isp0 bus 0 scbus2 target 0 lun 0
>>>> da2: Fixed Direct Access SCSI-5 device
>>>> da2: 800.000MB/s transfers
>>>> da2: Command Queueing enabled
>>>> da2: 19074680MB (39064944640 512 byte sectors: 255H 63S/T 2431680C)
>>>>
>>>
>> The server is in a data center with limited access control, do I
>> have the option of using a particular CVS tag (checking out via csup)
>> and then perform a make world/kernel?
> Doing this is more painful than, say, downloading a livefs image and
> seeing if you can reproduce the problem (e.g. you won't be modifying
> your existing OS installation), especially since I can't guarantee that
> the problem you're seeing is fixed in RELENG_8 (hence my request to
> begin with). But if you can't boot livefs, then here you go:
>
> You'll need some form of console access (either serial or VGA) to do the
> upgrade reliably. "Rolling back" may also not be an option since
> RELENG_8 is newer than RELENG_8_1 and may have introduced some new
> binaries or executables into the fray. If you don't have console access
> to this machine, if things go awry you may be SOL. The vagueness of my
> statement is intentional; I can't cover every situation that might come
> to light.
>
> Please be sure to back up your kernel configuration file before doing
> the following, and make sure that the supfile shown below has
> tag=RELENG_8 in it (it should). And yes, the rm commands below are
> recommended; failure to use them could result in some oddities given
> that your /usr/src tree refers to RELENG_8_1 version numbers which
> differ from RELENG_8. You *do not* have to do this for ports (since for
> ports, tag=. is used by default).
>
>     rm -fr /var/db/sup/src-all
>     rm -fr /usr/src/*
>     rm -fr /usr/obj/*
>     csup -h cvsupserver -L 2 /usr/share/examples/cvsup/stable-supfile
>
> At this point you can restore your kernel configuration file to the
> appropriate place (/sys/i386/conf, /sys/amd64/conf, etc.) and build
> world/kernel as per the instructions in /usr/src/Makefile (see lines
> ~51-62). ***Please do not skip any of the steps***. Good luck.
>
> --
> | Jeremy Chadwick                           jdc@parodius.com |
> | Parodius Networking               http://www.parodius.com/ |
> | UNIX Systems Administrator          Mountain View, CA, USA |
> | Making life hard for others since 1977.      PGP: 4BD6C0CB |
>
>
>
I wasn't able to make it to the Data Center to boot off of a USB/CD, but
I did follow your steps to upgrade to RELENG_8.

So far, things are stable:

    write# uname -a
    FreeBSD write.llnl.gov 8.1-STABLE FreeBSD 8.1-STABLE #0: Mon Nov 8 16:38:06 PST 2010 root@write.llnl.gov:/usr/obj/usr/src/sys/GENERIC amd64
    write# kldstat
    Id Refs Address            Size     Name
     1   15 0xffffffff80100000 d86d18   kernel
     2    1 0xffffffff80e87000 f058     aio.ko
     3    1 0xffffffff80e97000 16ea40   ispfw.ko
     4    1 0xffffffff81006000 5568     geom_multipath.ko
     5    1 0xffffffff81222000 104ac5   zfs.ko
     6    1 0xffffffff81327000 1a15     opensolaris.ko
    write# zpool create test01 da2 da3 da4 da5 da6 da7 da8
    write# zpool status
    write# cd /tmp
    write# clear
    write# cp random.dat.1 /test01/
    write# cp random.dat.1 /test01/random.dat.2
    write# cp random.dat.1 /test01/random.dat.3
    write# cp random.dat.1 /test01/random.dat.4
    write# cp random.dat.1 /test01/random.dat.5
    write# cp random.dat.1 /test01/random.dat.6
    write# md5 random.dat.1
    MD5 (random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
    write# md5 /test01/random.dat.*
    MD5 (/test01/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
    MD5 (/test01/random.dat.2) = f795fa09e1b0975c0da0ec6e49544a36
    MD5 (/test01/random.dat.3) = f795fa09e1b0975c0da0ec6e49544a36
    MD5 (/test01/random.dat.4) = f795fa09e1b0975c0da0ec6e49544a36
    MD5 (/test01/random.dat.5) = f795fa09e1b0975c0da0ec6e49544a36
    MD5 (/test01/random.dat.6) = f795fa09e1b0975c0da0ec6e49544a36
    write# zpool status
      pool: test01
     state: ONLINE
     scrub: none requested
    config:

        NAME        STATE     READ WRITE CKSUM
        test01      ONLINE       0     0     0
          da2       ONLINE       0     0     0
          da3       ONLINE       0     0     0
          da4       ONLINE       0     0     0
          da5       ONLINE       0     0     0
          da6       ONLINE       0     0     0
          da7       ONLINE       0     0     0
          da8       ONLINE       0     0     0

    errors: No known data errors
    write# zpool scrub test01
    write# zpool status
      pool: test01
     state: ONLINE
     scrub: scrub completed after 0h0m with 0 errors on Mon Nov 8 17:00:01 2010
    config:

        NAME        STATE     READ WRITE CKSUM
        test01      ONLINE       0     0     0
          da2       ONLINE       0     0     0
          da3       ONLINE       0     0     0
          da4       ONLINE       0     0     0
          da5       ONLINE       0     0     0
          da6       ONLINE       0     0     0
          da7       ONLINE       0     0     0
          da8       ONLINE       0     0     0

    errors: No known data errors

Any ideas for further testing to narrow down the culprit?

Oh, one other thing that I modified was /boot/loader.conf. I had
previously limited the vfs.zfs.arc_max to 1024M, so I had also commented
that out.

Thanks again, I'm going to continue writing files and scrubbing the
array until I have a level of confidence with the file system.

Mike C