From owner-freebsd-fs@FreeBSD.ORG Wed Mar 9 12:57:11 2011
From: Stephen McKay
Sender: smckay@internode.on.net
To: Chris Forgeron, Mark Felder
Cc: freebsd-fs@freebsd.org, Stephen McKay
Date: Wed, 09 Mar 2011 22:41:30 +1000
Subject: Re: Constant minor ZFS corruption
Message-Id: <201103091241.p29CfUM1003302@dungeon.home>
References: <201103081425.p28EPQtM002115@dungeon.home>
In-Reply-To: from Chris Forgeron at "Tue, 08 Mar 2011 18:40:00 -0400"
List-Id: Filesystems

On Tuesday, 8th March 2011, Chris Forgeron wrote:

>Have you make sure it's not always the same drives with the checksum
>errors? It make take a few days to know for sure..

Of the 12 disks, only 1 has been error-free.  I've been doing this for
about 10 days now, and there is no pattern that I can see in the errors.
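(Aside: one way to look for a per-drive pattern is to tally the CKSUM
column from successive `zpool status` runs.  A rough sketch; the sample
text and device names below are made up to resemble typical `zpool status`
layout, not output from my box:)

```python
# Rough sketch: tally per-vdev CKSUM counts from `zpool status` text.
# The sample below is hypothetical; real output layout may differ slightly.
import re

def cksum_counts(status_text):
    """Map vdev name -> checksum error count from zpool status text.

    Matches indented config lines of the form:
        NAME  STATE  READ  WRITE  CKSUM
    Pool and raidz rows are included too, which is harmless for tallying.
    """
    counts = {}
    for line in status_text.splitlines():
        m = re.match(r"\s+(\S+)\s+\S+\s+\d+\s+\d+\s+(\d+)\s*$", line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts

sample = """\
  pool: dread
 state: ONLINE
config:
        NAME          STATE   READ WRITE CKSUM
        dread         ONLINE     0     0     0
          raidz1-0    ONLINE     0     0     0
            gpt/bay1  ONLINE     0     0     2
            gpt/bay7  ONLINE     0     0     5
"""
print(cksum_counts(sample))
```

Diffing these dicts between runs over a few days would show whether the
errors follow particular bays or wander across all of them.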
>It shouldn't matter which NFS client you use, if you're seeing ZFS
>checksum errors in zpool status, that won't be from whatever program
>is writing it.

I'm mounting another server using NFS to get my 1TB of test data, so the
problem box is the NFS client, not the server.  Sorry if there was any
confusion.

I had a theory that there was a race in the NFS client, or perhaps in the
code that steals memory from ZFS's ARC when NFS needs it.  However, that
seems less likely now, as I did the same 1TB test copy again today but
this time using ssh as the transport.  I saw the same ZFS checksum errors
as before.  :-(

>..oh, and don't forget about fun with Expanders. I assume you're using one?

No.  This board has 14 ports, made up of 6 native and 8 from the LSI2008
chip on the PIKE card.  Each is cabled directly to a drive.

>I've got 2 LSI2008 based controllers in my 9-Current machine without
>any fuss. That's running a 24 disk Mirror right now.

That's encouraging news.  Maybe I can win eventually.

On Tuesday, 8th March 2011, Mark Felder wrote:

>Highly interested in what FreeBSD version and what ZFS version and zpool
>version you're running.

I was using 8.2-release plus the mps driver from 8.2-stable, hence the
filesystem version is 4 and the pool version is 15.  But I installed
-current a few days ago while keeping the same pool, and found that the
errors still occurred.  The v28 code has extra locking in interesting
places, but it made no difference to the checksum errors.

As of today, I've destroyed the pool and built a version 28 pool
(fs version 5) on a subset of disks (those attached to the onboard
controller).  I'll know by tomorrow how that went.
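(For the repeat copies, it also helps to know *which* files came across
damaged, independent of what zpool reports.  A sketch of a source-vs-
destination hash walk; the roots are placeholders, and this assumes the
source tree is reachable from the same machine, e.g. over the NFS mount:)

```python
# Sketch: compare SHA-256 hashes of a source and destination tree to
# find which copied files were damaged.  Paths are placeholders.
import hashlib
import os

def file_sha256(path, bufsize=1 << 20):
    """Hash a file in 1MB chunks so large files don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def mismatched_files(src_root, dst_root):
    """Return relative paths of files missing or differing under dst_root."""
    bad = []
    for dirpath, _, names in os.walk(src_root):
        for name in names:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.exists(dst) or file_sha256(src) != file_sha256(dst):
                bad.append(rel)
    return bad
```

If the mismatched files differ between the NFS run and the ssh run, that
points away from the transport and back at the pool.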
BTW, with my code in place to decipher "type 19" entries, and a kernel
printf that bypasses the need to get devd.conf right, I see something
like this for each checksum error:

log_sysevent: ESC_dev_dle: class=ereport.fs.zfs.checksum
  ena=4822220020083854337
  detector={ version=0 scheme=zfs pool=44947180927799912 vdev=6194846651369573567 }
  pool=dread pool_guid=44947180927799912 pool_context=0 pool_failmode=wait
  vdev_guid=6194846651369573567 vdev_type=disk vdev_path=/dev/gpt/bay7
  parent_guid=18008078209829074821 parent_type=raidz
  zio_err=0 zio_offset=194516353024 zio_size=4096
  zio_objset=276 zio_object=0 zio_level=0 zio_blkid=132419
  bad_ranges=0000000000001000 bad_ranges_min_gap=8
  bad_range_sets=00000445 bad_range_clears=00002924
  bad_set_histogram=001b001a001e002b0021001600120018001a001600210018001500150016001c001c0019001200190022001b0019001b0017000f0014000e0013001a001c001f000c000c000c0007000b000d0010001f00060009000800080007000c0010000f00070007000500070008000600080008000a0002000100060004000300070004
  bad_cleared_histogram=00820089009700ac00a700b900b2009000730084009500af00a300ad00a900ac0082009300ad00c200ac00d200a8008f0078008b008e00b700bf00b9009f00a60083009500a400c100c200b700cd009900780090009b00be00af00c100a700980083008a00a200c900bc00d400b200a3007e0089009400c400c700d400b8009b

That's a hideous blob of awful, and I don't really know what to do with it.

Cheers,

Stephen.
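(The blob is at least mechanical to pull apart.  The two histogram fields
look like concatenated fixed-width hex counters; treating them as 4-hex-
digit (16-bit) values, one per bit position, is my guess from the string
lengths, not something I've confirmed against the ereport format:)

```python
# Sketch: split an ereport histogram string into integer counters.
# Assumption (unconfirmed): each counter is a 4-hex-digit (16-bit)
# value, one per bit position of a 64-bit word.
def decode_histogram(hexstr, width=4):
    return [int(hexstr[i:i + width], 16)
            for i in range(0, len(hexstr), width)]

# First few counters of bad_set_histogram from the ereport above.
print(decode_histogram("001b001a001e002b"))   # -> [27, 26, 30, 43]

# The scalar fields decode the same way:
print(int("00000445", 16), int("00002924", 16))   # -> 1093 10532
```

If the guess is right, bad_range_sets/bad_range_clears are totals of bits
wrongly set/cleared in the bad range, and the histograms show how those
flips distribute across bit positions, which might hint at a bus or
memory fault rather than on-disk rot.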