From: Kenneth Vestergaard Schmidt
To: Pawel Jakub Dawidek
Cc: freebsd-fs@freebsd.org
Date: Mon, 20 Aug 2007 14:20:33 +0200
Subject: Re: ZFS: 'checksum mismatch' all over the place

Pawel Jakub Dawidek writes:

>> The drive-cage was previously used to expose a RAID-5 array, composed of
>> the 12 disks. This worked just fine, connecting to the same machine and
>> controller (i386 IBM xSeries X335, mpt(4) controller).
>
> How do you know it was fine? Did you have something that did
> checksumming? You could try geli with integrity verification feature
> turned on, fill the disks with some random data and then read it back,
> if your controller corrupts the data, geli should tell you this.

I may have to do this. The previous drive was almost filled to the brim
with data, which rsync looked at each day, and we didn't have a lot of
re-transfer, but that doesn't necessarily mean anything.

The same controller is used in 50+ other machines, but only connected to
two internal drives. There are no problems in those machines.

Still, the really weird thing is that we're seeing checksum-errors in
the same block across many drives. This does smell like either an issue
with the driver, the controller, or the drivecage, and not ZFS or GEOM.

The machine should have been in production, but the array just failed,
and if I can't salvage it, I'll have to start over. I might just as well
try geli with integrity verification before recreating the ZFS array,
then.

-- 
Kenneth Schmidt
pil.dk
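
For illustration, the "fill with random data and read it back" test could be sketched roughly as below. This is not how geli does its integrity verification; it is only a minimal user-space stand-in, assuming a scratch file or device path that may be overwritten. The names `fill`, `verify`, `expected_block` and the 4096-byte `BLOCK_SIZE` are made up for this sketch. Each block's content is derived from a seed and the block index, so the reader can regenerate what should be there and report which block numbers came back wrong.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # assumed block size for this sketch, not taken from the thread


def expected_block(seed, index, size=BLOCK_SIZE):
    """Return deterministic pseudo-random bytes for block number `index`."""
    # SHAKE-256 can emit a digest of any length, so it doubles as a
    # reproducible byte generator keyed by (seed, block index).
    return hashlib.shake_256(seed + index.to_bytes(8, "big")).digest(size)


def fill(path, num_blocks, seed):
    """Overwrite `path` with num_blocks blocks of reproducible data."""
    with open(path, "wb") as f:
        for i in range(num_blocks):
            f.write(expected_block(seed, i))
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device


def verify(path, num_blocks, seed):
    """Read `path` back and return the indices of blocks that differ."""
    bad = []
    with open(path, "rb") as f:
        for i in range(num_blocks):
            if f.read(BLOCK_SIZE) != expected_block(seed, i):
                bad.append(i)
    return bad
```

Something like `fill("/path/to/scratch", 1024, b"seed")` followed later by `verify("/path/to/scratch", 1024, b"seed")` would list the damaged block numbers, which makes it easy to see whether the same block goes bad on several drives. Note that a read-back done right after the write may be served from the OS cache rather than the disk, so the check is only meaningful once the cache has been dropped (for example after a reboot or re-attaching the device).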