Date: Mon, 20 Aug 2007 13:29:46 +0200
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Kenneth Vestergaard Schmidt <kvs@pil.dk>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: 'checksum mismatch' all over the place
Message-ID: <20070820112946.GC16977@garage.freebsd.pl>
In-Reply-To: <m1wsvtkviw.fsf@binarysolutions.dk>
References: <m1wsvtkviw.fsf@binarysolutions.dk>
On Sat, Aug 18, 2007 at 12:05:27PM +0200, Kenneth Vestergaard Schmidt wrote:
> Hello.
>
> We've just put a 12x750 GB raidz2 pool into use, but we're seeing
> constant 'checksum mismatch' errors. The drives are brand new.
>
> 'zpool status' currently lists the following:
>
>     NAME        STATE     READ WRITE CKSUM
>     pil         ONLINE       0     0 189.9
>       raidz2    ONLINE       0     0 189.9
>         da0     ONLINE       0     0 2.99K
>         da1     ONLINE       0     0   606
>         da2     ONLINE       0     0    75
>         da3     ONLINE       0     0 1.94K
>         da4     ONLINE       0     0   786
>         da5     ONLINE       0     0    88
>         da6     ONLINE       0     0    79
>         da7     ONLINE       0     0    99
>         da8     ONLINE       0     0   533
>         da9     ONLINE       0     0 1.38K
>         da10    ONLINE       0     0    15
>         da11    ONLINE       0     0   628
>
> da0-da11 are really logical drives on an EonStor SCSI drive-cage. The
> physical disks are SATA, but since our EonStor can't run in JBOD mode,
> I've had to create a logical drive per physical drive and map each onto
> a separate SCSI LUN.
>
> The drive-cage was previously used to expose a RAID-5 array composed of
> the 12 disks. This worked just fine, connecting to the same machine and
> controller (i386 IBM xSeries X335, mpt(4) controller).

How do you know it was fine? Did you have something that did
checksumming?

You could try geli with the integrity verification feature turned on:
fill the disks with some random data and then read it back. If your
controller corrupts the data, geli should tell you (a sketch of such a
test follows below).

> The EonStor can report SMART statistics on each SATA drive, and
> everything looks peachy there.
>
> What puzzles me is that the drives don't seem to be failing - they just
> develop checksum errors. If they had hard failures, ZFS should mark them
> broken. It's also spread across all disks, and I have a hard time
> believing we just got 12 bad drives which don't register as bad to the
> EonStor.
>
> Has anybody seen something like this? Any pointers on how to debug it?

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
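
A minimal sketch of the test described above, assuming a single LUN
(da0) and geli(8)'s onetime mode (throwaway random keys, so no
passphrase or keyfile management is needed); repeat for each LUN, and
check the exact flags against the geli(8) manpage on your release:

  # Create a temporary provider with HMAC/SHA256 data authentication.
  geli onetime -a HMAC/SHA256 da0

  # Fill the authenticated provider with random data. dd exiting with
  # "no space left on device" once the provider is full is expected.
  dd if=/dev/random of=/dev/da0.eli bs=1m

  # Read everything back. If the controller corrupts data in transit,
  # the reads should fail with EIO and GEOM_ELI authentication errors
  # on the console, instead of silently returning bad data.
  dd if=/dev/da0.eli of=/dev/null bs=1m

  # Tear the provider down; the onetime keys are discarded with it.
  geli detach da0.eli

Because the onetime keys are random and discarded on detach, the disk
contents are effectively destroyed by this test, so it is only suitable
before the pool is put into production or on scratch disks.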