Date:      Mon, 20 Aug 2007 13:29:46 +0200
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Kenneth Vestergaard Schmidt <kvs@pil.dk>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS: 'checksum mismatch' all over the place
Message-ID:  <20070820112946.GC16977@garage.freebsd.pl>
In-Reply-To: <m1wsvtkviw.fsf@binarysolutions.dk>
References:  <m1wsvtkviw.fsf@binarysolutions.dk>

On Sat, Aug 18, 2007 at 12:05:27PM +0200, Kenneth Vestergaard Schmidt wrote:
> Hello.
>
> We've just put a 12x750 GB raidz2 pool into use, but we're seeing
> constant 'checksum mismatch' errors. The drives are brand new.
>
> 'zpool status' currently lists the following:
>
>         NAME        STATE     READ WRITE CKSUM
>         pil         ONLINE       0     0 189.9
>           raidz2    ONLINE       0     0 189.9
>             da0     ONLINE       0     0 2.99K
>             da1     ONLINE       0     0   606
>             da2     ONLINE       0     0    75
>             da3     ONLINE       0     0 1.94K
>             da4     ONLINE       0     0   786
>             da5     ONLINE       0     0    88
>             da6     ONLINE       0     0    79
>             da7     ONLINE       0     0    99
>             da8     ONLINE       0     0   533
>             da9     ONLINE       0     0 1.38K
>             da10    ONLINE       0     0    15
>             da11    ONLINE       0     0   628
>
> da0-da11 are really logical drives on an EonStor SCSI drive-cage. The
> physical disks are SATA, but since our EonStor can't run in JBOD-mode,
> I've had to create a logical drive per physical drive, and map each onto
> a separate SCSI LUN.
>
> The drive-cage was previously used to expose a RAID-5 array, composed of
> the 12 disks. This worked just fine, connecting to the same machine and
> controller (i386 IBM xSeries X335, mpt(4) controller).

How do you know it was fine? Did you have anything that verified
checksums? You could try geli with its integrity verification feature
turned on: fill the disks with random data and then read it all back.
If your controller corrupts the data, geli should tell you.
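A minimal sketch of that geli test, assuming one logical drive at /dev/da0 (the device name and key path are illustrative; repeat per LUN):

```shell
# Generate a keyfile so no passphrase is needed (illustrative path):
dd if=/dev/random of=/root/da0.key bs=64 count=1

# Initialize geli with data authentication enabled (-a) and attach it:
geli init -a HMAC/SHA256 -P -K /root/da0.key -s 4096 /dev/da0
geli attach -p -k /root/da0.key /dev/da0

# Fill the authenticated provider with random data...
dd if=/dev/random of=/dev/da0.eli bs=1m

# ...then read everything back. geli verifies the HMAC of every
# sector on read, so any corruption introduced by the controller
# shows up as GEOM_ELI authentication errors in dmesg:
dd if=/dev/da0.eli of=/dev/null bs=1m
dmesg | grep GEOM_ELI
```

When done, `geli detach da0.eli` and `geli clear /dev/da0` return the disk to normal use (this destroys the test data).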

> The EonStor can report SMART-statistics on each SATA-drive, and
> everything looks peachy there.
>
> What puzzles me is, that the drives don't seem to be failing - they just
> develop checksum errors. If they had hard failures, ZFS should mark them
> broken. It's also spread across all disks, and I have a hard time
> believing we just got 12 bad drives, which don't register as bad to the
> EonStor.
>
> Has anybody seen something like this? Any pointers on how to debug it?

--
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
