Date:      Fri, 22 Jun 2012 10:40:06 +0300
From:      Daniel Kalchev <daniel@digsys.bg>
To:        freebsd-fs@freebsd.org
Subject:   Re: ZFS Checksum errors
Message-ID:  <4FE42156.3090006@digsys.bg>
In-Reply-To: <1953965235.30115.1340315339964.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net>
References:  <1953965235.30115.1340315339964.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net>



On 22.06.12 00:48, rondzierwa@comcast.net wrote:
> It's these kinds of antics that make me resistant to the thought of allowing ZFS to manage the RAID. It seems to be having problems just managing a big file system. I don't want it to correct anything or restore anything; just let me delete the files that hurt, fix up the free space list so it doesn't point outside the bounds of the disk, and get on with life.

Been there, done that... I too believed the hype that "hardware RAID" is 
the better, more "reliable" solution.

But when a third 3ware RAID array failed on me (I have lots) and I had to 
spend a few days and nights reassembling the pieces, I finally decided to 
migrate everything to ZFS. In some cases I still use the expensive 
3ware RAID controllers, as SLOW multi-port SATA controllers... Simply by 
using a normal HBA instead of a "hardware RAID" controller, I discovered 
that my disks aren't actually that slow.

I have also had a few cases where a disk was considered just perfect by 
both the 3ware controller (under any kind of test or verification) and 
S.M.A.R.T., yet produced checksum errors when used with ZFS. I keep one 
or two such disks lying around to show to unbelievers.

ZFS is way, way, WAY more reliable for your data than any RAID 
controller could ever be. The reason is that ZFS checksums each and 
every block (metadata and data) in memory, before sending it down the 
pipe to the disks, and verifies those checksums when the data comes back 
into memory. The checksum is stored in the parent block pointer rather 
than next to the block itself, so a damaged block cannot vouch for 
itself. No RAID controller can do this end to end, because the 
controller never sees the data while it is still in host memory. If your 
memory is reliable, then you can trust ZFS when it tells you there are 
checksum errors: they happened somewhere between memory and the disks, 
most probably in corrupted RAID controller or on-disk caches, or on some 
flaky bus. If your memory is unreliable, bad luck: no file system can help.
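A minimal Python sketch of the write-then-verify idea described above, assuming SHA-256 as the checksum (ZFS supports sha256 among its checksum algorithms) and a plain dict standing in for everything below the file system. `write_block`, `read_block`, `ChecksumError` and `disk` are hypothetical names for illustration, not ZFS interfaces; the point is only that the checksum is taken in memory and kept apart from the data, so corruption on the way down or back up is caught on read.

```python
import hashlib


class ChecksumError(Exception):
    """Raised when data read back does not match the checksum taken at write time."""


# Hypothetical stand-in for everything below the file system (controller,
# caches, bus, platters): block address -> bytes.
disk: dict[int, bytes] = {}


def write_block(addr: int, data: bytes) -> bytes:
    """Checksum the block while it is still in memory, then send it down.

    The checksum is returned to the caller instead of being stored next to
    the data (ZFS keeps it in the parent block pointer), so a corrupted block
    cannot arrive together with a matching corrupted checksum.
    """
    checksum = hashlib.sha256(data).digest()
    disk[addr] = data
    return checksum


def read_block(addr: int, expected: bytes) -> bytes:
    """Bring the block back into memory and verify it before trusting it."""
    data = disk[addr]
    if hashlib.sha256(data).digest() != expected:
        raise ChecksumError(f"checksum error on block {addr}")
    return data


if __name__ == "__main__":
    ck = write_block(7, b"important payload")
    assert read_block(7, ck) == b"important payload"
    disk[7] = b"importent payload"  # silent corruption below the file system
    try:
        read_block(7, ck)
    except ChecksumError as err:
        print(err)  # checksum error on block 7
```

Note that this sketch only detects the problem; with a single copy of the data there is nothing to repair it from.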

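With ZFS on top of RAID, you can only know there are problems 
"somewhere": the controller presents the whole array as a single disk, so 
ZFS can detect a bad block but usually has no redundant copy to repair it 
from. With ZFS directly managing your disks, you know exactly which disk, 
or the bus to it, is failing, and ZFS will automatically correct things. 
If you have enough redundancy, no data will be damaged.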
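To illustrate the difference, here is a toy mirror in Python. It is a simplified sketch, not the actual ZFS read path: `Device`, `Mirror`, `ChecksumError` and the device names are hypothetical, and SHA-256 stands in for whatever checksum the pool uses. A read tries each copy in turn, charges a checksum error to the device that returned bad data (similar in spirit to the CKSUM column of `zpool status`), and rewrites the bad copy from a good one. With only one device, as when a RAID controller exports the whole array as a single disk, the error is still detected but can be neither attributed to a physical disk nor repaired.

```python
import hashlib
from dataclasses import dataclass, field


class ChecksumError(Exception):
    """No copy of the block matched its checksum."""


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass
class Device:
    """Hypothetical leaf device: stored blocks plus a checksum-error counter."""
    name: str
    blocks: dict[int, bytes] = field(default_factory=dict)
    cksum_errors: int = 0


class Mirror:
    """Toy n-way mirror that verifies every read and heals bad copies."""

    def __init__(self, *devices: Device) -> None:
        self.devices = list(devices)

    def write(self, addr: int, data: bytes) -> bytes:
        for dev in self.devices:
            dev.blocks[addr] = data
        return checksum(data)  # kept by the caller, like a parent block pointer

    def read(self, addr: int, expected: bytes) -> bytes:
        failed: list[Device] = []
        for dev in self.devices:
            data = dev.blocks.get(addr)
            if data is not None and checksum(data) == expected:
                for bad in failed:  # self-heal: overwrite copies that failed
                    bad.blocks[addr] = data
                return data
            dev.cksum_errors += 1  # we know exactly which device was wrong
            failed.append(dev)
        raise ChecksumError(f"block {addr}: no valid copy on any device")


if __name__ == "__main__":
    ada0, ada1 = Device("ada0"), Device("ada1")
    pool = Mirror(ada0, ada1)
    ck = pool.write(0, b"hello")
    ada0.blocks[0] = b"jello"  # silent corruption on one disk only
    assert pool.read(0, ck) == b"hello"
    assert ada0.blocks[0] == b"hello"  # the bad copy was rewritten
    print(ada0.name, ada0.cksum_errors, ada1.name, ada1.cksum_errors)  # ada0 1 ada1 0
```

Real ZFS also handles RAID-Z, ditto blocks and scrubs; the mirror is just the smallest case that shows detection, attribution and repair together.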

ZFS doesn't really have a FAT-style table; free space is tracked per 
metaslab in space maps, which work quite differently. But you are 
correct: there should be tools to fix this kind of corruption. There is 
instrumentation for this, via zdb, but not enough good documentation and 
no one-click tool. In any case, you should be more concerned with how you 
got to that corruption in the first place.
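ZFS space maps are append-only logs of allocation and free records that are replayed to rebuild the free-space picture, rather than a table indexed by cluster. The Python sketch below is a heavily simplified, hypothetical model of that idea: the record format, the allocation units and the `replay` function are illustrative assumptions, not the ZFS on-disk format or what zdb actually does. It shows the kind of consistency check a repair tool would need: every record must fall inside the metaslab's bounds, nothing may be allocated twice, and nothing may be freed that was not allocated.

```python
from typing import Iterable, Literal

# Hypothetical record format: (operation, offset, length) in allocation units.
Record = tuple[Literal["ALLOC", "FREE"], int, int]


class SpaceMapError(ValueError):
    """A record is inconsistent with the metaslab or with earlier records."""


def replay(records: Iterable[Record], ms_start: int, ms_size: int) -> set[int]:
    """Replay an append-only alloc/free log; return the set of allocated units."""
    ms_end = ms_start + ms_size
    allocated: set[int] = set()
    for i, (op, offset, length) in enumerate(records):
        if length <= 0:
            raise SpaceMapError(f"record {i}: non-positive length {length}")
        end = offset + length
        if offset < ms_start or end > ms_end:
            raise SpaceMapError(
                f"record {i}: [{offset}, {end}) outside metaslab [{ms_start}, {ms_end})"
            )
        units = set(range(offset, end))
        if op == "ALLOC":
            if units & allocated:
                raise SpaceMapError(f"record {i}: double allocation")
            allocated |= units
        elif op == "FREE":
            if not units <= allocated:
                raise SpaceMapError(f"record {i}: freeing unallocated space")
            allocated -= units
        else:
            raise SpaceMapError(f"record {i}: unknown operation {op!r}")
    return allocated


if __name__ == "__main__":
    log = [("ALLOC", 0, 8), ("ALLOC", 8, 4), ("FREE", 0, 8)]
    used = replay(log, ms_start=0, ms_size=64)
    print(64 - len(used))  # 60 units free
    try:
        replay([("FREE", 60, 10)], ms_start=0, ms_size=64)
    except SpaceMapError as err:
        print(err)  # record 0: [60, 70) outside metaslab [0, 64)
```

Detecting such a record is the easy part; deciding how to repair the pool afterwards is exactly where the missing tooling and documentation hurt.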

ZFS has no problem managing big file systems. In fact, if anything can 
manage a BIG file system, it is ZFS. Your 12TB is actually a moderately 
small file system for ZFS; it is used in far larger installations.

Just let ZFS manage the disks directly.

Daniel


