Date:      Mon, 20 Feb 2012 13:27:24 +1000
From:      Stephen McKay <mckay@freebsd.org>
To:        freebsd-fs@freebsd.org
Cc:        Stephen McKay <mckay@freebsd.org>
Subject:   Re: Constant minor ZFS corruption, probably solved 
Message-ID:  <201202200327.q1K3ROrt009042@dungeon.home>
In-Reply-To: <201107052241.p65MfqVA002215@dungeon.home> from Stephen McKay at "Wed, 06 Jul 2011 08:41:52 +1000"
References:  <201103081425.p28EPQtM002115@dungeon.home> <201107052241.p65MfqVA002215@dungeon.home>

On Wednesday, 6th July 2011, Stephen McKay wrote:

>Perhaps you remember me struggling with a small but continuous amount
>of corruption on ZFS volumes with a new server we had built at work.

>... I've now done enough tests so that I'm 90%
>certain what the problem is: Seagate's caching firmware.

>... I'm certain that disabling write caching
>has given us a stable machine.  And I'm 90% certain that it's because
>of bugs in Seagate's cache firmware.  I hope someone else can replicate
>this and settle the issue.

I'm following up on an old post of mine to confirm that my write cache
disabling workaround is well and truly successful.

Eight months later we've seen no further corruption when using Seagate
ST2000DL003 disks.  The machine (now running 9.0-RELEASE) sees constant
moderate to low activity as a file server (about 6TB in use).

I did receive a message from one other person suffering from the same
problem.  It was solved by disabling write caching, so that's two
data points.  And two data points is a trend, right? :-)

His system was running 8.2-STABLE on an AMD Phenom CPU in an MSI 870-G45
motherboard (AMD SB710 southbridge), so there's very little overlap
with our system: just ZFS and Seagate green disks.  His disks were
ST1500DL003 (1.5TB) with firmware CC32, so the common points are more
or less just ZFS and Seagate's CC32 firmware.  You already know which
one I think is to blame.

But then again no avalanche of complaints has been seen either, so
it's still somewhat mysterious.  Is there some other problem that is
just being masked by disabling the cache?  Unless there's a sudden
surge in reports, we'll never know for certain.

So, if you've seen this problem and cured it by disabling the write
cache, I'd like to know about it.

How's your data?  Run a scrub lately?  Perhaps now is a good time. ;-)

Cheers,

Stephen.


