From: Stephen McKay
To: freebsd-fs@freebsd.org
Cc: Stephen McKay
Date: Mon, 20 Feb 2012 13:27:24 +1000
Subject: Re: Constant minor ZFS corruption, probably solved
Message-Id: <201202200327.q1K3ROrt009042@dungeon.home>
In-Reply-To: <201107052241.p65MfqVA002215@dungeon.home> from Stephen McKay at "Wed, 06 Jul 2011 08:41:52 +1000"
References: <201103081425.p28EPQtM002115@dungeon.home> <201107052241.p65MfqVA002215@dungeon.home>

On Wednesday, 6th July 2011, Stephen McKay wrote:

>Perhaps you remember me struggling with a small but continuous amount
>of corruption on ZFS volumes with a new server we had built at work.
>... I've now done enough tests so that I'm 90%
>certain what the problem is: Seagate's caching firmware.
>...
>I'm certain that disabling write caching
>has given us a stable machine. And I'm 90% certain that it's because
>of bugs in Seagate's cache firmware. I hope someone else can replicate
>this and settle the issue.

I'm following up on an old post of mine to confirm that my write cache
disabling workaround is well and truly successful. Eight months later
we've seen no further corruption when using Seagate ST2000DL003 disks.
The machine (now running 9.0-RELEASE) sees constant moderate to low
activity as a file server (about 6TB in use).

I did receive a message from one other person suffering from the same
problem. It was solved by disabling write caching, so that's two data
points. And two data points is a trend, right? :-)

His system was running 8.2-stable on an AMD Phenom CPU in a MSI 870-G45
motherboard (AMD SB710 southbridge) so there's very little overlap with
our system: just zfs and Seagate green disks. His disks were ST1500DL003
(1.5TB) with firmware CC32 so that more or less means the common points
are simply zfs and Seagate CC32 firmware. You already know which one I
think is to blame.

But then again no avalanche of complaints has been seen either, so it's
still somewhat mysterious. Is there some other problem that is just
being masked by disabling the cache? Unless there's a sudden surge in
reports, we'll never know for certain.

So, if you've seen this problem and cured it by disabling the write
cache, I'd like to know about it.

How's your data? Run a scrub lately? Perhaps now is a good time. ;-)

Cheers,

Stephen.