From: Benjamin Close
Date: Wed, 03 Oct 2007 08:42:23 +0930
To: Brooks Talley
Cc: freebsd-current
Subject: Re: ZFS corrupting data, even just sitting idle
Message-ID: <4702D057.3090106@clearchain.com>
In-Reply-To: <7344605.82541191344652015.JavaMail.root@zmail.illuminati.org>
List-Id: Discussions about the use of FreeBSD-current

Brooks Talley wrote:
> Hi, everyone. I'm running 7.0-current amd64, built from CVS on September 12. I've got a 4.5TB ZFS array across 8 750GB drives in a RAIDZ1 + hotspare configuration.
>
> It's corrupting data even just sitting at idle with no access at all. I had loaded it up with about 4TB of data several weeks ago, then noticed that a zpool status showed checksum errors about a week ago. I ran a scrub and it turned up 122 errors affecting about 20 files. The errors were spread across the physical disks pretty evenly, so it didn't seem like one bad drive.
>
> I left for vacation and unplugged the network from the machine to ensure that there would be no access to the disk. There are no cron jobs or anything else running locally that so much as touch the zpool.
>
> Upon returning, I ran a zpool scrub and it found an additional 116 checksum errors in another 17 files, also evenly spread across the physical drives.
>
> The system is running a Supermicro motherboard, Supermicro AOC-SAT-MV8 SATA card, and WD 750GB drives. 2GB memory, no real apps running, just storage.
>
> Anyone seen anything like this? It's a bit of a concern.
>

Just adding a 'me too' to the topic. But in my case I have confirmed it's a Sun 3511 RAID array corrupting the data. With the cache set to write-through, everything is perfect. With it set to write-back, random checksum errors start appearing. The array is configured as a 4.8TB RAID 5... occasionally 'refreshing' the parity on the array fixes the checksum errors.

Before ZFS, there was no way of knowing where the corruption occurred. Now we can find out. Thanks pjd & others.

Cheers,
Benjamin