From: Jeremy Chadwick
To: Eric Anderson
Cc: freebsd-fs@freebsd.org, Joe Peterson, freebsd-stable@freebsd.org
Date: Tue, 4 Mar 2008 05:47:56 -0800
Subject: Re: Analysis of disk file block with ZFS checksum error
Message-ID: <20080304134756.GA90698@eos.sc1.parodius.com>
In-Reply-To: <47CD4DCF.5070505@freebsd.org>

On Tue, Mar 04, 2008 at 07:25:35AM -0600, Eric Anderson wrote:
> I'm starting to think there is a timing issue or some such problem with
> ZFS, since I can use the same drives in a gmirror with UFS, and never have
> any data problems (md5 checksums confirm it over-and-over).  I highly doubt
> that everyone is seeing similar issues and it just is because ZFS is so
> intense.
> I've had plenty of systems under severe disk load that have never
> exhibited corrupt files because of something like this.

One thing that hasn't been mentioned (or maybe it has been but I missed
it): FreeBSD's ZFS port is version 6, while Solaris is up to version 10.
Is it possible that the problems folks are experiencing, including the
infamous deadlock or crash on heavy I/O between UFS/UFS2 and ZFS
filesystems, could've been fixed between versions 6 and 10?

I myself use gstripe(8) and UFS2 (no softupdates) on two identical SATA
disks.  I do nightly backups, so if I lose a disk, I'm OK.  My transfer
rates are quite good (~143MB/sec read, ~130MB/sec write -- really!) on
the stripe, and in the past 2 weeks I have spent a LOT of time copying
over 150GB of data back and forth between the stripe and the backup disk
without any issues.  All disks are on an ICH7 controller.

> I wish we could get our hands on this issue..  Seems like some common
> threads are ATA/SATA disks.  Is your setup running 32bit or 64bit FreeBSD?
> (if you already mentioned it, I'm sorry, I missed it)

So far the reports have shown that it's not specific to either i386 or
amd64, and that it's not specific to any type of hardware (motherboard,
controller, etc.).  Joe's setup is very different from mine, for example.

If the same disks are fine when used with UFS/UFS2, then I'd say it's
less of an ATA subsystem bug, and more of an oddity with ZFS on FreeBSD.
If it's reproducible, that would be helpful to developers.

Regarding ATA/SATA though, there are reports of DMA timeouts and other
oddities happening on ATA/SATA disks on FreeBSD.  When I was using ZFS
not too long ago, I experienced that problem when doing heavy I/O
(copying data from a standard UFS2 disk to a ZFS RAIDZ pool).  It's been
the only time I've seen this problem.

http://lists.freebsd.org/pipermail/freebsd-stable/2008-January/040013.html

The drive showed no signs of errors (SMART stats look fine, no mechanical
noises or other oddities).
I've since replaced it out of pure paranoia with a disk identical to the
ones on the gstripe(8).

Regarding those issues (DMA errors, etc.), Scott Long has offered to
help, but needs systems which can reproduce the problem reliably and
have remote access (serial highly recommended).

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |