From: Eric Anderson
Date: Wed, 02 Jan 2008 06:32:00 -0600
To: ticso@cicely.de
Cc: "freebsd-fs@freebsd.org"
Subject: Re: ZFS i/o errors - which disk is the problem?
Message-ID: <477B8440.1020501@freebsd.org>
In-Reply-To: <20080102070146.GH49874@cicely12.cicely.de>
References: <477B16BB.8070104@freebsd.org> <20080102070146.GH49874@cicely12.cicely.de>

Bernd Walter wrote:
> On Tue, Jan 01, 2008 at 10:44:43PM -0600, Eric Anderson wrote:
>> I created a zpool with two new identical (500GB) SATA disks. I rsync'ed
>> a bunch of data over to the new ZFS file systems, and started seeing i/o
>> errors.
>>
>> Here's how I created the file systems:
>>
>> zpool create tank mirror ad6 ad8
>> zfs create tank/media
>> zfs create tank/documents
>> zfs set sharenfs=on tank/media
>> zfs set sharenfs=on tank/documents
>> zfs set atime=off tank
>> zfs set mountpoint=/media tank/media
>> zfs set mountpoint=/documents tank/documents
>>
>>
>> Here's what zpool status says:
>>
>> # zpool status
>>   pool: tank
>>  state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>>         corruption. Applications may be affected.
>> action: Restore the file in question if possible. Otherwise restore the
>>         entire pool from backup.
>>    see: http://www.sun.com/msg/ZFS-8000-8A
>>  scrub: scrub completed with 731 errors on Tue Jan 1 15:17:08 2008
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         tank        ONLINE       0     0 1.47K
>>           mirror    ONLINE       0     0 1.47K
>>             ad6     ONLINE       0     0 5.12K
>>             ad8     ONLINE       0     0 4.66K
>>
>> How can I tell which drive gave the problems, or where the problem came
>> from? I see several errors in /var/log/messages, like:
>>
>> ZFS: zpool I/O failure, zpool=tank error=86
>
> zpool status -v should tell you more details.
> But it is not required, since the message below is enough.

Yes, I did that. It listed more than 700 files, but that was about the
only difference in the output, so I omitted it here.

>> and many many of these:
>>
>> ZFS: checksum mismatch, zpool=tank path=/dev/ad6 offset=31970426880
>> size=131072
>>
>> for both the ad6 and ad8 devices.
>
> So you have crc errors on both drives.
>
>> I'm happy to swap the drive out, but I don't know which is the problem.
>> I was also wondering if it was a saturated I/O issue on the system
>> (it's a fairly slow and poky old box).
>
> The errors mean that silently data written to disk were not the same
> when they were read back.
> I doubt that this are the drives, but if they are identic it is possible
> of course, since firmware bugs are not impossible.
> More likely you have a problematic ata controller or maybe defective
> ram.

I can believe a problematic SATA controller (it's an add-on PCI board),
but does anyone know of a way to ask ZFS which devices in a pool it
thinks have issues?

Eric
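
In case it helps anyone comparing the two disks: a rough per-device tally
of the checksum messages can be pulled out of the log. This is only a
minimal Python sketch, assuming the kernel messages keep the form quoted
above (zpool=, path=, offset=). The function name
tally_checksum_mismatches and the sample lines are hypothetical, and this
is not something ZFS itself provides.

```python
import re
from collections import Counter

# Matches the "checksum mismatch" message quoted in this thread.
# Only the zpool=, path= and offset= fields are assumed; anything
# after offset= (such as size=) is ignored.
MISMATCH = re.compile(
    r"ZFS: checksum mismatch, zpool=(?P<pool>\S+) "
    r"path=(?P<path>\S+) offset=(?P<offset>\d+)"
)


def tally_checksum_mismatches(lines):
    """Count checksum-mismatch messages per (pool, device path)."""
    counts = Counter()
    for line in lines:
        match = MISMATCH.search(line)
        if match:
            counts[(match.group("pool"), match.group("path"))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; on a real system one would pass
    # open("/var/log/messages") instead.
    sample = [
        "ZFS: checksum mismatch, zpool=tank path=/dev/ad6 offset=31970426880 size=131072",
        "ZFS: checksum mismatch, zpool=tank path=/dev/ad8 offset=4096 size=131072",
        "ZFS: zpool I/O failure, zpool=tank error=86",
    ]
    for (pool, path), count in sorted(tally_checksum_mismatches(sample).items()):
        print(f"{pool} {path}: {count}")
```

If the counts come out roughly equal for ad6 and ad8, as the CKSUM column
in the zpool status output suggests, that would point away from a single
bad disk and toward something the two disks share, such as the controller
or RAM that Bernd mentions.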