Message-ID: <507F8EFF.4020609@jrv.org>
Date: Thu, 18 Oct 2012 00:09:19 -0500
From: "James R. Van Artsdalen"
To: Heikki Suonsivu
Cc: FS@freebsd.org
Subject: Re: ZFS raidz2, errors in file?
In-Reply-To: <507EED58.80409@suonsivu.net>

On 10/17/2012 12:39 PM, Heikki Suonsivu wrote:
> SMART data indicates problems on two other disks, but no indication of
> those are seen in logs (the disks work, but SMART information
> indicates problems).

The problems may be in areas ZFS has not tried to read.

> One disk indeed has pending sector, not unusual and should be survivable:
>
> ------------------------------------------------------------------------
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
> 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1

That error means one sector is unreadable and a replacement is pending;
the replacement will happen the next time the sector is overwritten.
The contents of that sector are lost (unless some future read succeeds).

> In addition, there seems to be ICRC DMA errors on da0. Looks nasty,
> but only show up in SMART log, not in /var/log/messages.
>
> ------------------------------------------------------------------------
> 199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       112

I believe that both of these messages refer to errors in transfers
between the disk and host, not to errors within the disk. Test your
cabling and enclosures.

> SMART Error Log Version: 1
> ATA Error Count: 112 (device log contains only the most recent five
> errors)

I don't like these at all. Consider replacing that disk.

> If the da0 ICRC errors would have been seen by ZFS, it should have
> made a) note of that in some log? b) retried write? c) Something
> else? If we assume that the disk firmware is broken and does not
> report these to OS, so da0 might be corrupt. But that should still be
> ok with raidz2.

These errors should trigger retries in layers beneath ZFS.

> We do have 3 random SCSI timeouts, which were seen by FreeBSD, and
> thus should have prompted ZFS do handle the errors, and one read error
> on data, which is not reported as read error in any log, other than
> disk's SMART info says so.

The retries may have happened at a layer below ZFS. ZFS does not call
the disk driver directly. Other layers play a role in error handling.
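
To make the distinction above concrete (197/198 point at sectors on the
disk itself, 199 at the link between disk and host), here is a rough
Python sketch that sorts SMART attribute rows into those two groups. It
is only an illustration: the function names and the grouping are my own
assumptions, the row layout is assumed to match the table quoted above,
and the sample rows are copied from the quoted output of two different
disks and combined purely to exercise the parser.

```python
import re

# Attribute IDs taken from the SMART output quoted in this thread.
MEDIA_IDS = {197, 198}  # pending / offline-uncorrectable sectors
LINK_IDS = {199}        # UDMA CRC errors (disk <-> host transfers)

# One attribute row: ID, name, flag, value, worst, thresh, type,
# updated, when_failed, raw value (leading integer only).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Return {id: (name, raw_value)} for each attribute row in text."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = (m.group(2), int(m.group(3)))
    return attrs


def classify(attrs):
    """Group attributes with a nonzero raw value by likely cause."""
    media = sorted(i for i in attrs if i in MEDIA_IDS and attrs[i][1] > 0)
    link = sorted(i for i in attrs if i in LINK_IDS and attrs[i][1] > 0)
    return {"media": media, "link": link}


# Rows from the quoted SMART output (different disks, combined here).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       112
"""

if __name__ == "__main__":
    print(classify(parse_attributes(SAMPLE)))
```

A nonzero count under "link" would suggest checking cabling and
enclosures first; a nonzero count under "media" concerns the disk's own
sectors.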