Date: Tue, 2 Sep 2008 20:07:14 +0100
From: Thomas Hurst
To: Jeremy Chadwick
Message-ID: <20080902190714.GA34895@voi.aagh.net>
References: <20080810175934.X2427@borg> <20080811065822.GA81972@voi.aagh.net> <20080811130555.GA25024@eos.sc1.parodius.com>
In-Reply-To: <20080811130555.GA25024@eos.sc1.parodius.com>
Cc: freebsd-stable@FreeBSD.org
Subject: Re: ICRC's

* Jeremy Chadwick (koitsu@FreeBSD.org) wrote:

> On Mon, Aug 11, 2008 at 07:58:22AM +0100, Thomas Hurst wrote:
> > * Larry Rosenman (ler@lerctr.org) wrote:
> > > ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=154593293
> > >
> > > NAME    STATE    READ WRITE CKSUM
> > > ad8     ONLINE      0     0    17
> > Having just experienced NTFS corruption in Windows thanks to a
> > slightly kinked SATA cable (hint: *never* chkdsk/fsck/etc until
> > you're sure the cables are fine), I would *love* to know why this
> > causes a checksum error at ZFS level rather than a read error that
> > any filesystem (or indeed RAID layer) will notice.
>
> The ad8 errors you're quoting come from the ATA subsystem in FreeBSD.
> That is lower-level (e.g. closer to the hardware) than ZFS's checksum
> method is.

Yes, but ZFS is clearly still seeing corrupt data from its reads, because
it's the CKSUM counter that's going up rather than READ; a rising READ
counter is what would indicate its reads were actually failing at ATA
level.

> If Larry was using UFS, he'd also see the above errors from the
> kernel. FreeBSD reports the CRC errors reported by the ATA device,
> ZFS reports the said data as corrupted during scrubbing or standard
> usage (hence the CKSUM field in 'zpool status'),

ZFS should only see corruption that's undetected by ATA's CRCs though
(or the disk's own error correction); if it's actually causing a CRC
error at protocol level, ZFS should not see it, because that IO
operation failed.

> and ZFS also *repairs* the corrupted data.
> I can't explain how the repair works,

It repairs by having duplicate copies of data and metadata; in the case
of vital metadata it stores "ditto-blocks" so there are always multiple
copies of it about, similar to UFS's superblock being spread all over
the disk. For most data you generally want some level of ZFS RAID, but
I'm pretty sure you can make it store multiple copies on the same disk
(zfs set copies=2 on a 1-disk ZFS, for example).

In the event of IO errors, I believe some Linux software RAID levels can
perform similar recovery: rewriting the erroring blocks from another
device to force the disk to rewrite the broken block.

> I believe journalling filesystems (e.g. ext3fs and gjournal) have
> this ability, while Standard UFS, UFS2, NTFS, FAT, and many others do
> not.

No, journalling has nothing to do with this kind of self-healing; a
journal allows a filesystem to be made consistent after being
interrupted (e.g. by a crash or power failure) with a very small number
of operations, because it has a log of what has happened or was about to
happen. Journalling filesystems are just as vulnerable to corruption as
non-journalling ones. NTFS is journalling, BTW.

> > What's the point in having the connection protected by a CRC if it's
> > just going to let bogus data through anyway?
>
> A CRC (or checksum) acts as a method of differential detection, e.g.
> detect corruption between X and Y. CRCs are not the same thing as error
> correction or retransmittal; they only result in reporting data
> corruption, and cannot repair it.

Yes, I know what CRCs are; my point is, a CRC error should mean the
corrupt data doesn't make it to the higher layers. ZFS, UFS, gmirror,
whatever, should get an IO error if the data can't be retrieved after
retries fail; they shouldn't get an apparently successful read with
corrupt data in it.

Perhaps this is the case, and (S)ATA's CRCs are just so poor that a
couple of retries is enough to get corrupt data which happens to have a
correct CRC.
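
To make the layering argument concrete, here is a rough toy model in Python. It is only a sketch under assumptions: it is not the FreeBSD ATA driver or ZFS, all the names (link_read, checked_read, LinkCRCError, the three transfer functions) are made up for illustration, zlib.crc32 stands in for whatever CRC the link uses, and SHA-256 stands in for the filesystem checksum. A transfer that fails its CRC never reaches the upper layer and shows up as a read error; the upper layer only counts a checksum error when the link said the transfer was fine but the data is still wrong, and in either case it falls back to another copy.

```python
import hashlib
import zlib


class LinkCRCError(IOError):
    """Raised when every transfer attempt fails the link-level CRC."""


def link_read(transfer, retries=3):
    """Toy transport layer (hypothetical): accept a payload only if its CRC matches.

    `transfer` is a callable returning (payload, crc) as received over the
    cable. A payload that fails the CRC is never handed to the caller.
    """
    for _ in range(retries):
        payload, crc = transfer()
        if zlib.crc32(payload) == crc:
            return payload
    raise LinkCRCError("CRC mismatch on every attempt")


def checked_read(copies, expected_digest):
    """Toy filesystem layer (hypothetical): return the first copy that verifies.

    Counts "read" errors (the link gave up) separately from "cksum" errors
    (the link reported success but the data does not match its checksum).
    """
    stats = {"read": 0, "cksum": 0}
    for transfer in copies:
        try:
            data = link_read(transfer)
        except LinkCRCError:
            stats["read"] += 1
            continue
        if hashlib.sha256(data).digest() == expected_digest:
            return data, stats
        stats["cksum"] += 1
    raise IOError("no good copy left")


good = b"block contents"
digest = hashlib.sha256(good).digest()


def noisy_cable():
    # Corrupted in flight: the payload no longer matches the CRC sent with it.
    return b"block c0ntents", zlib.crc32(good)


def bad_media():
    # Corrupted before the CRC was computed, so the link-level check passes.
    bad = b"block c0ntents"
    return bad, zlib.crc32(bad)


def clean():
    return good, zlib.crc32(good)


data, stats = checked_read([noisy_cable, bad_media, clean], digest)
print(data, stats)
```

In this model the noisy cable can only ever bump the "read" counter; the "cksum" counter moves only for data that was already wrong when the CRC was computed, or that happened to collide with a valid CRC.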

-- 
Thomas 'Freaky' Hurst
http://hur.st/