From owner-freebsd-stable@FreeBSD.ORG  Tue Sep  2 19:07:17 2008
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 08CFE10656C4
	for <freebsd-stable@FreeBSD.org>; Tue,  2 Sep 2008 19:07:17 +0000 (UTC)
	(envelope-from tom.hurst@clara.net)
Received: from spork.qfe3.net (spork.qfe3.net [212.13.207.101])
	by mx1.freebsd.org (Postfix) with ESMTP id BD61D8FC26
	for <freebsd-stable@FreeBSD.org>; Tue,  2 Sep 2008 19:07:16 +0000 (UTC)
	(envelope-from tom.hurst@clara.net)
Received: from [81.104.123.28] (helo=voi.aagh.net)
	by spork.qfe3.net with esmtp (Exim 4.66 (FreeBSD))
	(envelope-from <tom.hurst@clara.net>)
	id 1KabDe-0005Wc-SP; Tue, 02 Sep 2008 20:07:14 +0100
Received: from freaky by voi.aagh.net with local (Exim 4.69 (FreeBSD))
	(envelope-from <tom.hurst@clara.net>)
	id 1KabDe-0009tN-Ge; Tue, 02 Sep 2008 20:07:14 +0100
Date: Tue, 2 Sep 2008 20:07:14 +0100
From: Thomas Hurst <tom.hurst@clara.net>
To: Jeremy Chadwick <koitsu@FreeBSD.org>
Message-ID: <20080902190714.GA34895@voi.aagh.net>
Mail-Followup-To: Jeremy Chadwick <koitsu@FreeBSD.org>,
	freebsd-stable@FreeBSD.org
References: <20080810175934.X2427@borg> <20080811065822.GA81972@voi.aagh.net>
	<20080811130555.GA25024@eos.sc1.parodius.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20080811130555.GA25024@eos.sc1.parodius.com>
Organization: Not much.
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: Thomas Hurst <freaky@voi.aagh.net>
Cc: freebsd-stable@FreeBSD.org
Subject: Re: ICRC's
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
X-List-Received-Date: Tue, 02 Sep 2008 19:07:17 -0000

* Jeremy Chadwick (koitsu@FreeBSD.org) wrote:

> On Mon, Aug 11, 2008 at 07:58:22AM +0100, Thomas Hurst wrote:
> > * Larry Rosenman (ler@lerctr.org) wrote:
> > > ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=154593293
> > > 
> > >  	NAME        STATE     READ WRITE CKSUM
> > >  	    ad8     ONLINE       0     0    17

> > Having just experienced NTFS corruption in Windows thanks to a
> > slightly kinked SATA cable (hint: *never* chkdsk/fsck/etc until
> > you're sure the cables are fine), I would *love* to know why this
> > causes a checksum error at ZFS level rather than a read error that
> > any filesystem (or indeed RAID layer) will notice.
> 
> The ad8 errors you're quoting come from the ATA subsystem in FreeBSD.
> That is lower-level (i.e. closer to the hardware) than ZFS's checksum
> method is.

Yes, but ZFS is clearly still seeing corrupt data from its reads, because
the CKSUM counter is going up, not READ, which would indicate its reads
were actually failing at the ATA level.

> If Larry were using UFS, he'd also see the above errors from the
> kernel.  FreeBSD reports the CRC errors reported by the ATA device;
> ZFS reports that data as corrupted during scrubbing or standard
> usage (hence the CKSUM field in 'zpool status'),

ZFS should only see corruption that goes undetected by ATA's CRCs, though
(or by the disk's own error correction); if it's actually causing a CRC
error at the protocol level, ZFS shouldn't see it, because that I/O
operation failed.
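
Roughly the layering I'd expect, as a hypothetical Python sketch and
nothing like the real ATA driver: each transfer is checked against its
CRC, mismatches are retried a few times (MAX_RETRIES is an invented
number), and if every attempt fails the caller gets EIO instead of data.
zlib.crc32 stands in for the link-level CRC, and flaky_link() is a
made-up simulation of a bad cable.

```python
import errno
import zlib

MAX_RETRIES = 3  # hypothetical; not the real driver's retry count


def transfer_once(read_frame) -> bytes:
    """Fetch one frame and check its link-level CRC (CRC-32 here)."""
    payload, crc = read_frame()
    if zlib.crc32(payload) != crc:
        raise ValueError("interface CRC mismatch")
    return payload


def device_read(read_frame) -> bytes:
    """Retry on CRC errors; once retries are exhausted, fail with EIO."""
    for _ in range(1 + MAX_RETRIES):
        try:
            return transfer_once(read_frame)
        except ValueError:
            continue  # the driver would log an ICRC error and retry
    raise OSError(errno.EIO, "read failed after retries")


def flaky_link(data: bytes, bad_attempts: int):
    """Simulated cable that flips a bit in the first `bad_attempts` transfers."""
    attempts = 0

    def read_frame():
        nonlocal attempts
        attempts += 1
        crc = zlib.crc32(data)  # computed by the sender over the good data
        if attempts <= bad_attempts:
            return bytes([data[0] ^ 0x01]) + data[1:], crc
        return data, crc

    return read_frame


assert device_read(flaky_link(b"sector", 2)) == b"sector"  # retries succeed
try:
    device_read(flaky_link(b"sector", 10))  # link never recovers
except OSError as e:
    assert e.errno == errno.EIO  # upper layer sees an I/O error, not bad data
else:
    raise AssertionError("expected EIO")
```

Either the data arrives intact or the read fails outright; in this model
there is no path where corrupt data is handed up as a successful read.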

> and ZFS also *repairs* the corrupted data.  I can't explain how the
> repair works,

It repairs by keeping duplicate copies of data and metadata; for vital
metadata it stores "ditto blocks", so there are always multiple copies
of it about, much like UFS's backup superblocks being spread all over
the disk.  For most data you generally want some level of ZFS RAID, but
I'm pretty sure you can make it store multiple copies on the same disk
(e.g. zfs set copies=2 on a single-disk pool).

In the event of I/O errors, I believe some Linux software RAID levels can
perform similar recovery, rewriting the erroring blocks with data from
another device to force the disk to rewrite the broken block.
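
As a rough illustration of this kind of repair (a toy Python model, not
ZFS's or Linux md's actual code): each block has several copies plus a
checksum stored apart from the data, as ZFS does in the parent block
pointer.  SHA-256 is only a stand-in for the checksum, and the names
checksum() and self_healing_read() are made up for the example.  A read
returns the first copy that verifies and rewrites any copy that doesn't.

```python
import hashlib


class ChecksumError(Exception):
    """Raised when no copy of a block matches its expected checksum."""


def checksum(data: bytes) -> bytes:
    # Hypothetical stand-in; SHA-256 is used here only for illustration.
    return hashlib.sha256(data).digest()


def self_healing_read(copies: list, expected: bytes) -> bytes:
    """Return verified data, repairing copies that fail the checksum.

    `copies` holds bytearrays (e.g. two mirror sides, or copies=2 on one
    disk); `expected` is the checksum kept separately from the data.
    """
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected),
                None)
    if good is None:
        raise ChecksumError("every copy is corrupt")
    for c in copies:
        if checksum(bytes(c)) != expected:
            c[:] = good  # self-heal: overwrite the bad copy in place
    return good


# Simulate a silently corrupted first copy; the read still succeeds
# and the bad copy is rewritten from the good one.
data = b"important block"
copies = [bytearray(data), bytearray(data)]
copies[0][0] ^= 0xFF
assert self_healing_read(copies, checksum(data)) == data
assert copies[0] == bytearray(data)
```

The key point is that the checksum lives somewhere other than the block
it covers, so a bad copy can't vouch for itself.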

> I believe journalling filesystems (e.g. ext3fs and gjournal) have
> this ability, while standard UFS, UFS2, NTFS, FAT, and many others do
> not.

No, journalling has nothing to do with this kind of self-healing; a
journal allows a filesystem to be made consistent after an interruption
(e.g. a crash or power failure) with a very small number of operations,
because it has a log of what has happened or was about to happen.
Journalling filesystems are just as vulnerable to corruption as
non-journalling ones.

NTFS is journalling, BTW.
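
A toy Python model makes the distinction clearer.  It's a hypothetical
write-ahead journal (write(), replay() and the dict standing in for the
disk are invented for illustration; real journals in NTFS, ext3 or
gjournal differ in detail): replay finishes an update torn by a crash,
but does nothing about data silently corrupted after it was written.

```python
def apply(disk: dict, op: tuple) -> None:
    key, value = op
    disk[key] = value


def write(disk: dict, journal: list, op: tuple, crash: bool = False) -> None:
    journal.append(op)   # 1. log the intent first
    if crash:
        return           # simulated power failure before the update lands
    apply(disk, op)      # 2. update the main structures
    journal.remove(op)   # 3. retire the journal entry


def replay(disk: dict, journal: list) -> None:
    # After a crash, re-applying logged operations restores consistency.
    for op in journal:
        apply(disk, op)
    journal.clear()


disk, journal = {}, []
write(disk, journal, ("inode7", "size=4096"), crash=True)
replay(disk, journal)
assert disk["inode7"] == "size=4096"   # interrupted update recovered

disk["inode7"] = "size=4O96"           # bit flip from a bad cable, say
replay(disk, journal)                  # journal is empty: nothing to compare
assert disk["inode7"] == "size=4O96"   # the corruption goes unnoticed
```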

> > What's the point in having the connection protected by a CRC if it's
> > just going to let bogus data through anyway?
> 
> A CRC (or checksum) acts as a method of differential detection, i.e.
> detecting corruption between X and Y.  CRCs are not the same thing as
> error correction or retransmission; they only result in reporting data
> corruption, and cannot repair it.

Yes, I know what CRCs are; my point is that a CRC error should mean the
corrupt data doesn't make it to the higher layers.  ZFS, UFS, gmirror,
whatever should get an I/O error if the data still can't be retrieved
once retries are exhausted; they shouldn't get an apparently successful
read with corrupt data in it.

Perhaps this is the case, and (S)ATA's CRCs are just so weak that a
couple of retries is enough to get corrupt data which happens to have a
correct CRC.
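
To put rough numbers on that: for random garbage, an n-bit CRC matches
by chance about once in 2^n tries, i.e. 2^-32 (roughly 2.3e-10) per
corrupted frame for a 32-bit CRC.  The Python sketch below uses a
deliberately tiny 8-bit CRC (polynomial 0x07, picked only so the effect
shows up in a quick run; real (S)ATA CRCs are wider) and counts how many
randomly garbled 8-byte frames slip through.  crc8() and
undetected_rate() are made-up names for the example.

```python
import random


def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8: polynomial x^8 + x^2 + x + 1, init 0, MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def undetected_rate(trials: int, size: int = 8, seed: int = 1) -> float:
    """Fraction of randomly garbled frames whose CRC-8 still matches."""
    rng = random.Random(seed)
    original = bytes(rng.randrange(256) for _ in range(size))
    good_crc = crc8(original)
    passed = 0
    for _ in range(trials):
        garbled = bytes(rng.randrange(256) for _ in range(size))
        if garbled != original and crc8(garbled) == good_crc:
            passed += 1  # corrupt frame that the CRC check would accept
    return passed / trials


rate = undetected_rate(20_000)
```

The rate should land near 1/256 (about 0.004).  With a wider CRC the
per-frame odds are far smaller, but on a link throwing large numbers of
CRC errors they are not zero.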

-- 
Thomas 'Freaky' Hurst
    http://hur.st/