From owner-freebsd-stable@FreeBSD.ORG Wed Feb 13 10:12:18 2008
Message-ID: <47B2BC40.90404@atlantis.maniacs.se>
Date: Wed, 13 Feb 2008 10:45:36 +0100
From: junics-fbsdstable@atlantis.maniacs.se
To: Joe Peterson
Cc: freebsd-stable@freebsd.org
Subject: Re: Analysis of disk file block with ZFS checksum error

Joe Peterson wrote:
*cut*
> I suppose the best ZFS could then do is retry the write (if its
> failure was even detected - still not sure if ZFS does a re-check of the
> disk data checksum after the disk write), not knowing until the later
> scrub that the block had corrupted a file.
> *cut*

Disclaimer: I have only experimented with ZFS in a VM and read much of
the documentation, but never used it "properly". Please correct me if I
am wrong.

1) If it were able to verify written data directly after a write, that
would probably be an optional feature. I don't recall such an option
when I experimented, nor can I find it in the online man pages...
(DOS actually had something similar: "set verify=on".)

2) It would cause a lot of head seeking and kill performance, unless
the verification reads were queued into an elevator-seek batch job run
while the disks are idle. (Wikipedia: Elevator_algorithm)

3) It would need to bypass all disk read caching to really verify that
the data reached the surface correctly. That is probably a complex
problem, considering all the different types of hardware out there and
the goal of keeping ZFS portable.

4) ZFS is designed to be run in a redundant configuration, so once it
reads the bad block on request or during a scrub, it can overwrite the
bad block with the redundant data. (See the details on self-healing in
the ZFS docs.)

4.1) If your ZFS is up to date, you can probably set the copies=2
property on the dataset and do a "poor man's RAID1" - if it is a
hardware problem, that is... _All_ metadata is already written at least
twice, even in a single-disk configuration. I think ZFS tries to keep
the copies at least 1/8 of the total space apart.

4.2) Overwriting bad blocks plays nicely with the disk's internal
sector relocation. Pending sectors in "smartctl -a" become a thing of
the past :) I actually have two bad disks that I will probably try this
on once 7.0 is released. They are heat-damaged, so bad sectors are
popping up semi-frequently.
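The verify-after-write idea in point 1 could, in principle, look like
the sketch below at the application level. This is purely illustrative
(ZFS offers no such option, as noted), and without O_DIRECT the
read-back is likely served from the OS page cache - which is exactly
the caching problem point 3 describes:

```python
import hashlib
import os

def write_and_verify(path, data):
    """Write data, flush it toward the device, then read it back and
    compare checksums. Caveat: the read-back will normally be served
    from the page cache, so this does NOT prove the bits reached the
    platter (see point 3)."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    with open(path, "rb") as f:
        readback = f.read()
    return hashlib.sha256(data).digest() == hashlib.sha256(readback).digest()

print(write_and_verify("/tmp/verify-demo.bin", b"hello zfs"))  # True on a healthy disk
```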
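The elevator batching mentioned in point 2 can be sketched as a simple
SCAN-style ordering of pending verification reads by block address.
This is a toy illustration of the scheduling idea only, not how ZFS
(or any real I/O scheduler) is implemented:

```python
def elevator_order(pending, head):
    """Order pending block addresses elevator-style (SCAN): service
    everything at or above the current head position in ascending
    order, then sweep back through the rest in descending order."""
    up = sorted(b for b in pending if b >= head)
    down = sorted((b for b in pending if b < head), reverse=True)
    return up + down

# Head at block 50, scattered verify requests:
print(elevator_order([95, 10, 180, 40, 120], head=50))
# -> [95, 120, 180, 40, 10]
```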
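The self-healing behaviour in point 4 - read a block, detect a
checksum mismatch, and repair the bad copy from redundant data - can
be mimicked at a very high level. The mirror dicts and function below
are hypothetical illustrations, not ZFS internals:

```python
import hashlib

def checksum(data):
    return hashlib.sha256(data).digest()

def self_heal_read(block, mirrors, expected):
    """Read `block` from a list of mirror dicts. If a copy fails its
    checksum, overwrite it from a verified copy - a toy version of
    ZFS-style self-healing."""
    good = None
    for m in mirrors:
        if checksum(m[block]) == expected:
            good = m[block]
            break
    if good is None:
        raise IOError("no valid copy of block %r" % block)
    for m in mirrors:
        if checksum(m[block]) != expected:
            m[block] = good  # heal the corrupt copy
    return good

# Demo: mirror 0 holds a corrupted copy of the block.
m0 = {"blk": b"garbled!"}
m1 = {"blk": b"payload!"}
print(self_heal_read("blk", [m0, m1], checksum(b"payload!")))  # b'payload!'
print(m0["blk"] == b"payload!")  # True: the bad copy was repaired
```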