From owner-freebsd-stable@FreeBSD.ORG Wed Feb 13 10:12:18 2008
Message-ID: <47B2BC40.90404@atlantis.maniacs.se>
Date: Wed, 13 Feb 2008 10:45:36 +0100
From: junics-fbsdstable@atlantis.maniacs.se
To: Joe Peterson
Cc: freebsd-stable@freebsd.org
Subject: Re: Analysis of disk file block with ZFS checksum error

Joe Peterson wrote:
*cut*
> I suppose the best ZFS could then do is retry the write (if its
> failure was even detected - still not sure if ZFS does a re-check of the
> disk data checksum after the disk write), not knowing until the later
> scrub that the block had corrupted a file.
> *cut*

Disclaimer: I have only experimented with ZFS in a VM and read much of
the documentation, but never used it "properly". Please correct me if I
am wrong.

1) If it were able to verify written data directly after a write, that
would probably be an optional feature. I don't recall such an option
when I experimented, nor can I find it in the online man pages...
(DOS actually had something similar: "set verify=on".)

2) It would cause a lot of head seeking and kill performance, unless
the verification reads were queued into an elevator-seek batch job run
while the disks are idle. (Wikipedia: Elevator_algorithm)

3) It would need to bypass all disk read caching to really verify that
the data reached the surface correctly. That is probably a complex
problem, considering all the different types of hardware out there and
the goal of keeping ZFS portable.

4) ZFS is designed to be run in a redundant configuration, so once it
reads the bad block on request or during a scrub, it can overwrite the
bad block with the redundant data. (See the details on self-healing in
the ZFS docs.)

4.1) If your ZFS is up to date, you can probably set the copies=2
property on the dataset and do a "poor man's RAID1" - if it is a
hardware problem, that is... _All_ metadata is already written at least
twice, even in a single-disk configuration. I think ZFS tries to keep
the copies at least 1/8 of the total space apart.

4.2) Overwriting bad blocks plays nicely with the disk's internal
sector relocation. Pending sectors in "smartctl -a" become a thing of
the past :) I actually have two bad disks that I will probably try this
on once 7.0 is released. They are heat-damaged, so bad sectors are
popping up semi-frequently.
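The verify-after-write idea in point 1 could, in principle, look like
the sketch below at the application level. This is purely illustrative
(ZFS offers no such option, as noted), and without O_DIRECT the
read-back is likely served from the OS page cache - which is exactly
the caching problem point 3 describes:

```python
import hashlib
import os

def write_and_verify(path, data):
    """Write data, flush it toward the device, then read it back and
    compare checksums. Caveat: the read-back will normally be served
    from the page cache, so this does NOT prove the bits reached the
    platter (see point 3)."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    with open(path, "rb") as f:
        readback = f.read()
    return hashlib.sha256(data).digest() == hashlib.sha256(readback).digest()

print(write_and_verify("/tmp/verify-demo.bin", b"hello zfs"))  # True on a healthy disk
```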
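The elevator batching mentioned in point 2 can be sketched as a simple
SCAN-style ordering of pending verification reads by block address.
This is a toy illustration of the scheduling idea only, not how ZFS
(or any real I/O scheduler) is implemented:

```python
def elevator_order(pending, head):
    """Order pending block addresses elevator-style (SCAN): service
    everything at or above the current head position in ascending
    order, then sweep back through the rest in descending order."""
    up = sorted(b for b in pending if b >= head)
    down = sorted((b for b in pending if b < head), reverse=True)
    return up + down

# Head at block 50, scattered verify requests:
print(elevator_order([95, 10, 180, 40, 120], head=50))
# -> [95, 120, 180, 40, 10]
```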
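The self-healing behaviour in point 4 - read a block, detect a
checksum mismatch, and repair the bad copy from redundant data - can
be mimicked at a very high level. The mirror dicts and function below
are hypothetical illustrations, not ZFS internals:

```python
import hashlib

def checksum(data):
    return hashlib.sha256(data).digest()

def self_heal_read(block, mirrors, expected):
    """Read `block` from a list of mirror dicts. If a copy fails its
    checksum, overwrite it from a verified copy - a toy version of
    ZFS-style self-healing."""
    good = None
    for m in mirrors:
        if checksum(m[block]) == expected:
            good = m[block]
            break
    if good is None:
        raise IOError("no valid copy of block %r" % block)
    for m in mirrors:
        if checksum(m[block]) != expected:
            m[block] = good  # heal the corrupt copy
    return good

# Demo: mirror 0 holds a corrupted copy of the block.
m0 = {"blk": b"garbled!"}
m1 = {"blk": b"payload!"}
print(self_heal_read("blk", [m0, m1], checksum(b"payload!")))  # b'payload!'
print(m0["blk"] == b"payload!")  # True: the bad copy was repaired
```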