From: Don Lewis
To: bg@sics.se
Cc: freebsd-fs@FreeBSD.org, des@des.no
Date: Sat, 8 Dec 2007 17:16:50 -0800 (PST)
Subject: Re: FSCK doesn't correct file size when INCORRECT BLOCK COUNT is found
Message-Id: <200712090116.lB91Go19070069@gw.catspoiler.org>
In-Reply-To: <20071207143348.17470be3@ibook.sics.se>

On 7 Dec, Bjorn Gronvall wrote:
> On Fri, 07 Dec 2007 13:48:12 +0100
> Dag-Erling Smørgrav wrote:
>
> Hi Dag-Erling,
>
>> Bjorn Gronvall writes:
>> > Filesystems in general and UFS with soft updates in particular rely on
>> > disks providing accurate response to writes. When write caching is
>> > enabled the disk will lie and tell the operating system that the write
>> > has completed successfully, in reality the data is only cached in disk
>> > RAM. When the power disappears the data will be gone forever.
>>
>> No.
>> This used to be the case with some cheaper disks which ignored the
>> ATA "flush cache" command to score higher on benchmarks, but I doubt
>> you'll find any disks on the market that still do that (at least from
>> reputable manufacturers).
>
> Agreed, but the software must also be written to actually make use of
> the more recent "flush cache" feature. I know that the GEOM journal
> can make use of this feature but does UFS with soft updates use it?

UFS with soft updates does not use the "flush cache" feature. It
assumes that once the drive says that the data has been written, the
data is actually on the platter. If the drive does write caching, this
is an invalid assumption, because the drive will indicate that data has
been written as soon as it gets transferred to the drive's cache.

Disabling write caching fixes this problem, but badly hurts the
performance of ATA drives, because it forces each I/O operation to be
done sequentially. This is much less of an issue with SCSI drives,
because they have tagged command queuing (which is supported by
FreeBSD). This allows multiple simultaneous I/O requests to be queued
to the drive, which is free to re-order them more optimally and to
report their status in whatever order the operations are completed.
Modern SATA drives have something similar, Native Command Queuing
(NCQ), but it is not yet supported by FreeBSD.

I'm also under the impression that modern ATA drives boost their
capacity by always rewriting a full track so that they can eliminate
the overhead of sector headers and trailers. This hurts performance
when write caching is disabled, because even a single sector write
requires the full track to be rewritten, which could require multiple
revolutions of the spindle (a full track read if the track has not been
cached, a full track write, and possibly a partial revolution to get to
the correct location to start the write), and multiple writes to the
same track cannot be combined.
Also, unless the drive can complete the entire track rewrite after it
detects power starting to fail, a power failure could corrupt data on
the same track as a sector being rewritten. This data might be totally
unrelated to the sector(s) being modified and would be expected by the
file system to be stable. The checksumming done by ZFS in combination
with RAID would help with this, but a power failure could still wipe
out all the redundant copies of the data.
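
To make the two points above concrete, here is a rough Python sketch.
It is a toy model only, not code from FreeBSD or from any drive
firmware. The names ToyDrive, survives_power_loss and track_rewrite_ms
are hypothetical, and the 7200 RPM figure in the usage note is an
assumed spindle speed, not a number from this thread.

```python
# Toy model (hypothetical names throughout): a drive with a volatile
# write cache acknowledges writes before they reach the platter.

class ToyDrive:
    def __init__(self, write_cache=True):
        self.write_cache = write_cache
        self.cache = {}    # sector -> data held only in drive RAM
        self.platter = {}  # sector -> data that survives power loss

    def write(self, sector, data):
        """Return True once the drive reports the write as complete."""
        if self.write_cache:
            self.cache[sector] = data    # acknowledged while still in RAM
        else:
            self.platter[sector] = data  # acknowledged only once on the platter
        return True

    def flush_cache(self):
        """Model of a "flush cache" command: commit cached sectors."""
        self.platter.update(self.cache)
        self.cache.clear()

    def power_fail(self):
        """Drive RAM is lost; return what is left on the platter."""
        self.cache.clear()
        return dict(self.platter)


def survives_power_loss(write_cache, flush):
    """Does an acknowledged write to sector 100 survive a power failure?"""
    drive = ToyDrive(write_cache=write_cache)
    acked = drive.write(100, b"inode update")
    if flush:
        drive.flush_cache()
    return acked and 100 in drive.power_fail()


def track_rewrite_ms(rpm, revolutions):
    """Time spent rotating, in milliseconds, for a number of revolutions."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution * revolutions
```

With write caching on and no flush, survives_power_loss(True, False)
is False even though the write was acknowledged, which is the
assumption that soft updates gets wrong. With the cache off, or with an
explicit flush, the write survives. For the track rewrite case, an
assumed 7200 RPM drive needs about 8.3 ms per revolution, so
track_rewrite_ms(7200, 3) comes to roughly 25 ms of rotation for a
single-sector write that needs three revolutions.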