From: Ian Dowse
To: Terry Lambert
cc: freebsd-fs@FreeBSD.ORG
cc: David Schultz
cc: freebsd-stable@FreeBSD.ORG
cc: Kirk McKusick
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
Date: Sun, 20 Apr 2003 07:30:44 +0100
Message-ID: <200304200730.aa34354@salmon.maths.tcd.ie>
In-Reply-To: Your message of "Fri, 18 Apr 2003 11:12:01 PDT." <3EA03FF1.280B6810@mindspring.com>

In message <3EA03FF1.280B6810@mindspring.com>, Terry Lambert writes:
>David Schultz wrote:
>> As for the ATA delayed write feature, I don't believe it will
>> guarantee consistency.
>
>It doesn't.  I checked, after voicing my suspicions of it.

Yes, write ordering and hence FS consistency is not guaranteed; my
original point was just that the situation regarding FS consistency
with ATA delayed writes is not significantly worse than that with
the default behaviour of having ATA write caching enabled.

In fact, if the OS is modified to perform writes in batches then the
two cases are almost identical: in one case the disk collects a batch
of writes, possibly reorders them, and writes them out in one burst;
in the other case the OS sends a burst of writes, the disk possibly
reorders them and writes them out.

For reference I've included below what IBM say about the delayed
write feature in their disk documentation.

BTW, to answer a point Marko mentioned, I don't consider the delayed
write behaviour to be nearly as bad as a null fsync(), because you
are very unlikely to completely lose a file that has been modified,
saved and then fsync()'d. If the write/rename/fsync all happen while
the disk is spun down then the old version of the file is still
intact on the media if the power fails. With a null fsync(), there
can be a considerable window where the disk contains just a
zero-length file.

I completely accept that there is more flexibility at the OS side to
control which writes get delayed and by how much, and that an OS-side
implementation would be extremely useful. However I think it would
require further work to develop a good implementation. For example,
the current proposed patch effectively assumes that there is only one
disk in the system since `stratcalls' is a global variable (e.g., I
believe that reading from an ATA flash device would trigger a flush
to any real ATA disks in the system). It would also be useful if the
solution was not specific to ATA devices and had per-device control
over the behaviour.

I guess my point of view is more that doing this right at the OS side
is hard, and ATA delayed write is an unobtrusive neat feature that
does mostly the right thing at the cost of only a marginal increase
in the risk of data loss for typical uses.

Ian

11.13 Delayed Write function (vendor specific)

Delayed Write function is a power saving enhancement whereby the
device delays the actual data writing into the media.

When the device is in the power saving mode and the Write command
(Write Sectors, Write Multiple, or Write DMA) comes from the host,
the transferred data is not written into the media immediately, only
stored into the cache buffer.
When the cache buffer becomes full or reaches the predefined size, or
if any command except the Write command is issued, the operation to
write the data from the cache buffer into the media is begun.

Power consumption can be reduced by Delayed Write. When Write
commands come with a long interval, the device must exit from the
power saving mode and enter into the power saving mode again without
Delayed Write function. If Delayed Write is enabled, such power
saving mode transition times can be reduced. As a result, the
additional energy for power saving mode transition can be saved, then
the average power consumption of the device can be reduced.

However, the time elapsed from the completion of the Write command to
the media write completion will be extended with Delayed Write
function. If the power for the device is turned off during this time,
the data which has not been written to the media is lost. Therefore,
a command listed in the Write Cache Function section shall be issued
before the power off to confirm whole cached data has been written
into the media. For safety, Delayed Write function is disabled at
Power On Default.
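
As a rough illustration of the rule in the excerpt above, here is a
toy C model of the Delayed Write behaviour. It is only a sketch: every
name in it (dw_disk, dw_command, dw_sleep and so on) is made up for
this example and comes from neither the IBM documentation nor the
proposed patch, and the "predefined size" is assumed to be a simple
sector count. It ignores the ordinary write cache and any reordering,
and only tracks how much data is at risk and how often the device has
to leave power saving mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; all names here are illustrative only. */
enum cmd_kind { CMD_WRITE, CMD_OTHER };  /* CMD_OTHER: any non-write command */

struct dw_disk {
    bool     delayed_write;  /* feature enabled (disabled at power on) */
    bool     power_saving;   /* device is currently in power saving mode */
    size_t   cached;         /* sectors held in the cache buffer only */
    size_t   threshold;      /* assumed "predefined size", in sectors */
    size_t   on_media;       /* sectors that have reached the media */
    unsigned wakeups;        /* number of exits from power saving mode */
};

static void dw_init(struct dw_disk *d, size_t threshold)
{
    assert(threshold > 0);
    d->delayed_write = false;   /* matches the Power On Default */
    d->power_saving = false;
    d->cached = 0;
    d->threshold = threshold;
    d->on_media = 0;
    d->wakeups = 0;
}

/* The device goes idle and enters power saving mode. */
static void dw_sleep(struct dw_disk *d)
{
    d->power_saving = true;
}

/* Leave power saving mode if necessary and write the cache to media. */
static void dw_flush(struct dw_disk *d)
{
    if (d->power_saving) {
        d->power_saving = false;
        d->wakeups++;
    }
    d->on_media += d->cached;
    d->cached = 0;
}

/* Handle one host command; `sectors' is ignored for non-write commands. */
static void dw_command(struct dw_disk *d, enum cmd_kind kind, size_t sectors)
{
    if (kind == CMD_WRITE) {
        d->cached += sectors;
        /* Delay only while enabled, power saving, and below the threshold. */
        if (d->delayed_write && d->power_saving && d->cached < d->threshold)
            return;
    }
    /* Buffer full, feature off, device awake, or a non-write command. */
    dw_flush(d);
}

/* Sectors that would be lost if the power failed right now. */
static size_t dw_at_risk(const struct dw_disk *d)
{
    return d->cached;
}
```

In this model, with the feature enabled and the disk asleep, a run of
small writes leaves dw_at_risk() non-zero and wakeups unchanged until
either the threshold is reached or some non-write command arrives.
With the feature disabled, every write to a sleeping disk costs one
wakeup, which is the power saving trade-off the excerpt describes.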