Date:      Sun, 20 Apr 2003 07:30:44 +0100
From:      Ian Dowse <iedowse@maths.tcd.ie>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        Kirk McKusick <mckusick@beastie.mckusick.com>
Subject:   Re: PATCH: Forcible delaying of UFS (soft)updates 
Message-ID:  <200304200730.aa34354@salmon.maths.tcd.ie>
In-Reply-To: Your message of "Fri, 18 Apr 2003 11:12:01 PDT." <3EA03FF1.280B6810@mindspring.com> 

In message <3EA03FF1.280B6810@mindspring.com>, Terry Lambert writes:
>David Schultz wrote:
>> As for the ATA delayed write feature, I don't believe it will
>> guarantee consistency.
>
>It doesn't.  I checked, after voicing my suspicions of it.

Yes, write ordering and hence FS consistency is not guaranteed; my
original point was just that the situation regarding FS consistency
with ATA delayed writes is not significantly worse than that with
the default behaviour of having ATA write caching enabled. In fact,
if the OS is modified to perform writes in batches then the two
cases are almost identical: in one case the disk collects a batch
of writes, possibly reorders them, and writes them out in one burst;
in the other case the OS sends a burst of writes, the disk possibly
reorders them and writes them out. For reference I've included below
what IBM say about the delayed write feature in their disk
documentation.

BTW, to answer a point Marko mentioned, I don't consider the delayed
write behaviour to be nearly as bad as a null fsync(), because you
are very unlikely to completely lose a file that has been modified,
saved and then fsync()'d. If the write/rename/fsync all happen while
the disk is spun down then the old version of the file is still
intact on the media if the power fails. With a null fsync(), there
can be a considerable window where the disk contains just a zero-length
file.
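
To make the save sequence above concrete, here is a minimal userland
sketch of one common ordering (write to a temporary file, fsync() it,
then rename() it over the original). The function name
save_atomically() and the ".tmp" suffix are made up for illustration
and are not taken from any editor or from the patch under discussion.
The point is only that the old file is never truncated in place, so
what survives a power failure is the old version or the new one,
provided fsync() really does push the data out.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Replace `path` with `len` bytes from `data` without ever truncating
 * the existing file.  Returns 0 on success and -1 on failure.
 * Hypothetical helper: the name and the ".tmp" suffix are assumptions.
 */
static int
save_atomically(const char *path, const char *data, size_t len)
{
	char tmp[1024];
	size_t off = 0;
	int fd, tmplen;

	assert(path != NULL && data != NULL);

	tmplen = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
	if (tmplen < 0 || (size_t)tmplen >= sizeof(tmp))
		return (-1);

	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return (-1);

	/* write() may be partial, so loop until everything is queued. */
	while (off < len) {
		ssize_t w = write(fd, data + off, len - off);
		if (w < 0)
			goto fail;
		off += (size_t)w;
	}

	/* Ask for the new contents to reach stable storage first. */
	if (fsync(fd) != 0)
		goto fail;
	if (close(fd) != 0) {
		unlink(tmp);
		return (-1);
	}

	/* Only now swap the new file into place of the old one. */
	if (rename(tmp, path) != 0) {
		unlink(tmp);
		return (-1);
	}
	return (0);

fail:
	close(fd);
	unlink(tmp);
	return (-1);
}
```

If fsync() were a no-op, the rename could reach the media before the
data blocks of the new file, which is the window for a zero-length
file mentioned above.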

I completely accept that there is more flexibility at the OS side
to control which writes get delayed and by how much, and that an
OS-side implementation would be extremely useful. However I think
it would require further work to develop a good implementation. For
example, the current proposed patch effectively assumes that there
is only one disk in the system since `stratcalls' is a global
variable (e.g., I believe that reading from an ATA flash device
would trigger a flush to any real ATA disks in the system). It would
also be useful if the solution was not specific to ATA devices and
had per-device control over the behaviour.
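
As a rough illustration of the per-device idea, the sketch below keeps
the activity counter in a per-device structure rather than in a
global. It is not the logic of the proposed patch, which I am not
reproducing here; struct delay_state, its fields, and the three helper
functions are hypothetical names, and the "flush when the counter has
moved" rule is only an assumed stand-in for whatever the patch does
with `stratcalls'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; one instance per disk. */
struct delay_state {
	unsigned long stratcalls;	/* strategy calls seen on this device */
	unsigned long last_flush;	/* stratcalls value at the last flush */
	bool delay_enabled;		/* per-device policy switch */
};

/* Record I/O activity against the device it was actually issued to. */
static void
note_strategy_call(struct delay_state *ds)
{
	assert(ds != NULL);
	ds->stratcalls++;
}

/* A device wants a flush only if it has itself seen new activity. */
static bool
flush_wanted(const struct delay_state *ds)
{
	assert(ds != NULL);
	return (ds->delay_enabled && ds->stratcalls != ds->last_flush);
}

/* Called once the delayed writes for this device have been issued. */
static void
mark_flushed(struct delay_state *ds)
{
	assert(ds != NULL);
	ds->last_flush = ds->stratcalls;
}
```

With one structure per device, a read from a flash device bumps only
that device's counter, so flush_wanted() stays false for a spun-down
disk until something touches the disk itself. The delay_enabled flag
is where a per-device knob would hang.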

I guess my point of view is more that doing this right at the OS
side is hard, and ATA delayed write is an unobtrusive neat feature
that does mostly the right thing at the cost of only a marginal
increase in the risk of data loss for typical uses.

Ian

	11.13 Delayed Write function (vendor specific)

	Delayed Write function is a power saving enhancement whereby
	the device delays the actual data writing into the media.
	When the device is in the power saving mode and the Write
	command (Write Sectors, Write Multiple, or Write DMA) comes
	from the host, the transferred data is not written into the
	media immediately, only stored into the cache buffer. When
	the cache buffer becomes full or reaches the predefined
	size, or if any command except the Write command is issued,
	the operation to write the data from the cache buffer into
	the media is begun.

	Power consumption can be reduced by Delayed Write. When
	Write commands come with a long interval, the device must
	exit from the power saving mode and enter into the power
	saving mode again without Delayed Write function. If Delayed
	Write is enabled, such power saving mode transition times
	can be reduced. As a result, the additional energy for power
	saving mode transition can be saved, then the average power
	consumption of the device can be reduced.

	However, the time elapsed from the completion of the Write
	command to the media write completion will be extended with
	Delayed Write function. If the power for the device is
	turned off during this time, the data which has not been
	written to the media is lost.  Therefore, a command listed
	in the Write Cache Function section shall be issued before
	the power off to confirm whole cached data has been written
	into the media.

	For safety, Delayed Write function is disabled at Power On
	Default.
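
The behaviour described in the excerpt above can be modelled in a few
lines. This is a toy model written from the text only, not drive
firmware and not anything IBM publishes: struct dw_model, dw_submit()
and the sector-count threshold are invented for illustration, and the
assumption that the drive leaves power saving mode once it flushes is
mine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ata_cmd { CMD_WRITE, CMD_OTHER };

/* Toy model of a drive with Delayed Write enabled (all names assumed). */
struct dw_model {
	size_t cached;		/* sectors held only in the cache buffer */
	size_t threshold;	/* stand-in for the "predefined size" */
	size_t on_media;	/* sectors actually written to the media */
	bool power_saving;	/* true while the drive is in power saving */
};

/*
 * Submit one command.  Returns true if the drive had to leave power
 * saving mode (or was already out of it) to service the command, and
 * false if a write was merely absorbed into the cache buffer.
 */
static bool
dw_submit(struct dw_model *m, enum ata_cmd cmd, size_t sectors)
{
	assert(m != NULL);

	if (cmd == CMD_WRITE) {
		m->cached += sectors;
		/* In power saving mode, hold writes until the threshold. */
		if (m->power_saving && m->cached < m->threshold)
			return (false);
	}

	/* Buffer full, a non-write command, or drive awake: flush. */
	m->on_media += m->cached;
	m->cached = 0;
	m->power_saving = false;
	return (true);
}
```

In this model whatever is counted in `cached' when the power goes away
is lost, which is the exposure the excerpt warns about.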
