Date:      Wed, 23 Apr 2003 19:12:12 +0200
From:      Marko Zec <zec@tel.fer.hr>
To:        Ian Dowse <iedowse@maths.tcd.ie>, Terry Lambert <tlambert2@mindspring.com>
Cc:        Kirk McKusick <mckusick@beastie.mckusick.com>
Subject:   Re: PATCH: Forcible delaying of UFS (soft)updates
Message-ID:  <200304231912.12333.zec@tel.fer.hr>
In-Reply-To: <200304200730.aa34354@salmon.maths.tcd.ie>
References:  <200304200730.aa34354@salmon.maths.tcd.ie>

On Sunday 20 April 2003 08:30, Ian Dowse wrote:
> In message <3EA03FF1.280B6810@mindspring.com>, Terry Lambert writes:
> >David Schultz wrote:
> >> As for the ATA delayed write feature, I don't believe it will
> >> guarantee consistency.
> >
> >It doesn't.  I checked, after voicing my suspicions of it.
>
> Yes, write ordering and hence FS consistency is not guaranteed; my
> original point was just that the situation regarding FS consistency
> with ATA delayed writes is not significantly worse than that with
> the default behaviour of having ATA write caching enabled. In fact,
> if the OS is modified to perform writes in batches then the two
> cases are almost identical: in one case the disk collects a batch
> of writes, possibly reorders them, and writes them out in one burst;
> in the other case the OS sends a burst of writes, the disk possibly
> reorders them and writes them out. For reference I've included below
> what IBM say about the delayed write feature in their disk
> documentation.
>
> BTW, to answer a point Marko mentioned, I don't consider the delayed
> write behaviour to be nearly as bad as a null fsync(), because you
> are very unlikely to completely lose a file that has been modified,
> saved and then fsync()'d. If the write/rename/fsync all happen while
> the disk is spun down then the old version of the file is still
> intact on the media if the power fails. With a null fsync(), there
> can be a considerable window where the disk contains just a zero-length
> file.
>
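A minimal POSIX C sketch of the kind of save sequence Ian describes, for illustration only. It uses the common write, fsync, then rename ordering rather than the exact write/rename/fsync order mentioned above, and `save_file` and the `.tmp` suffix are hypothetical choices. As this thread discusses, with drive write caching or ATA delayed write enabled, a completed fsync() may still not mean the data has reached the media.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Replace the contents of `path` with `data` so that a crash leaves either
 * the complete old file or the complete new one, never a truncated file.
 * Sketch only: the ".tmp" suffix is arbitrary, and the containing directory
 * is not fsync()'d, so the rename itself may be lost (not half-applied).
 * Returns 0 on success, -1 on error.
 */
static int save_file(const char *path, const char *data)
{
    char tmp[1024];
    const char *p = data;
    size_t left;
    int fd, n;

    assert(path != NULL && data != NULL);
    n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof(tmp))
        return -1;

    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* 1. Write the new contents to a temporary file, handling short writes. */
    left = strlen(data);
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += w;
        left -= (size_t)w;
    }

    /* 2. Ask for the data to reach stable storage before any name refers to it. */
    if (fsync(fd) != 0)
        goto fail;
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* 3. Atomically replace the old name with the new file. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp);
    return -1;
}
```

If the power fails before step 3 completes, the old file is still intact under its original name; a null fsync() would remove the guarantee that step 2 provides.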
> I completely accept that there is more flexibility at the OS side
> to control which writes get delayed and by how much, and that an
> OS-side implementation would be extremely useful. However I think
> it would require further work to develop a good implementation. For
> example, the current proposed patch effectively assumes that there
> is only one disk in the system since `stratcalls' is a global
> variable (e.g., I believe that reading from an ATA flash device
> would trigger a flush to any real ATA disks in the system). It would
> also be useful if the solution was not specific to ATA devices and
> had per-device control over the behaviour.
>
> I guess my point of view is more that doing this right at the OS
> side is hard, and ATA delayed write is an unobtrusive neat feature
> that does mostly the right thing at the cost of only a marginal
> increase in the risk of data loss for typical uses.
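To make the per-device point above concrete, here is a small C model. It is a sketch under stated assumptions, not the proposed patch. The delay state lives in a per-device structure instead of a single global such as `stratcalls`, and each device has its own enable knob. The assumed policy flushes a device's queued writes only when that same device is read. `struct delay_dev`, `dev_write`, `dev_read`, `dev_flush` and `dev_flush_all` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; none of these names come from the patch. */
struct delay_dev {
    const char *name;
    bool delay_enabled;   /* per-device knob, e.g. left off for flash media */
    int pending_writes;   /* writes currently held back for THIS device */
    int flushes;          /* how many times this device has been flushed */
};

/* Push out whatever this one device has queued (modelled as a counter). */
static void dev_flush(struct delay_dev *d)
{
    assert(d != NULL);
    if (d->pending_writes > 0) {
        d->pending_writes = 0;
        d->flushes++;
    }
}

/* Queue a write only if delaying is enabled for this device; otherwise
 * it would be issued immediately (not modelled here). */
static void dev_write(struct delay_dev *d)
{
    assert(d != NULL);
    if (d->delay_enabled)
        d->pending_writes++;
}

/* Assumed policy: a read has to touch the media anyway, so it is a cheap
 * moment to flush this device's queued writes, and only this device's. */
static void dev_read(struct delay_dev *d)
{
    dev_flush(d);
}

/* At shutdown, every device is flushed explicitly by the OS. */
static void dev_flush_all(struct delay_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dev_flush(&devs[i]);
}
```

With this layout, reading from a flash device leaves a spun-down disk's queue untouched. A shutdown path can call `dev_flush_all()` explicitly instead of relying on drive firmware to do it.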

Although I have favored OS-controlled delayed syncing ever since I posted my
initial patch, the more I think about the advantages of letting the ATA
firmware control delayed writing, the more I like that approach. Still, after
all the discussion, I do not want to claim that the OS-controlled model has
now become fundamentally bad. I simply have to agree with Ian that, to take
the original patch from proof-of-concept level to production quality across a
broad range of hardware configurations and application scenarios, it would
have to be extended to pollute many more chunks of code scattered all around
the source tree. In contrast, the ATA-controlled delaying approach confines
the changes to the ATA driver, while providing nearly the same, or entirely
equivalent, functionality.

The only thing that worries me about the ATA firmware-controlled delaying
approach is system shutdown. If the admin forgets to disable write delaying,
will the firmware flush the dirty sectors cached in its RAM to the disk before
power-off occurs?

Cheers,

Marko


