Date: Tue, 30 Nov 2010 17:22:02 -0500
From: David Schultz <das@FreeBSD.ORG>
To: perryh@pluto.rain.com
Cc: freebsd-hackers@FreeBSD.ORG, ivoras@FreeBSD.ORG
Subject: Re: fsync(2) manual and hdd write caching
Message-ID: <20101130222202.GA79001@zim.MIT.EDU>
In-Reply-To: <4cc92df1.Z0CRaJOCdvd/ZJSL%perryh@pluto.rain.com>
References: <20101026213618.GA3013@freebsd.org> <ia7nln$piv$1@dough.gmane.org> <4cc7ea44.ApOaxS8Xr4Sxu%2B0x%perryh@pluto.rain.com> <20101027111124.00007450@unknown> <ia8u9l$46j$1@dough.gmane.org> <4cc92df1.Z0CRaJOCdvd/ZJSL%perryh@pluto.rain.com>

On Thu, Oct 28, 2010, perryh@pluto.rain.com wrote:
> Ivan Voras <ivoras@freebsd.org> wrote:
>
> > ... The problem is actually pretty hard - since AFAIK SoftUpdates
> > doesn't have "checkpoints" in the sense that it groups writes and
> > all data "before" can guaranteed to be on-disk, the problem is
> > *when* to issue BIO_FLUSH requests.
>
> Seems to me the originally-stated problem -- making fsync(2)
> do what it claims to do -- is not hard at all.  Just issue a
> BIO_FLUSH request as the final step in handling fsync(2).

Yes, for correctness, fsync(2) needs to flush the relevant parts of
the disk's volatile write cache before returning.  If it doesn't,
applications like databases can fail if there is a power loss.

Unfortunately, this isn't really practical.  First, performance is
poor: you generally can't flush a particular sector without flushing
the entire write cache, and many disks (including all ATA disks)
don't differentiate between volatile and non-volatile caches.
Second, many disks ignore the command.

So the status quo for all the major Unix variants is apparently to
favor performance over correctness.  However, FlushFileBuffers() in
Windows does the right thing and flushes the disk write cache, and
I've heard that ZFS and ext4 also do the right thing (subject to the
correctness of the disk controller, of course).  So FreeBSD isn't
any worse than most of the world here.

FreeBSD used to turn off disk write caches by default, but many
people complained about FreeBSD being slow.  Far fewer people
complain about corruptions due to power failure.  Usually people who
require stronger reliability guarantees invest in replicated storage
and battery backups anyway.

Note that the "broken" behavior is still protective against kernel
and application crashes -- just not power failures and certain types
of disk faults.

An informative article on the topic is here:

http://www.postgresql.org/docs/9.0/static/wal-reliability.html

> While we're at it, perhaps do the same in close(2).
> I _hope_ we are already doing it in unmount(2).

close(2) is a different beast; flushes would be too expensive, and
they aren't needed except for NFS.  Apps are expected to use fsync(2)
if they require it.
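
To illustrate the last point, here is a minimal sketch of what "use
fsync(2) if they require it" might look like from the application
side.  The helper name write_durably() is hypothetical and not from
this thread; it uses only the standard open(2), write(2), fsync(2),
and close(2) calls.  As discussed above, a successful fsync(2) here
protects against kernel and application crashes, but whether the data
survives a power failure still depends on the filesystem and on how
the drive treats its write cache.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration): write len bytes
 * from buf to path and call fsync(2) before close(2), since close(2)
 * alone does not force the data out.
 * Returns 0 on success, or -1 with errno set on failure.
 */
int
write_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd, saved;

	assert(path != NULL && buf != NULL);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return (-1);

	/* write(2) may be partial or interrupted; loop until done. */
	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			goto fail;
		}
		p += n;
		len -= (size_t)n;
	}

	/* Ask the kernel to push the file's data and metadata to the device. */
	if (fsync(fd) == -1)
		goto fail;

	return (close(fd));

fail:
	/* Preserve the original error across the cleanup close(2). */
	saved = errno;
	(void)close(fd);
	errno = saved;
	return (-1);
}
```

A caller would treat a -1 return as "the data may not be on stable
storage" and report or retry accordingly, rather than assuming the
write succeeded because write(2) itself returned without error.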