Date: Sat, 05 May 2001 21:24:07 +0100
From: Nick Barnes <Nick.Barnes@pobox.com>
To: Matt Dillon <dillon@earth.backplane.com>
Cc: Doug Russell <drussell@saturn-tech.com>, freebsd-stable <freebsd-stable@FreeBSD.ORG>
Subject: Re: soft update should be default
Message-ID: <5967.989094247@thrush.ravenbrook.com>
In-Reply-To: Message from Matt Dillon <dillon@earth.backplane.com> of "Sat, 05 May 2001 11:29:39 PDT." <200105051829.f45ITdC49030@earth.backplane.com>
This sounds as if there isn't _any_ way for the kernel (or, better, an
application) to make sure that its bits have got written. Is that
really true? Shouldn't the man pages for fsync(1), fsync(2), and
sync(8) reflect this? sync(2) has something under "BUGS"....

If this is true, it's not good. Presumably fsync(2) will get the data
down the cable to the disk unit. If the CPU, kernel, etc. goes toes-up
a microsecond later, will my bits still hit the platter? I'm assuming
I can keep the power on, which is a separate and well-understood
problem. But if there's a panic and reboot, presumably there's some
kind of "reset now" message sent to the disk unit (the exact details
no doubt depend on the disk type). Will it write my bits or discard
them? How do different disk units compare in this respect?

When I started with FreeBSD, the general understanding was that people
who cared about data integrity used SCSI, people who really cared used
RAID on SCSI, and people who were fanatical about it used hardware
SCSI-to-SCSI RAID in a separate rack unit with redundant PSUs and
controllers and very high-quality cables. Is this still the received
wisdom?

Nick B

At 2001-05-05 18:29:39+0000, Matt Dillon writes:
>
> :> Of course as Gordon writes above, all bets are off if your disk does
> :> write-caching.
> :
> :I still don't totally understand this. In the case of a drive with WCE,
> :aren't we always assuming that the drive will correctly write the data out
> :eventually, even if the system crashes?
> :
> :This assumes that we aren't talking about a power failure, here, but if it
> :is an external drive array with dual power supplies, at least one battery
> :backed, it doesn't matter even if the computer power is cut, the drive
> :should still eventually flush out its cache, shouldn't it?
> :
> :(Ideal world, of course, I know.... What if the SCSI bus wedges a drive?)
> :
> :> There is an excellent paper entitled, Soft Updates: A Solution to the
> :> Metadata Update Problem in File Systems, by Gregory R. Ganger and Yale
> :> N. Patt at EECS, University of Michigan. The paper is at
> :> http://www.ece.cmu.edu/~ganger/papers/CSE-TR-254-95/.
> :
> :Sounds interesting... I'm going to have to go take a look....
> :
> :Later....... <Doug>
>
> Not only will the hard drive not be able to write the write-cached
> data to the media, but IDE hard drives will not guarantee write
> ordering either. Someone did a test a while back and found that
> under heavy disk loads an IDE drive could hold some of the dirty data
> in its cache for an indefinite period of time without writing it out.
> i.e. it would write out some of the dirty data but also hold some of it
> indefinitely, unwritten.
>
> So turning on WCE is playing with fire. WCE was mangled because
> the drive manufacturers were more interested in posting high transfer
> rate numbers for benchmarks than in keeping people's data safe. I
> remember when it happened... the day the drive manufacturers started
> using their lobotomized SCSI cores internally was the day they realized
> they could cache writes.
>
> Now, there is an IDE flush command. Theoretically it would be possible
> to write out non-conflicting sectors with WCE turned on, then flush
> the cache, and repeat. Theoretically it would be possible for a
> RAID system with its own battery backed cache to operate with WCE
> turned on and flush the disks before the data would be lost from
> its own cache.
>
> Realistically, drive manufacturers rarely test the command set the
> drives are supposed to support beyond making sure it works with some
> idiotic Windows driver, so these cool commands are as likely to crash
> the drive than to work as advertised.
>
> -Matt
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-stable" in the body of the message
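
For readers following the fsync(2) question in this message, here is a minimal sketch of what an application can do from its side, assuming a POSIX system. The helper name durable_write and the file name record.dat are hypothetical and chosen only for illustration. Python's os.fsync calls fsync(2) on the descriptor. As the thread points out, this only asks the kernel to hand the data to the disk unit; whether a drive with write caching enabled has put it on the platter when the call returns is exactly the open question, and nothing in this sketch can settle it.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, then fsync the file and its directory.

    Hypothetical helper. It covers the kernel's side only: a drive with
    write caching enabled may still hold the data in its own cache.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the file's dirty data to the device.
        os.fsync(fd)
    finally:
        os.close(fd)

    # fsync the containing directory as well, so that the directory
    # entry for a newly created file is also pushed to the device.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    target = os.path.join(scratch, "record.dat")
    durable_write(target, b"my bits")
    print("wrote and fsynced", target)
```

The directory fsync is a common precaution rather than something the message itself asks for; it can be dropped if only the contents of an existing file matter.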
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5967.989094247>