From: Terry Lambert
Message-Id: <199809272207.PAA01294@usr05.primenet.com>
Subject: Re: one other thing...
To: gibbs@plutotech.com (Justin T. Gibbs)
Date: Sun, 27 Sep 1998 22:07:02 +0000 (GMT)
Cc: tlambert@primenet.com, gibbs@plutotech.com, ken@plutotech.com, freebsd-alpha@FreeBSD.ORG, imp@plutotech.com
In-Reply-To: <199809272144.PAA06311@pluto.plutotech.com> from "Justin T. Gibbs" at Sep 27, 98 03:37:48 pm

> >Soft update reduces the number of writes to the device. And because
> >it does implicit write gathering, there is little or no room for the
> >disk to further optimize this under the cover.
>
> Soft updates deals with meta-data only. If I am a database (your example),
> I will likely have very few (if any) files in the filesystem and the only
> thing soft updates helps me with is dealing with file growth and
> other meta-data related accesses. Soft updates, in fact, does no write
> gathering at all.
> It makes a single meta-data change at a time just
> as is done in the sync case, but because of the tracking of dependencies,
> can perform these writes without blocking other file system activity.

FreeBSD does write gather in the clustering code.

But I believe the mtime is not going to be updated in the on disk
inode unless the writes have actually taken place (i.e., the _user
data_ has actually been "M"'ed).

> >A well written OS will be better able to utilize memory in a fashion
> >suitable to the OS than some disk drive manufacturer building disks
> >with the general expectation of a VFAT32 FS.
>
> Hmm. Then why does this give us something like a 2X performance boost?

A _well written_ OS will be better able to utilize memory... Or maybe
you aren't doing tagged command queueing?

Really, this is a latency issue, and not a general linear performance
win, like you imply with your "2X".

You can improve latency by shortening the cycle time, but you can
*also* improve latency by increasing concurrency, if the issue here
is serial latency of inherently serial operations.

My guess is that ordering is *not* guaranteed between separate tagged
commands, and the disk is using the write cache to optimize seek
latency. Either that, or you are rewriting overlapping windows:

	[ ] [ ] [ ] ...

With a really strange micro-benchmark designed to test whether the
disk does write caching...

> >The bottom line: I can make it go as fast as you want, if it doesn't
> >have to be correct. Faster even...
>
> With a UPS, I'm guaranteed that the data will be correct even if the OS
> crashes so long as I don't have a concurrent device failure. I have
> backups or RAID to deal with device failure and no setting of the
> write cache helps you in this case anyway.

Right. You didn't say the disk you were doing this to was a member
of a volume set in a UPS'ed RAID 5 array, however.

> >This helps little.
> >Unless the write is committed to stable storage
> >on a device block basis under OS control, there are still race
> >windows inherent in the sector order reversal. If the drive
> >believes it is about to write a run of contiguous sectors, it
> >will *still* reorder the writes.
>
> The SCSI spec says otherwise.

The race windows are based on assumptions by the OS on write ordering
dependencies, not SCSI race conditions. Sorry if that wasn't clear.

Your previous point about being able to do a cache flush command to
the disk is salient; however, FreeBSD is apparently not doing this
yet, and until it does, the point about there being a race stands.

> >This protects against crashes as a result of a power loss, but
> >not those resulting from a memory overcommit architecture, nor
> >those resulting from kernel bugs.
>
> But a crash by way of a power loss is the only way (barring device failure)
> that my write cache can be compromised. If the OS crashes and reboots,
> the drive will happily commit to non-volatile storage that the OS assumed
> was written.

If the OS goes off in the weeds, it can write wrong data. Look at the
library timestamp update bug.

> >Frankly, we are arguing on different axes; you are discussing
> >"safe enough", while I'm discussing "reliable".
>
> My argument is that write caching is "reliable" if you use a UPS.

OK, then we are in violent agreement about the qualification I wanted
to make on the original posting. I guess we can stop now, since I only
wanted to qualify the instructions with what they implied above and
beyond "faster writes".

> All Quantum, Seagate, and IBM drives ship with write caching
> enabled so there must be thousands of people using FreeBSD that, although
> they haven't complained once about this problem, are certain to experience
> file system corruption.

But the only people who care about correctness are here and on the
SCSI list... ;-).
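
As a rough illustration of the write ordering dependency discussed
above, here is a minimal Python sketch of a writer that wants one
block on stable storage before the next one is issued. This is not
FreeBSD's code; the names `write_all` and `ordered_update` and the
data-then-commit-record layout are made up for the example. It assumes
only the POSIX-style `os.write` and `os.fsync` calls.

```python
import os
import tempfile


def write_all(fd, buf):
    """Write the whole buffer, looping in case of short writes."""
    view = memoryview(buf)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def ordered_update(path, data_block, commit_record):
    """Write data_block, force it out, then write commit_record.

    Hypothetical helper: the commit record should never reach the
    media ahead of the data it describes.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        write_all(fd, data_block)
        # Ask the OS to push the data to the device before going on.
        # Assumption: if the drive's write cache is enabled and no
        # cache flush command is sent, this only shows that the drive
        # accepted the data, not that it is on the platter.
        os.fsync(fd)
        write_all(fd, commit_record)
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        target = os.path.join(tmpdir, "journal")
        ordered_update(target, b"DATA" * 128, b"COMMIT")
        print(os.path.getsize(target), "bytes written")
```

The code expresses the ordering the OS intends. Whether the drive
honors that ordering once the data sits in its write cache is the
race window in question.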

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.