FreeBSD Mail Archives

Date:      Sun, 27 Sep 1998 22:07:02 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        gibbs@plutotech.com (Justin T. Gibbs)
Cc:        tlambert@primenet.com, gibbs@plutotech.com, ken@plutotech.com, freebsd-alpha@FreeBSD.ORG, imp@plutotech.com
Subject:   Re: one other thing...
Message-ID:  <199809272207.PAA01294@usr05.primenet.com>
In-Reply-To: <199809272144.PAA06311@pluto.plutotech.com> from "Justin T. Gibbs" at Sep 27, 98 03:37:48 pm

> >Soft update reduces the number of writes to the device.  And because
> >it does implicit write gathering, there is little or no room for the
> >disk to further optimize this under the cover.
> 
> Soft updates deals with meta-data only.  If I am a database (your example),
> I will likely have very few (if any) files in the filesystem and the only
> thing soft updates helps me with is dealing with file growth and 
> other meta-data related accesses.  Soft updates, in fact, does no write
> gathering at all.  It makes a single meta-data change at a time just
> as is done in the sync case, but because of the tracking of dependencies,
> can perform these writes without blocking other file system activity.

FreeBSD does write gather in the clustering code.

But I believe the mtime is not going to be updated in the on-disk
inode unless the writes have actually taken place (i.e., the
_user data_ has actually been "M"'ed).


> >A well written OS will be better able to utilize memory in a fashion
> >suitable to the OS than some disk drive manufacturer building disks
> >with the general expectation of a VFAT32 FS.
> 
> Hmm.  Then why does this give us something like a 2X performance boost?

A _well written_ OS will be better able to utilize memory...

Or maybe you aren't doing tagged command queueing?

Really, this is a latency issue, and not a general linear
performance win, like you imply with your "2X".

You can improve latency by shortening the cycle time, but you can
*also* improve latency by increasing concurrency.
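The latency-versus-concurrency point can be made concrete with a deliberately idealized model. This is a sketch, not a measurement: `completion_time` is a hypothetical helper. It assumes every command has the same fixed service time and that a drive doing tagged command queueing overlaps up to `queue_depth` outstanding commands perfectly.

```python
import math

def completion_time(n_ops: int, latency_ms: float, queue_depth: int) -> float:
    """Idealized time for n_ops independent commands to complete.

    Assumes a fixed per-command service time and perfect overlap of up to
    `queue_depth` outstanding commands (e.g. tagged command queueing).
    Real drives will not overlap perfectly; this only shows the trend.
    """
    if n_ops < 0 or latency_ms < 0 or queue_depth < 1:
        raise ValueError("invalid parameters")
    # Commands finish in "waves" of at most queue_depth at a time.
    return math.ceil(n_ops / queue_depth) * latency_ms

print(completion_time(100, 10.0, 1))  # 1000.0: one command at a time
print(completion_time(100, 10.0, 4))  # 250.0: more concurrency
print(completion_time(100, 5.0, 1))   # 500.0: shorter cycle time
```

In this model, both knobs shorten the total time for the batch. Operations that must wait on each other (inherently serial ones) cannot overlap, however. For them the effective queue depth is 1, and only a shorter cycle time helps.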

This matters if the issue here is the serial latency of inherently
serial operations.

My guess is that ordering is *not* guaranteed between separate
tagged commands, and the disk is using the write cache to
optimize seek latency.

Either that, or you are rewriting overlapping windows:

[             ]
 [             ]
  [             ]
   ...

With a really strange micro-benchmark designed to test whether the
disk does write caching...
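As a rough illustration of why that access pattern flatters a write cache, the sketch below counts sectors for a hypothetical run of overlapping windows. The helper names are made up for illustration. The model also assumes the cache absorbs every rewrite of a sector before destaging it, which is not a description of any particular drive's firmware.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # half-open sector range [start, end)

def overlapping_windows(n_writes: int, width: int, stride: int = 1) -> List[Window]:
    # Hypothetical benchmark pattern: each write covers `width` sectors and
    # starts `stride` sectors after the previous one, so windows overlap.
    return [(i * stride, i * stride + width) for i in range(n_writes)]

def sectors_issued(windows: List[Window]) -> int:
    # Total sectors the host transfers to the drive.
    return sum(end - start for start, end in windows)

def sectors_after_coalescing(windows: List[Window]) -> int:
    # Distinct sectors that must eventually reach the media, assuming the
    # write cache absorbs every rewrite of a sector before destaging it.
    dirty = set()
    for start, end in windows:
        dirty.update(range(start, end))
    return len(dirty)

w = overlapping_windows(1000, 16)
print(sectors_issued(w))            # 16000
print(sectors_after_coalescing(w))  # 1015
```

In this example there are 1000 writes of 16 sectors each, and each write starts one sector after the last. The host sends 16000 sectors, but only 1015 distinct sectors ever need to reach the media. On such a test, a drive with write caching enabled can therefore look many times faster than it would on a workload that does not rewrite the same sectors.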


> >The bottom line: I can make it go as fast as you want, if it doesn't
> >have to be correct.  Faster even...
> 
> With a UPS, I'm guaranteed that the data will be correct even if the OS
> crashes so long as I don't have a concurrent device failure.  I have
> backups or RAID to deal with device failure and no setting of the
> write cache helps you in this case anyway.

Right.  You didn't say the disk you were doing this to was a
member of a volume set in a UPS'ed RAID 5 array, however.


> >This helps little.  Unless the write is committed to stable storage
> >on a device block basis under OS control, there are still race
> >windows inherent in the sector order reversal.  If the drive
> >believes it is about to write a run of contiguous sectors, it
> >will *still* reorder the writes.
> 
> The SCSI spec says otherwise.

The race windows are based on assumptions by the OS on write ordering
dependencies, not SCSI race conditions.  Sorry if that wasn't clear.

Your previous point about being able to do a cache flush command to
the disk is salient; however, FreeBSD is apparently not doing this
yet, and until it does, the point about there being a race stands.
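The ordering race can be modelled in a few lines. This is a toy simulation with several assumptions:

- The drive's write cache destages in ascending LBA order, regardless of issue order.
- Power can fail after any destage.
- The host's only ordering tool is a full cache flush, standing in for the SCSI SYNCHRONIZE CACHE command.

The model ignores other SCSI mechanisms, such as ordered queue tags, that could constrain the drive. The function names, the "inode"/"dirent" labels and the LBAs are all hypothetical.

```python
from typing import List, Set, Tuple

Write = Tuple[str, int]  # (name, LBA); both are hypothetical labels

def destage_order(cache: List[Write]) -> List[Write]:
    # Assumed drive policy: destage cached writes in ascending LBA order
    # to minimize seeks, ignoring the order in which the host issued them.
    return sorted(cache, key=lambda w: w[1])

def run(writes: List[Write], flush_after: Set[str]) -> List[List[str]]:
    """Return every on-disk state a power failure could leave behind.

    After each write named in `flush_after`, the host issues a cache
    flush and waits for it to finish (standing in for SYNCHRONIZE CACHE).
    """
    durable: List[str] = []
    states: List[List[str]] = [[]]  # crash before anything is destaged
    cache: List[Write] = []

    def drain() -> None:
        # Destage everything currently cached; power may fail after any step.
        for name, _lba in destage_order(cache):
            durable.append(name)
            states.append(list(durable))
        cache.clear()

    for w in writes:
        cache.append(w)
        if w[0] in flush_after:
            drain()
    drain()  # eventual background destage of whatever is left
    return states

def violates(states: List[List[str]], first: str, second: str) -> bool:
    # True if some crash state has `second` on disk without `first`.
    return any(second in s and first not in s for s in states)

writes = [("inode", 900), ("dirent", 100)]
print(violates(run(writes, set()), "inode", "dirent"))      # True
print(violates(run(writes, {"inode"}), "inode", "dirent"))  # False
```

Here the directory entry must not reach the disk before the inode it points to. Without a flush, the cache writes the lower-LBA directory entry first, so one possible crash state has the entry on disk with no initialized inode. Flushing after the inode write removes that state, at the cost of waiting for the media.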


> >This protects against crashes as a result of a power loss, but
> >not those resulting from a memory overcommit architecture, nor
> >those resulting from kernel bugs.
> 
> But a crash by way of a power loss is the only way (barring device failure)
> that my write cache can be compromised.  If the OS crashes and reboots,
> the drive will happily commit to non-volatile storage that the OS assumed
> was written.

If the OS goes off in the weeds, it can write wrong data.  Look at
the library timestamp update bug.


> >Frankly, we are arguing on different axes; you are discussing
> >"safe enough", while I'm discussing "reliable".
> 
> My argument is that write caching is "reliable" if you use a UPS.

OK, then we are in violent agreement about the qualification I
wanted to make on the original posting.  I guess we can stop now,
since I only wanted to qualify the instructions with what they
implied above and beyond "faster writes".


> All Quantum, Seagate, and IBM drives ship with write caching
> enabled so there must be thousands of people using FreeBSD that, although
> they haven't complained once about this problem, are certain to experience
> file system corruption.

But the only people who care about correctness are here and on the
SCSI list... ;-).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-alpha" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199809272207.PAA01294>
