From owner-freebsd-alpha  Sun Sep 27 14:44:41 1998
Message-Id: <199809272144.PAA06311@pluto.plutotech.com>
X-Mailer: exmh version 2.0.2 2/24/98
To: Terry Lambert <tlambert@primenet.com>
cc: gibbs@plutotech.com (Justin T. Gibbs), ken@plutotech.com,
        freebsd-alpha@FreeBSD.ORG, imp@plutotech.com
Subject: Re: one other thing... 
In-reply-to: Your message of "Sun, 27 Sep 1998 19:10:02 -0000."
             <199809271910.MAA25729@usr05.primenet.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Sun, 27 Sep 1998 15:37:48 -0600
From: "Justin T. Gibbs" <gibbs@plutotech.com>
Sender: owner-freebsd-alpha@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>> Soft Updates is not a replacement for the on disk cache.  The two
>> serve very different purposes.  One reduces the number of writes
>> to the device, the other reduces the number of writes committed by
>> the device to the disk and reduces latency for any device writes that
>> the OS believes are necessary.
>
>Soft update reduces the number of writes to the device.  And because
>it does implicit write gathering, there is little or no room for the
>disk to further optimize this under the cover.

Soft updates deals with meta-data only.  If I am a database (your example),
I will likely have very few (if any) files in the filesystem and the only
thing soft updates helps me with is dealing with file growth and 
other meta-data related accesses.  Soft updates, in fact, does no write
gathering at all.  It makes a single meta-data change at a time just
as is done in the sync case, but because of the tracking of dependencies,
can perform these writes without blocking other file system activity.

>A well written OS will be better able to utilize memory in a fashion
>suitable to the OS than some disk drive manufacturer building disks
>with the general expectation of a VFAT32 FS.

Hmm.  Then why does this give us something like a 2X performance boost?

>As to the latency argument, yeah, it reduces latency.  So does mounting
>async, and so does a caching controller and so does noatime, and so
>does taking the fsync() calls out of the database's two stage commit
>routine, and... and...
>
>The bottom line: I can make it go as fast as you want, if it doesn't
>have to be correct.  Faster even...

With a UPS, I'm guaranteed that the data will be correct even if the OS
crashes so long as I don't have a concurrent device failure.  I have
backups or RAID to deal with device failure and no setting of the
write cache helps you in this case anyway.

>> >The drive, in doing caching, may reorder these operations, such
>> >that the index is written out, but the new record is not.
>> 
>> This all depends on how you set up the drive.  You can tell it not
>> to re-order writes (FSW bit in the caching control page).
>
>This helps little.  Unless the write is committed to stable storage
>on a device block basis under OS control, there are still race
>windows inherent in the sector order reversal.  If the drive
>believes it is about to write a run of contiguous sectors, it
>will *still* reorder the writes.

The SCSI spec says otherwise.
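
For reference, a minimal sketch of checking how a drive is set up, assuming
you already have the raw bytes of the caching mode page (page code 08h) from
a MODE SENSE.  The helper name and struct are hypothetical; the bit positions
(WCE in byte 2 bit 2, FSW in byte 12 bit 7) are as I read the SCSI block
command set, and older drives may return a page too short to carry FSW.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHING_PAGE_CODE 0x08
#define WCE_MASK 0x04 /* byte 2, bit 2: write cache enable */
#define FSW_MASK 0x80 /* byte 12, bit 7: force sequential write */

/* Hypothetical result type, for illustration only. */
struct cache_flags {
    int wce; /* 1 if the write cache is enabled */
    int fsw; /* 1 if writes are forced sequential, -1 if not reported */
};

/*
 * Decode WCE and FSW from a caching mode page.  "page" points at the
 * page code byte.  Returns 0 on success, -1 if this is not a usable
 * caching page.
 */
int parse_caching_page(const uint8_t *page, size_t len,
                       struct cache_flags *out)
{
    if (len < 3 || (page[0] & 0x3f) != CACHING_PAGE_CODE)
        return -1;

    /* page[1] is the number of bytes that follow the two-byte header. */
    size_t page_end = (size_t)page[1] + 2;
    if (page_end > len)
        page_end = len;
    if (page_end < 3)
        return -1;

    out->wce = (page[2] & WCE_MASK) != 0;
    out->fsw = (page_end > 12) ? ((page[12] & FSW_MASK) != 0) : -1;
    return 0;
}
```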

>The correct way to achieve lower latency is to increase concurrency
>-- but only between unrelated operations.
>
>The appropriate technology for this is multiple outstanding commands;
>tagged command queueing, in other words.

These drives support 63 tagged transactions and we were using them all.

>> If I was really worried, however, I'd have the box on a UPS.
>
>This protects against crashes as a result of a power loss, but
>not those resulting from a memory overcommit architecture, nor
>those resulting from kernel bugs.

But a crash by way of a power loss is the only way (barring device failure)
that my write cache can be compromised.  If the OS crashes and reboots,
the drive will happily commit to non-volatile storage the data that the OS
assumed was written.

>Frankly, we are arguing on different axes; you are discussing
>"safe enough", while I'm discussing "reliable".

My argument is that write caching is "reliable" if you use a UPS.

>> This is an interface issue, not a cache issue.  If the kernel told the
>> disk driver to sync the cache, it could.  This is what the Synchronize
>> Cache command is all about.
>
>But it doesn't, so you can't turn caching on and maintain data
>integrity guarantees; only data integrity probabilities.

With a UPS, you do have these guarantees as the device has received
the data before a future OS crash.  If you are running an important
database, is your machine on a UPS?  It certainly is here.
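
To make the Synchronize Cache point concrete, here is a rough sketch of the
10-byte CDB a driver would have to send.  The function name is hypothetical
and this only fills in the command bytes; actually issuing it depends on the
driver or pass-through interface, which is not shown.  A block count of zero
asks the drive to flush everything from the starting LBA onward.

```c
#include <stdint.h>
#include <string.h>

#define SYNCHRONIZE_CACHE_10 0x35 /* operation code */
#define SYNC_CACHE_IMMED     0x02 /* byte 1: return before the flush ends */

/*
 * Fill in a SYNCHRONIZE CACHE(10) command descriptor block.
 * The LBA and block count are stored big-endian, as in other
 * 10-byte SCSI commands.
 */
void build_sync_cache_cdb(uint8_t cdb[10], uint32_t lba,
                          uint16_t nblocks, int immed)
{
    memset(cdb, 0, 10);
    cdb[0] = SYNCHRONIZE_CACHE_10;
    cdb[1] = immed ? SYNC_CACHE_IMMED : 0;
    cdb[2] = (uint8_t)(lba >> 24);
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    cdb[7] = (uint8_t)(nblocks >> 8);
    cdb[8] = (uint8_t)nblocks;
    /* cdb[6] and cdb[9] (control) stay zero in this sketch. */
}
```

Calling build_sync_cache_cdb(cdb, 0, 0, 0) gives the "flush the whole cache
and wait" form.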

>> You are assuming that the OS will never panic.  I don't use async
>> mounts because I expect the OS to occasionally crash.  I worry
>> about power outages too, but they are something I can easily control
>> with a UPS.
>
>Actually, I am assuming that the OS *will* panic.  If you discount
>everything including power failures, then you get the first part
>of my "PS:".  If you discount everything *but* power failures,
>e.g., by appeal to a UPS, then you get the second part (since as
>long as the write occurs before a bus reset telling the device to
>forget everything and initialize itself, a cached write will
>still occur -- note that this is a real danger in a panic situation).

What does a panic have to do with this?  It has no effect on the write
cache on the disk, which is the whole thing we are discussing.  If an
application that requires stable storage in order to recover from a
crash was not written properly (to use fsync for checkpoints, etc.), it
will lose data, write cache or no.
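
As an illustration of "use fsync for checkpoints", a minimal sketch of an
application appending a checkpoint record and not proceeding until fsync()
returns.  The function name and the append-only log layout are assumptions
made for the example, not any particular database's code.  Note that fsync()
only pushes the data from the OS to the device; whether the device has
committed it to the platters is the write cache question discussed here.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: append one checkpoint record to "path" and
 * force it out of the OS buffer cache before returning.
 * Returns 0 on success, -1 on any error.
 */
int write_checkpoint(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;

    /* write() may accept fewer bytes than asked for, so loop. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* The checkpoint does not count until this succeeds. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```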

>Because someone suggested doing something that I felt was bad
>advice, and so long as that bad advice was in the record, I
>felt it necessary to note, also for the record, why the advice
>was bad.

You had better go out onto the main lists then and practice your speech
there.  All Quantum, Seagate, and IBM drives ship with write caching
enabled so there must be thousands of people using FreeBSD that, although
they haven't complained once about this problem, are certain to experience
file system corruption.

--
Justin

