Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 27 Sep 2008 00:37:50 -0700
From:      =?utf-8?Q?Derek_Kuli=C5=84ski?= <takeda@takeda.tk>
To:        Jeremy Chadwick <koitsu@FreeBSD.org>
Cc:        freebsd-stable@FreeBSD.org, Clint Olsen <clint.olsen@gmail.com>
Subject:   Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY
Message-ID:  <588787159.20080927003750@takeda.tk>
In-Reply-To: <20080927064417.GA43638@icarus.home.lan>
References:  <20080921213426.GA13923@0lsen.net> <20080921215203.GC9494@icarus.home.lan> <20080921215930.GA25826@0lsen.net> <20080921220720.GA9847@icarus.home.lan> <249873145.20080926213341@takeda.tk> <20080927051413.GA42700@icarus.home.lan> <765067435.20080926223557@takeda.tk> <20080927064417.GA43638@icarus.home.lan>

next in thread | previous in thread | raw e-mail | index | archive | help
Hello Jeremy,

Friday, September 26, 2008, 11:44:17 PM, you wrote:

>> As far as I know (at least ideally, when write caching is disabled)

> Re: write caching: wheelies and burn-outs in empty parking lots
> detected.

> Let's be realistic.  We're talking about ATA and SATA hard disks, hooked
> up to on-board controllers -- these are the majority of users.  Those
> with ATA/SATA RAID controllers (not on-board RAID either; most/all of
> those do not let you disable drive write caching) *might* have a RAID
> BIOS menu item for disabling said feature.

> FreeBSD atacontrol does not let you toggle such features (although "cap"
> will show you if feature is available and if it's enabled or not).

> Users using SCSI will most definitely have the ability to disable
> said feature (either via SCSI BIOS or via camcontrol).  But the majority
> of users are not using SCSI disks, because the majority of users are not
> going to spend hundreds of dollars on a controller followed by hundreds
> of dollars for a small (~74GB) disk.

> Regardless of all of this, end-users should, in no way shape or form,
> be expected to go to great lengths to disable their disk's write cache.
> They will not, I can assure you.  Thus, we must assume: write caching
> on a disk will be enabled, period.  If a filesystem is engineered with
> that fact ignored, then the filesystem is either 1) worthless, or 2)
> serves a very niche purpose and should not be the default filesystem.

> Do we agree?

Yes, but...

In the link you sent to me, someone mentioned that write cache is
always creates problem, and it doesn't matter on OS or filesystem.

There's more below.

>> the data should always be consistent, and all fsck supposed to be
>> doing is to free unreferenced blocks that were allocated.
> fsck does a heck of a lot more than that, and there's no guarantee
> that's all fsck is going to do on a UFS2+SU filesystem.  I'm under the
> impression it does a lot more than just looking for unref'd blocks.

Yes, fsck does a lot more than that. But the whole point of soft
updates is to reduce the work of fsck to deallocate allocated blocks.

Anyway, maybe my information are invalid, though funny thing is that
Soft Updates was mentioned in one of my lecture on Operating Systems.

Apparently the goal of Soft Updates is to always enforce those rules
in very efficient manner, by reordering the writes:
1. Never point to a data structure before initializing it
2. Never reuse a structure before nullifying pointers to it
3. Never reset last pointer to live structure before setting a new one
4. Always mark free-block bitmap entries as used before making the
   directory entry point to it

The problem comes with disks which for performance reasons cache the
data and then write it in different order back to the disk.
I think that's the reason why it's recommended to disable it.
If a disk is reordering the writes, it renders the soft updates
useless.

But if the writing order is preserved, all data remains always
consistent, the only thing that might appear are blocks that were
marked as being used, but nothing was pointing to them yet.

So (in ideal situation, when nothing interferes) all fsck needs to do
is just to scan the filesystem and deallocate those blocks.

> The system is already up and the filesystems mounted.  If the error in
> question is of such severity that it would impact a user's ability to
> reliably use the filesystem, how do you expect constant screaming on
> the console will help?  A user won't know what it means; there is
> already evidence of this happening (re: mysterious ATA DMA errors which
> still cannot be figured out[6]).

> IMHO, a dirty filesystem should not be mounted until it's been fully
> analysed/scanned by fsck.  So again, people are putting faith into
> UFS2+SU despite actual evidence proving that it doesn't handle all
> scenarios.

Yes, I think the background fsck should be disabled by default, with a
possibility to enable it if the user is sure that nothing will
interfere with soft updates.

> The problem here is that when it was created, it was sort of an
> "experiment".  Now, when someone installs FreeBSD, UFS2 is the default
> filesystem used, and SU are enabled on every filesystem except the root
> fs.  Thus, we have now put ourselves into a situation where said
> feature ***must*** be reliable in all cases.

I think in worst case it just is as realiable as if it wouldn't be
enabled (the only danger is the background fsck)

> You're also forgetting a huge focus of SU -- snapshots[1].  However, there
> are more than enough facts on the table at this point concluding that
> snapshots are causing more problems[7] than previously expected.  And
> there's further evidence filesystem snapshots shouldn't even be used in
> this way[8].

there's not much to argue about that.

>> Also, if I remember correctly, PJD said that gjournal is performing
>> much better with small files, while softupdates is faster with big
>> ones.

> Okay, so now we want to talk about benchmarks.  The benchmarks you're
> talking about are in two places[2][3].

> The benchmarks pjd@ provided were very basic/simple, which I feel is
> good, because the tests were realistic (common tasks people will do).
> The benchmarks mckusick@ provided for UFS2+SU were based on SCSI
> disks, which is... interesting to say the least.

> Bruce Evans responded with some more data[4].

> I particularly enjoy this quote in his benchmark: "I never found the
> exact cause of the slower readback ...", followed by (plausible)
> speculations as to why that is.

> I'm sorry that I sound like such a hard-ass on this matter, but there is
> a glaring fact that people seem to be overlooking intentionally:

> Filesystems have to be reliable; data integrity is focus #1, and cannot
> be sacrificed.  Users and administrators *expect* a filesystem to be
> reliable.  No one is going to keep using a filesystem if it has
> disadvantages which can result in data loss or "waste of administrative
> time" (which I believe is what's occurring here).

> Users *will* switch to another operating system that has filesystems
> which were not engineered/invented with these features in mind.  Or,
> they can switch to another filesystem assuming the OS offers one which
> performs equally as good/well and is guaranteed to be reliable --
> and that's assuming the user wants to spend the time to reformat and
> reinstall just to get that.

I wasn't trying to argue about that. Perhaps my assumption is wrong,
but I belive that the problems that we know about Soft Updates, at
worst case make system as reliable as it was without using it.

> In the case of "bit rot" (e.g. drive cache going bad silently, bad
> cables, or other forms of low-level data corruption), a filesystem is
> likely not to be able to cope with this (but see below).

> A common rebuttal here would be: "so use UFS2 without soft updates".
> Excellent advice!  I might consider it myself!  But the problem is that
> we cannot expect users to do that.  Why?  Because the defaults chosen
> during sysinstall are to use SU for all filesystems except root.  If SU
> is not reliable (or is "reliable in most cases" -- same thing if you ask
> me), then it should not be enabled by default.  I think we (FreeBSD)
> might have been a bit hasty in deciding to choose that as a default.

> Next: a system locking up (or a kernel panic) should result in a dirty
> filesystem.  That filesystem should be *fully recoverable* from that
> kind of error, with no risk of data loss (but see below).

> (There is the obvious case where a file is written to the disk, and the
> disk has not completed writing the data from its internal cache to the
> disk itself (re: write caching); if power is lost, the disk may not have
> finished writing the cache to disk.  In this case, the file is going to
> be sparse -- there is absolutely nothing that can be done about this
> with any filesystem, including ZFS (to my knowledge).  This situation
> is acceptable; nature of the beast.)

> The filesystem should be fully analysed and any errors repaired (either
> with user interaction or automatically -- I'm sure it depends on the
> kind of error) **before** the filesystem is mounted.

> This is where SU gets in the way.  The filesystem is mounted and the
> system is brought up + online 60 seconds before the fsck starts.  The
> assumption made is that the errors in question will be fully recoverable
> by an automatic fsck, which as this thread proves, is not always the
> case.

That's why I think background fsck should be disabled by default.
Though I still don't think that soft updates hurt anything (probably
except performance)

> ZFS is the first filesystem, to my knowledge, which provides 1) a
> reliable filesystem, 2) detection of filesystem problems in real-time or
> during scrubbing, 3) repair of problems in real-time (assuming raidz1 or
> raidz2 are used), and 4) does not need fsck.  This makes ZFS powerful.

> "So use ZFS!"  A good piece of advice -- however, I've already had
> reports from users that they will not consider ZFS for FreeBSD at this
> time.  Why?  Because ZFS on FreeBSD can panic the system easily due to
> kmem exhaustion.  Proper tuning can alleviate this problem, but users do
> not want to to have to "tune" their system to get stability (and I feel
> this is a very legitimate argument).

> Additionally, FreeBSD doesn't offer ZFS as a filesystem during
> installation.  PC-BSD does, AFAIK.  So on FreeBSD, you have to go
> through a bunch of rigmarole[5] to get it to work (and doing this
> after-the-fact is a real pain in the rear -- believe me, I did it this
> weekend.)

> So until both of these ZFS-oriented issues can be dealt with, some
> users aren't considering it.

> This is the reality of the situation.  I don't think what users and
> administrators want is unreasonable; they may be rough demands, but
> that's how things are in this day and age.

> Have I provided enough evidence?  :-)

Yes, but as far as I understand it's not as bad as you think :)
I could be wrong though.

I 100% agree on disabling background fsck, but I don't think soft
updates are making the system any less reliable than it would be
without it.

Also, I'll have to play with ZFS some day :)

-- 
Best regards,
 Derek                            mailto:takeda@takeda.tk

It's a little-known fact that the Y1K problem caused the Dark Ages.





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?588787159.20080927003750>