Date: Sat, 27 Sep 2008 04:03:29 -0700
From: Jeremy Chadwick
To: Derek Kuliński
Cc: freebsd-stable@FreeBSD.org, Clint Olsen
Message-ID: <20080927110329.GA50142@icarus.home.lan>
In-Reply-To: <588787159.20080927003750@takeda.tk>
Subject: Re: UNEXPECTED SOFT UPDATE INCONSISTENCY;
 RUN fsck MANUALLY

On Sat, Sep 27, 2008 at 12:37:50AM -0700, Derek Kuliński wrote:
> Friday, September 26, 2008, 11:44:17 PM, you wrote:
>
> >> As far as I know (at least ideally, when write caching is disabled)
>
> > Re: write caching: wheelies and burn-outs in empty parking lots
> > detected.
>
> > Let's be realistic. We're talking about ATA and SATA hard disks, hooked
> > up to on-board controllers -- these are the majority of users. Those
> > with ATA/SATA RAID controllers (not on-board RAID either; most/all of
> > those do not let you disable drive write caching) *might* have a RAID
> > BIOS menu item for disabling said feature.
>
> > FreeBSD atacontrol does not let you toggle such features (although "cap"
> > will show you if feature is available and if it's enabled or not).
>
> > Users using SCSI will most definitely have the ability to disable
> > said feature (either via SCSI BIOS or via camcontrol). But the majority
> > of users are not using SCSI disks, because the majority of users are not
> > going to spend hundreds of dollars on a controller followed by hundreds
> > of dollars for a small (~74GB) disk.
>
> > Regardless of all of this, end-users should, in no way shape or form,
> > be expected to go to great lengths to disable their disk's write cache.
> > They will not, I can assure you. Thus, we must assume: write caching
> > on a disk will be enabled, period. If a filesystem is engineered with
> > that fact ignored, then the filesystem is either 1) worthless, or 2)
> > serves a very niche purpose and should not be the default filesystem.
>
> > Do we agree?
>
> Yes, but...
>
> In the link you sent to me, someone mentioned that write cache
> always creates problems, regardless of OS or filesystem.
>
> There's more below.
>
> >> the data should always be consistent, and all fsck is supposed to be
> >> doing is to free unreferenced blocks that were allocated.
> > fsck does a heck of a lot more than that, and there's no guarantee
> > that's all fsck is going to do on a UFS2+SU filesystem. I'm under the
> > impression it does a lot more than just looking for unref'd blocks.
>
> Yes, fsck does a lot more than that. But the whole point of soft
> updates is to reduce fsck's work to deallocating blocks that were
> allocated but never referenced.
>
> Anyway, maybe my information is invalid, though the funny thing is that
> Soft Updates was mentioned in one of my lectures on Operating Systems.
>
> Apparently the goal of Soft Updates is to always enforce these rules
> in a very efficient manner, by reordering the writes:
> 1. Never point to a data structure before initializing it
> 2. Never reuse a structure before nullifying pointers to it
> 3. Never reset last pointer to live structure before setting a new one
> 4. Always mark free-block bitmap entries as used before making the
>    directory entry point to it
>
> The problem comes with disks which, for performance reasons, cache the
> data and then write it back to the disk in a different order.
> I think that's the reason why it's recommended to disable the write
> cache. If a disk is reordering the writes, it renders the soft updates
> useless.
>
> But if the writing order is preserved, all data always remains
> consistent; the only thing that might appear are blocks that were
> marked as being used, but nothing was pointing to them yet.
>
> So (in an ideal situation, when nothing interferes) all fsck needs to do
> is just to scan the filesystem and deallocate those blocks.
>
> > The system is already up and the filesystems mounted.
> > If the error in question is of such severity that it would impact a
> > user's ability to reliably use the filesystem, how do you expect
> > constant screaming on the console will help? A user won't know what
> > it means; there is already evidence of this happening (re: mysterious
> > ATA DMA errors which still cannot be figured out[6]).
>
> > IMHO, a dirty filesystem should not be mounted until it's been fully
> > analysed/scanned by fsck. So again, people are putting faith into
> > UFS2+SU despite actual evidence proving that it doesn't handle all
> > scenarios.
>
> Yes, I think the background fsck should be disabled by default, with a
> possibility to enable it if the user is sure that nothing will
> interfere with soft updates.
>
> > The problem here is that when it was created, it was sort of an
> > "experiment". Now, when someone installs FreeBSD, UFS2 is the default
> > filesystem used, and SU are enabled on every filesystem except the root
> > fs. Thus, we have now put ourselves into a situation where said
> > feature ***must*** be reliable in all cases.
>
> I think in the worst case it is just as reliable as if it weren't
> enabled (the only danger is the background fsck).
>
> > You're also forgetting a huge focus of SU -- snapshots[1]. However, there
> > are more than enough facts on the table at this point concluding that
> > snapshots are causing more problems[7] than previously expected. And
> > there's further evidence filesystem snapshots shouldn't even be used in
> > this way[8].
>
> There's not much to argue about that.
>
> >> Also, if I remember correctly, PJD said that gjournal is performing
> >> much better with small files, while softupdates is faster with big
> >> ones.
>
> > Okay, so now we want to talk about benchmarks. The benchmarks you're
> > talking about are in two places[2][3].
>
> > The benchmarks pjd@ provided were very basic/simple, which I feel is
> > good, because the tests were realistic (common tasks people will do).
> > The benchmarks mckusick@ provided for UFS2+SU were based on SCSI
> > disks, which is... interesting to say the least.
>
> > Bruce Evans responded with some more data[4].
>
> > I particularly enjoy this quote in his benchmark: "I never found the
> > exact cause of the slower readback ...", followed by (plausible)
> > speculations as to why that is.
>
> > I'm sorry that I sound like such a hard-ass on this matter, but there is
> > a glaring fact that people seem to be overlooking intentionally:
>
> > Filesystems have to be reliable; data integrity is focus #1, and cannot
> > be sacrificed. Users and administrators *expect* a filesystem to be
> > reliable. No one is going to keep using a filesystem if it has
> > disadvantages which can result in data loss or "waste of administrative
> > time" (which I believe is what's occurring here).
>
> > Users *will* switch to another operating system that has filesystems
> > which were not engineered/invented with these features in mind. Or,
> > they can switch to another filesystem assuming the OS offers one which
> > performs equally as good/well and is guaranteed to be reliable --
> > and that's assuming the user wants to spend the time to reformat and
> > reinstall just to get that.
>
> I wasn't trying to argue about that. Perhaps my assumption is wrong,
> but I believe that the problems we know about with Soft Updates, in the
> worst case, leave the system as reliable as it was without using it.
>
> > In the case of "bit rot" (e.g. drive cache going bad silently, bad
> > cables, or other forms of low-level data corruption), a filesystem is
> > likely not to be able to cope with this (but see below).
>
> > A common rebuttal here would be: "so use UFS2 without soft updates".
> > Excellent advice! I might consider it myself! But the problem is that
> > we cannot expect users to do that. Why? Because the defaults chosen
> > during sysinstall are to use SU for all filesystems except root.
> > If SU is not reliable (or is "reliable in most cases" -- same thing if
> > you ask me), then it should not be enabled by default. I think we
> > (FreeBSD) might have been a bit hasty in deciding to choose that as a
> > default.
>
> > Next: a system locking up (or a kernel panic) should result in a dirty
> > filesystem. That filesystem should be *fully recoverable* from that
> > kind of error, with no risk of data loss (but see below).
>
> > (There is the obvious case where a file is written to the disk, and the
> > disk has not completed writing the data from its internal cache to the
> > disk itself (re: write caching); if power is lost, the disk may not have
> > finished writing the cache to disk. In this case, the file is going to
> > be sparse -- there is absolutely nothing that can be done about this
> > with any filesystem, including ZFS (to my knowledge). This situation
> > is acceptable; nature of the beast.)
>
> > The filesystem should be fully analysed and any errors repaired (either
> > with user interaction or automatically -- I'm sure it depends on the
> > kind of error) **before** the filesystem is mounted.
>
> > This is where SU gets in the way. The filesystem is mounted and the
> > system is brought up + online 60 seconds before the fsck starts. The
> > assumption made is that the errors in question will be fully recoverable
> > by an automatic fsck, which as this thread proves, is not always the
> > case.
>
> That's why I think background fsck should be disabled by default.
> Though I still don't think that soft updates hurt anything (probably
> except performance)
>
> > ZFS is the first filesystem, to my knowledge, which provides 1) a
> > reliable filesystem, 2) detection of filesystem problems in real-time or
> > during scrubbing, 3) repair of problems in real-time (assuming raidz1 or
> > raidz2 are used), and 4) does not need fsck. This makes ZFS powerful.
>
> > "So use ZFS!"
> > A good piece of advice -- however, I've already had
> > reports from users that they will not consider ZFS for FreeBSD at this
> > time. Why? Because ZFS on FreeBSD can panic the system easily due to
> > kmem exhaustion. Proper tuning can alleviate this problem, but users do
> > not want to have to "tune" their system to get stability (and I feel
> > this is a very legitimate argument).
>
> > Additionally, FreeBSD doesn't offer ZFS as a filesystem during
> > installation. PC-BSD does, AFAIK. So on FreeBSD, you have to go
> > through a bunch of rigmarole[5] to get it to work (and doing this
> > after-the-fact is a real pain in the rear -- believe me, I did it this
> > weekend.)
>
> > So until both of these ZFS-oriented issues can be dealt with, some
> > users aren't considering it.
>
> > This is the reality of the situation. I don't think what users and
> > administrators want is unreasonable; they may be rough demands, but
> > that's how things are in this day and age.
>
> > Have I provided enough evidence? :-)
>
> Yes, but as far as I understand it's not as bad as you think :)
> I could be wrong though.
>
> I 100% agree on disabling background fsck, but I don't think soft
> updates are making the system any less reliable than it would be
> without them.

With regards to all you've said:

Thank you for these insights. Everything you and Erik have said has
been quite educational, and I greatly appreciate it. Always good to
learn from people who know more! :-)

I believe we're in overall agreement with regards to background_fsck
(should be disabled by default). I'd file a PR for this sort of thing,
but it almost seems like something that should go to the (private)
developers list for discussion first.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
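
Editorial sketch (not part of the original message): the write-ordering
argument quoted above can be illustrated with a toy model. Everything in
it is an assumption made for illustration: the three write names, the
idea that creating a file takes exactly these three metadata writes, and
the classification labels. It is not how UFS2 or soft updates are
actually implemented. It only shows why a crash after in-order writes
leaves nothing worse than unreferenced allocations, while a disk cache
that reorders writes can leave a directory entry pointing at something
that was never initialised or allocated.

```python
from itertools import permutations

# Hypothetical model: creating one file needs three metadata writes.
# The tuple order follows the quoted rules: allocate and initialise
# first, and only then make the directory entry point at the result.
WRITES = ("mark_block_used", "init_inode", "add_dir_entry")


def classify(on_disk):
    """Label the on-disk state given the set of writes that completed."""
    if "add_dir_entry" in on_disk:
        if "init_inode" not in on_disk or "mark_block_used" not in on_disk:
            # The directory points at an uninitialised or unallocated structure.
            return "corrupt"
        return "complete"
    if on_disk:
        # Partially created structures that nothing points to yet;
        # this is the case fsck can clean up by deallocating them.
        return "unreferenced"
    return "clean"


def outcomes(order):
    """All states reachable if power is lost after any prefix of `order`."""
    return {classify(set(order[:n])) for n in range(len(order) + 1)}


# Writes reach the platter in the order they were issued.
ordered = outcomes(WRITES)

# The drive's write cache may flush the same writes in any order.
reordered = set()
for perm in permutations(WRITES):
    reordered |= outcomes(perm)

print(sorted(ordered))    # ['clean', 'complete', 'unreferenced']
print(sorted(reordered))  # ['clean', 'complete', 'corrupt', 'unreferenced']
```

The "corrupt" outcome appears only in the reordered case, which is the
situation the thread is arguing about: with drive write caching enabled,
the assumption behind running fsck in the background no longer holds.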