Date: Fri, 15 Jul 2005 01:01:12 +0200
From: Matthias Buelow <mkb@incubus.de>
To: Lowell Gilbert <freebsd-stable-local@be-well.ilk.org>
Cc: freebsd-stable@freebsd.org, freebsd-questions@freebsd.org
Subject: Re: dangerous situation with shutdown process
Message-ID: <200507142301.j6EN1CmC037942@drjekyll.mkbuelow.net>
In-Reply-To: Message from Lowell Gilbert <freebsd-stable-local@be-well.ilk.org> of "14 Jul 2005 18:09:07 EDT." <447jftrqf0.fsf@be-well.ilk.org>

Lowell Gilbert <freebsd-stable-local@be-well.ilk.org> writes:

>Jon Dama <jd@ugcs.caltech.edu> writes:
>> however, journaling fares no better, and request barriers do nothing to
>> solve the problem.
>
>I had assumed that the sequence of operations in a journal would be
>idempotent.  Is that a reasonable design criterion?  [If it is, then
>it would make up for the fact that you can't build a reliable
>transaction gate.  That is, you would just have to go back far enough
>that you *know* all of the needed journal is within the range you will
>replay.  But even then, the journal would need to be on a separate
>medium, one that doesn't have the "lying to you about transaction
>completion" problem.]

No, it needn't. It is sufficient that the journal entries for a block
of updates that are to follow are on disk before the updates are made.
That's all. This can be achieved by inserting a write barrier request
between the journal writes and the actual data/metadata writes. The
block driver will, when it sees the barrier, a) write out all requests
in its queue that it got before the barrier, and b) flush the cache so
that they will not get intermixed by the drive with the following data
writes.

What could happen now when the power goes away at an inopportune
moment? [Note that I'm only talking about filesystem integrity, not
general data loss.]

* If power goes away before the journal is written, nothing happens.
* If the journal is partially written, and power goes away, it will be
  partially replayed at boot but the filesystem will be consistent.
* If power goes away when the journal is fully written, but no
  metadata updates have been performed, they will be performed at boot
  and everything is as if the full request had completed before power
  went out.
* If power goes away when the journal is fully written, and parts of
  the metadata updates have been written, those updates will be
  performed twice (once more at reboot) but that won't matter since
  these operations are idempotent. The remaining metadata updates are
  then performed once, at reboot.

So where is the need for the journal to be on a separate medium? The
only thing that matters is that no metadata updates will be written
before the journal has been written, and flushing the disk cache at a
barrier will ensure this. Note that the disk doesn't even have to
flush the cache when it receives that command; it only has to ensure
that it performs all requests issued before the flush ahead of those
that come afterwards.

>I have no idea how "designed to be used with the write-back cache
>enabled" could affect the operating life of the disk.

If you disable the write cache, you get much more wear and tear due to
much more seeking. If I observe a 5x performance degradation when the
cache is disabled, for sequential writes (i.e., no cache overwriting
effects), I would think that I also have a factor >1 of increased
seeking operations in the drive, otherwise the performance degradation
cannot be explained. [Besides, the disk gets really loud when the
cache is disabled.]

mkb.
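
[The four power-loss cases in the list above can be played through in
a toy model. This is only a sketch under stated assumptions, not how
any real filesystem implements its journal: the "disk" is a Python
dict, every journal entry is an absolute "set block to value"
operation (which is what makes replay idempotent), and the names
UPDATES, replay and crash_and_recover are made up for illustration.
The barrier is modelled simply as the rule that no in-place update may
reach the disk before the whole journal has.]

```python
# Hypothetical example data: three metadata updates forming one request.
UPDATES = [
    ("inode 7", "size=4096"),
    ("bitmap 3", "block 91 used"),
    ("dir 2", "entry -> inode 7"),
]


def replay(journal, metadata):
    """Apply every journal entry that made it to disk.

    Each entry sets a block to an absolute value, so applying an entry
    a second time leaves the block unchanged (idempotent).
    """
    for block, value in journal:
        metadata[block] = value
    return metadata


def crash_and_recover(updates, journal_written, updates_applied, initial=None):
    """Simulate power loss followed by journal replay at boot.

    journal_written: number of journal entries on disk at power loss.
    updates_applied: number of in-place updates on disk at power loss.
    """
    # The write barrier: in-place updates may only start once the
    # complete journal is on disk.
    if updates_applied > 0 and journal_written < len(updates):
        raise ValueError("barrier violated: update written before journal")

    # State of the disk at the moment the power goes away.
    metadata = dict(initial or {})
    for block, value in updates[:updates_applied]:
        metadata[block] = value
    journal = updates[:journal_written]

    # Recovery at boot: replay whatever part of the journal exists.
    return replay(journal, metadata)


if __name__ == "__main__":
    # Journal complete, power lost after 0, 1, 2 or 3 in-place updates:
    # the recovered state is the same in every case.
    for done in range(len(UPDATES) + 1):
        print(done, crash_and_recover(UPDATES, len(UPDATES), done))
```

In this model the journal and the metadata live on the same "disk";
the only thing the result depends on is the ordering enforced by the
barrier check.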