Date: Sat, 19 Apr 2003 17:27:56 -0500
From: Chris Pressey <cpressey@catseye.mb.ca>
To: freebsd-stable@freebsd.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
Message-ID: <20030419172756.17aaf627.cpressey@catseye.mb.ca>
In-Reply-To: <3EA1B72D.B8B96268@mindspring.com>
References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304191153.03970.zec@tel.fer.hr> <3EA19303.1DB825C8@mindspring.com> <200304192134.51484.zec@tel.fer.hr> <3EA1B72D.B8B96268@mindspring.com>
On Sat, 19 Apr 2003 13:53:01 -0700
Terry Lambert <tlambert2@mindspring.com> wrote:

> Marko Zec wrote:
> > > If you look at the code, you will see that there is no opportunity
> > > for other code to run in a single bucket list traversal, but in
> > > the rushjob case of multiple bucket traversals, the system gets
> > > control back in between buckets, so the operation of the system is
> > > much, much smoother in the case that individual buckets are not
> > > allowed to get too deep.  This is normally accomplished by
> > > incrementing the value of syncer_delayno once per second, as a
> > > continuous function, rather than a bursty increment once every 30
> > > seconds.
> >
> > I completely agree with you that smoothness will be sacrificed, but
> > again, please do have in mind the original purpose of the patch.
> > When running on battery power, smoothness is a bad thing.  When
> > running on AC, the patch will become inactive, so 100% normal
> > operation is automatically restored, and you get all the smoothness
> > back.
>
> You are still missing the point.
> [...]
> Add to this that the batches of I/O are unlikely to be on the
> same track, and therefore there's seek latency as well, and you
> have a significant freeze that's going to appear like the machine
> is locked up.
>
> I guess if you are willing to monitor the mailing lists and explain
> why this isn't a bad thing every time users complain about it, it's
> no big deal, except to people who want the feature, but don't agree
> with your implementation.  8-).

A simple comment to this effect (like 'all writes are delayed, so they
go out to disk in a burst which can suspend the machine for a long
time') next to the option in question would serve the same purpose.
If there's a critical problem with this patch, it's not one of
inconvenience to the user - they know what they're doing.

> > > Please read the above, specifically the diagram of bucket list
> > > depths with a working clock vs. a stopped clock, and the fact
> > > that the bucket list traversals are atomic, but multiple bucket
> > > traversals of the same number of equally distributed work items
> > > are not.
> >
> > True.  But this still doesn't justify your claims from previous
> > posts that the patched system is likely to corrupt data or crash
> > the system.
>
> The previous claim for potential panic was based on the fact
> that the same bucket was being used for the next I/O, rather
> than the same + 1 bucket, which is what the code assumed.  I
> just took it for granted that the failure case was self-evident.
>
> You need to read the comment in the sched_sync() code, and
> understand why it is saying what it is saying:
>
>                 /*
>                  * Note: VFS vnodes can remain on the
>                  * worklist too with no dirty blocks, but
>                  * since sync_fsync() moves it to a different
>                  * slot we are safe.
>                  */
>
> Your changes make it so the insertion *does not* put it in a
> different slot (because the fsync is most likely delayed).
> Therefore we are *not* safe.

Reading the comment is easy; understanding why it says what it does is
very difficult.  Partly because this is a difficult subject, and partly
because this comment, like most comments, is a really lousy comment.

Does it mean to imply that if a VFS vnode with no dirty blocks is not
moved to a different slot, it is unsafe?  Or that it is simply safer to
move a VFS vnode with no dirty blocks into a different slot than to not
do it?  Either way, what exactly is the danger?

I've been trying to follow this thread hoping to learn something.
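To make sure I'm picturing the mechanism correctly, here is a minimal
stand-alone sketch of the syncer wheel as I understand it from this
thread.  The names syncer_delayno and sched_sync() come from the
discussion above; add_to_worklist(), sched_sync_once(), the 32-slot
wheel and the delay values are simplified stand-ins of my own, not the
real sys/kern code:

#include <stdio.h>

#define SYNCER_MAXDELAY 32
#define SYNCER_MASK     (SYNCER_MAXDELAY - 1)

static int bucket[SYNCER_MAXDELAY];     /* dirty vnodes per slot */
static int syncer_delayno;              /* slot the syncer drains next */

/* Stand-in for vn_syncer_add_to_worklist(): queue work `delay' ticks ahead. */
static void
add_to_worklist(int delay)
{
        int slot = (syncer_delayno + delay) & SYNCER_MASK;

        bucket[slot]++;
}

/* Stand-in for one pass of sched_sync(): drain one bucket, advance the wheel. */
static void
sched_sync_once(void)
{
        printf("draining slot %2d: %d vnode(s)\n",
            syncer_delayno, bucket[syncer_delayno]);
        bucket[syncer_delayno] = 0;

        /*
         * The wheel advancing once per tick is what guarantees that
         * re-queued work lands in a *different* slot from the one being
         * drained.  If the wheel stalls, re-queued work piles back into
         * the same bucket and that bucket's depth grows without bound.
         */
        syncer_delayno = (syncer_delayno + 1) & SYNCER_MASK;
}

int
main(void)
{
        int t;

        for (t = 0; t < 8; t++) {
                add_to_worklist(4);     /* made-up delays, just to show */
                add_to_worklist(6);     /* hashing into slots ahead of us */
                sched_sync_once();
        }
        return (0);
}

If that model is roughly right, I can at least see why the "different
slot" assumption matters; that, anyway, is the picture I'm working
from below.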
From a lay programmer's perspective, i.e. someone who's never even
looked at the softupdate code or cared overmuch for the details of
low-level filesystem programming, here's what I've gathered:

Isn't the danger, basically, that a computer running FreeBSD with this
patch could potentially:

- accumulate a LOT of future writes to be done all at once
- exhaust its resources when actually synchronizing those writes
- panic

yes or no?

> [...]
> > and from my experience with a production system
> > running all the time on a patched kernel.
>
> This is totally irrelevant; it's anecdotal, and therefore has
> nothing whatsoever to do with provable correctness.
>
> "From my experience" is the same argument that Linux used to
> justify async mounts in ext2fs, and they were provably wrong.

I disagree strongly that field tests are totally irrelevant.

I also doubt that provable correctness has much bearing on what appears
to be a resource issue - all you can show is that bursty, bunched-up
syncs use up more resources than smooth, spaced-out syncs.
Softupdates' behaviour isn't "correct", it's just optimized well.

Terry: try to keep some perspective.
Marko: clearly mark the patch 'experimental'.

-Chris