From owner-freebsd-fs@FreeBSD.ORG Sat Apr 19 11:20:06 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D585737B401; Sat, 19 Apr 2003 11:20:06 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id F2F6843FAF; Sat, 19 Apr 2003 11:20:05 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0234.cvx22-bradley.dialup.earthlink.net ([209.179.198.234] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 196wwS-0000ja-00; Sat, 19 Apr 2003 11:20:01 -0700 Message-ID: <3EA19303.1DB825C8@mindspring.com> Date: Sat, 19 Apr 2003 11:18:43 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Marko Zec References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304182348.58356.zec@tel.fer.hr> <3EA0A647.BEC5931A@mindspring.com> <200304191153.03970.zec@tel.fer.hr> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a46cdfd73c98dcc69db972ae3c87b5828d350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: freebsd-fs@FreeBSD.org cc: David Schultz cc: freebsd-stable@FreeBSD.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Apr 2003 18:20:07 -0000 Marko Zec wrote: > On Saturday 19 April 2003 03:28, Terry Lambert wrote: > > I think that it is important that the value of syncer_delayno > > needs to continue to be incremented once a second, and that the > > modified sched_sync(), which with your patch no longer does this, > > needs to used it's own counter. > > If you look again at the _unpatched_ syncer loop, you will clearly see that > syncer_delayno is not guaranteed to be incremented only once a second. If > speedup_syncer() increases the value of rushjob, the syncer_delayno will be > increased up to rushjob times in a second in the syncer loop, depending if > the buffers can be flushed fast enough. The expedited synching will proceed > until rushjob drops down to 0. > > My patch didn't invent nor did change that model at all. The problem is not the distribution of the entries removed from the wheel, it is the distribution of entries inserted onto the wheel. Running the wheel forward quickly during removal is not a problem. *Not* running the wheel forward _at all_ during insertion *is* a problem. What we care about here is distribution of entries which exist, not distribution of entries which no longer exist. > > In other words, I think that you need to implement a two handed > > clock algorithm, to keep the buckets from getting too deep with > > work items, and in case there is some dependency which is not > > being accounted for that has been working because there is an > > implicit delay of 1 second or more in vn_syncer_add_to_worklist() > > calls (your patch would break this, so without taking this into > > account, we would have to retest all of soft updates). > > Again, please look at the _unmodified_ syncer code. My patch didn't change a > thing regarding the possibility for the syncer to try flushing more than one > syncer_workitem_pending queue in a second. Again, I don't care about flushing for this case: I care about insertion. > > > > Then when you write things out, they write out of order. > > > > > > Uhh.. NO! > > > > Uh, yes; potentially they do. See the implicit dependency > > situation described above. There are other cases, too, but > > they are much more complicated. I wish Kirk would speak up > > in more technical detail about the problems you are potentially > > introducing; they require a deep understanding of the soft > > updates code. > > I wish also... Be aware that I was at least associated with the FreeBSD soft updates code implementation (I did the original "make it compile and link pass", among other things, when Whistle, the company I worked for, paid Kirk to do the implementation), and I was also part of a team which implemented soft updates for FFS in a different environment in 1995. I'm not trying to claim authority here, since I'm one of the sides in this disagreement, but realize I'm not totally clueless when it comes to soft updates. > > No, it has changed. It's changed both in the depth of the > > queue entries, and it's changed in the relative spacing of > > implicit dependencies, and it's changed in the relative depth, > > for two or more dependent operations with future offsets. > > How? By simply stopping the softupdates clock for a couple of seconds (ok, > minutes :) more than usual? Say you stop the clock for 30 seconds: syncer_delayno is not incremented during those 30 seconds. Now, during that time, vn_syncer_add_to_worklist() is called once a second to add workitems. Say they are the same workitems (delay 0, delay 6). Now (relative to the original syncer_delayno), the buckets that are represented by "syncer_workitem_pending[syncer_delayno+delay]" vn_syncer_add_to_worklist() instance | | syncer_workitem_pending[original_syncer_delayno + N] | 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 v 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [your patch:] 30 30 30 [not your patch:] 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 0 Your patch causes us to get two single buckets 30 deep. Not having your patch gives us 37 buckets; 12 are 1 deep, 18 are 2 deep. Does this make sense now? It is about insertion in the face of a stopped clock, and how bursty the resulting "catchup" will be. If you look at the code, you will see that there is no opportunity for other code to run in a single bucket list traversal, but in the rushjob case of multiple bucket traversals, the system gets control back in between buckets, so the operation of the system is much, much smoother in the case that individual buckets are not allowed to get too deep. This is normally accomplished by incrementing the value of syncer_delayno once per second, as a continuous function, rather than a bursty increment once every 30 seconds. > Is it possible you are confusing the sync_fsync() routine in kern/vfs_subr.c > (which I didn't touch) with the modified fsync() handler in > kern/vfs_syscalls.c. No. I am only talking about the vn_syncer_add_to_worklist() and sched_sync() functions, and how they interact on the syncer_delayno clock. > Can we please either slowly conclude this discussion, or provide a feasible > alternative to the proposed patch? I start feeling like we are wasting > tremendeous amount of time here while going nowhere. Please read the above, specifically the diagram of bucket list depths with a working clock vs. a stopped clock, and the fact that the bucket list traversals are atomic, but multiple bucket traversals of the same number of equally distributed work items are not. I guess I'm willing to provide an alternate patch, if I have to do so, but I would prefer that you understand the issues yourself, since that makes one more person clueful about the issues, in case the few of the rest of us get hit by a bus. 8-). -- Terry