From owner-freebsd-fs@FreeBSD.ORG  Sat Apr 19 11:20:06 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id D585737B401; Sat, 19 Apr 2003 11:20:06 -0700 (PDT)
Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net
	[207.217.120.188])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id F2F6843FAF; Sat, 19 Apr 2003 11:20:05 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0234.cvx22-bradley.dialup.earthlink.net ([209.179.198.234]
	helo=mindspring.com)
	by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)	id 196wwS-0000ja-00; Sat, 19 Apr 2003 11:20:01 -0700
Message-ID: <3EA19303.1DB825C8@mindspring.com>
Date: Sat, 19 Apr 2003 11:18:43 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Marko Zec <zec@tel.fer.hr>
References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304182348.58356.zec@tel.fer.hr>
	<3EA0A647.BEC5931A@mindspring.com> <200304191153.03970.zec@tel.fer.hr>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a46cdfd73c98dcc69db972ae3c87b5828d350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c
cc: freebsd-fs@FreeBSD.org
cc: David Schultz <das@FreeBSD.org>
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Apr 2003 18:20:07 -0000

Marko Zec wrote:
> On Saturday 19 April 2003 03:28, Terry Lambert wrote:
> > I think that it is important that the value of syncer_delayno
> > needs to continue to be incremented once a second, and that the
> > modified sched_sync(), which with your patch no longer does this,
> > needs to used it's own counter.
> 
> If you look again at the _unpatched_ syncer loop, you will clearly see that
> syncer_delayno is not guaranteed to be incremented only once a second. If
> speedup_syncer() increases the value of rushjob, the syncer_delayno will be
> increased up to rushjob times in a second in the syncer loop, depending if
> the buffers can be flushed fast enough. The expedited synching will proceed
> until rushjob drops down to 0.
> 
> My patch didn't invent nor did change that model at all.

The problem is not the distribution of the entries removed from
the wheel, it is the distribution of entries inserted onto the
wheel.

Running the wheel forward quickly during removal is not a problem.

*Not* running the wheel forward _at all_ during insertion *is* a
problem.

What we care about here is distribution of entries which exist,
not distribution of entries which no longer exist.


> > In other words, I think that you need to implement a two handed
> > clock algorithm, to keep the buckets from getting too deep with
> > work items, and in case there is some dependency which is not
> > being accounted for that has been working because there is an
> > implicit delay of 1 second or more in vn_syncer_add_to_worklist()
> > calls (your patch would break this, so without taking this into
> > account, we would have to retest all of soft updates).
> 
> Again, please look at the _unmodified_ syncer code. My patch didn't change a
> thing regarding the possibility for the syncer to try flushing more than one
> syncer_workitem_pending queue in a second.

Again, I don't care about flushing for this case: I care about
insertion.


> > > > Then when you write things out, they write out of order.
> > >
> > > Uhh.. NO!
> >
> > Uh, yes; potentially they do.  See the implicit dependency
> > situation described above.  There are other cases, too, but
> > they are much more complicated.  I wish Kirk would speak up
> > in more technical detail about the problems you are potentially
> > introducing; they require a deep understanding of the soft
> > updates code.
> 
> I wish also...

Be aware that I was at least associated with the FreeBSD soft
updates code implementation (I did the original "make it compile
and link pass", among other things, when Whistle, the company I
worked for, paid Kirk to do the implementation), and I was also
part of a team which implemented soft updates for FFS in a
different environment in 1995.

I'm not trying to claim authority here, since I'm one of the
sides in this disagreement, but realize I'm not totally clueless
when it comes to soft updates.


> > No, it has changed.  It's changed both in the depth of the
> > queue entries, and it's changed in the relative spacing of
> > implicit dependencies, and it's changed in the relative depth,
> > for two or more dependent operations with future offsets.
> 
> How? By simply stopping the softupdates clock for a couple of seconds (ok,
> minutes :) more than usual?

Say you stop the clock for 30 seconds: syncer_delayno is not
incremented during those 30 seconds.  Now, during that time,
vn_syncer_add_to_worklist() is called once a second to add
workitems.  Say they are the same workitems (delay 0, delay 6).
Now (relative to the original syncer_delayno), the buckets that
are represented by "syncer_workitem_pending[syncer_delayno+delay]"

vn_syncer_add_to_worklist() instance
|
|             syncer_workitem_pending[original_syncer_delayno + N]
|  0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
|  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
v
0  1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[your patch:]
30 30          30
[not your patch:]
   1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 0

Your patch causes us to get two single buckets 30 deep.

Not having your patch gives us 37 buckets; 12 are 1 deep, 18 are 2
deep.

Does this make sense now?  It is about insertion in the face of a
stopped clock, and how bursty the resulting "catchup" will be.

If you look at the code, you will see that there is no opportunity
for other code to run in a single bucket list traversal, but in the
rushjob case of multiple bucket traversals, the system gets control
back in between buckets, so the operation of the system is much,
much smoother in the case that individual buckets are not allowed
to get too deep.  This is normally accomplished by incrementing the
value of syncer_delayno once per second, as a continuous function,
rather than a bursty increment once every 30 seconds.


> Is it possible you are confusing the sync_fsync() routine in kern/vfs_subr.c
> (which I didn't touch) with the modified fsync() handler in
> kern/vfs_syscalls.c.

No.  I am only talking about the vn_syncer_add_to_worklist() and
sched_sync() functions, and how they interact on the syncer_delayno
clock.


> Can we please either slowly conclude this discussion, or provide a feasible
> alternative to the proposed patch? I start feeling like we are wasting
> tremendeous amount of time here while going nowhere.

Please read the above, specifically the diagram of bucket list
depths with a working clock vs. a stopped clock, and the fact
that the bucket list traversals are atomic, but multiple bucket
traversals of the same number of equally distributed work items
are not.

I guess I'm willing to provide an alternate patch, if I have to
do so, but I would prefer that you understand the issues yourself,
since that makes one more person clueful about the issues, in case
the few of the rest of us get hit by a bus.  8-).

-- Terry