Date:      Fri, 20 Jun 2003 02:30:04 -0700
From:      David Schultz <das@freebsd.org>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        arch@freebsd.org
Subject:   Re: cvs commit: src/sys/fs/nullfs null.h null_subr.c null_vnops.c
Message-ID:  <20030620093004.GA86924@HAL9000.homeunix.com>
In-Reply-To: <3EF2C67D.65F8A635@mindspring.com>
References:  <20030618112226.GA42606@fling-wing.demos.su> <39081.1055937209@critter.freebsd.dk> <20030619113457.GA80739@HAL9000.homeunix.com> <3EF2969F.4EE7D6D4@mindspring.com> <20030620061010.GA85747@HAL9000.homeunix.com> <3EF2C67D.65F8A635@mindspring.com>

On Fri, Jun 20, 2003, Terry Lambert wrote:
> David Schultz wrote:
> > Yes, and my point was that it's important to maintain the
> > separation, at least implicitly, in any new design.  I think this
> > point was obvious to the people concerned before I even mentioned
> > it, so there's no need to rehash it, but the designers of certain
> > other operating systems seem to have missed it.
> 
> Well, Solaris "reinvented" the separate VM and buffer cache
> in Solaris 2.8.  8-(.  I wasn't sure what you were recommending
> from what you said.

Let me make it clear that I'm not advocating the Solaris 8
approach.  But it would seem that the FS metadata cache needs to
be insulated from the VM cache better than priority paging can
provide.  Perhaps it would be possible to enforce a sort of
self-tuning version of separate VM and buffer caches, where the
buffer cache has a carefully managed RSS that can scale based on
both FS activity and memory pressure.  That way, I/O-intensive
workloads will not be allowed to suck too many pages away from
user processes and the VM system will be able to better estimate
actual memory pressure.
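
To make the self-tuning idea above concrete, here is a minimal sketch
of one possible policy. It is not taken from any FreeBSD code: the
function name, the thresholds, and the percentage limits are all
hypothetical. The buffer cache gets a page budget that grows while the
file system is busy and memory is plentiful, shrinks under memory
pressure, and never leaves a fixed floor/ceiling range.

```python
def next_cache_target(target, miss_pct, free_pct, total_pages,
                      floor_pct=5, ceil_pct=30, step_pct=1):
    """Return the buffer cache page budget for the next interval.

    All parameters and thresholds are illustrative assumptions:
      target      -- current budget, in pages
      miss_pct    -- recent buffer cache miss rate (0-100), a proxy
                     for file system activity
      free_pct    -- percentage of physical pages currently free,
                     a proxy for memory pressure
    """
    step = max(1, total_pages * step_pct // 100)
    floor = total_pages * floor_pct // 100
    ceiling = total_pages * ceil_pct // 100

    if free_pct < 10:
        # Memory pressure wins: hand pages back to user processes.
        target -= step
    elif miss_pct > 20:
        # Busy file system and memory to spare: let the cache grow.
        target += step

    # The budget never leaves the [floor, ceiling] range.
    return min(ceiling, max(floor, target))
```

Because memory pressure is checked first, an I/O-heavy workload cannot
keep growing the cache once free memory runs low, which is the
insulation property argued for above.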

> > The main problem isn't metastability or the lack of deadlock
> > detection, it's that some workloads reasonably require more
> > dependency tracking than the buffer cache can accommodate.  At
> > present, we can't track more than about 50 directories in the
> > buffer cache.
> 
> I don't know if I buy this directly.  It's probably possible
> to commit an incomplete tree, as long as it's complete from
> the root, at any subtree point.  Doing this, though, you would
> have to switch from isosynchronous to synchronous processing on
> the subtree for the remainder of its duration.  This works,
> because you use the associative property of the tree above to
> replace it with a single edge segment; other orphan subtrees
> of the same tree all have to fall into the same mode.

I don't understand what you're getting at here.  If you don't have
enough space to cache more than 50 dependencies, you lose
performance when your working set exceeds 50 directories, period.
Trying to address this issue by making the softupdates flushing
code smarter is only working around the limitations of the present
buffer cache.
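
The cliff described above can be illustrated with a toy model. This is
only a sketch under strong assumptions: a plain LRU cache and a
workload that cycles through its directories in order, which is the
worst case for LRU. The real buffer cache is not this simple, and
hit_rate is a hypothetical helper, not an existing API.

```python
from collections import OrderedDict


def hit_rate(capacity, working_set, rounds=20):
    """Hit rate of an LRU cache under cyclic access to working_set items."""
    cache = OrderedDict()
    hits = accesses = 0
    for _ in range(rounds):
        for directory in range(working_set):
            accesses += 1
            if directory in cache:
                hits += 1
                cache.move_to_end(directory)   # mark as most recently used
            else:
                cache[directory] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
    return hits / accesses


print(hit_rate(50, 50))  # working set fits: only the first pass misses
print(hit_rate(50, 51))  # one directory too many: every access misses
```

In this model, going from 50 to 51 directories takes the hit rate from
0.95 to 0.0, and no change to the flushing order recovers it. Only a
larger cache does.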

> What was Kirk's answer?

He didn't give me one, aside from advocating backing dependencies
with the VM system.  This issue just came up in passing a while
ago in relation to a pathological case for softupdates that
resulted in an explosion of dependencies that filled up the buffer
cache and caused a deadlock.  ;-) (The problem has since been
hacked around, BTW.)

> But the quoted "50" is the ideal, when all dependent operations
> occur in the same tick, given the current wheel size; all this
> strategy does is up the number (the real number isn't 50, it's
> unfortunately 'size - max_n - 1') by making them occur virtually
> in the same tick, even if they are spread out temporally otherwise.

I think 50 is merely a number that makes softupdates not fill up
the buffer cache and deadlock.  Keep in mind that the dependency
graph could have a large fanout, or it could be a multigraph.
There's no magical association between ~50 directories and the
maximum path length in the graph.  Again, it's the buffer cache
that's the primary problem, not softupdates.
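
A small sketch of the fanout point above, assuming the dependency
graph is acyclic and is given as a plain adjacency mapping (the
representation and the graph_stats helper are hypothetical, not the
softupdates data structures). It compares the longest path with the
total number of dependency edges.

```python
def graph_stats(children, root):
    """Return (longest path from root in edges, total dependency edges).

    children maps a node to a list of its dependents. A dependent may
    appear more than once in a list, which models a multigraph.
    Assumes the graph has no cycles.
    """
    memo = {}

    def depth(node):
        if node not in memo:
            memo[node] = max(
                (1 + depth(child) for child in children.get(node, ())),
                default=0,
            )
        return memo[node]

    total_edges = sum(len(deps) for deps in children.values())
    return depth(root), total_edges


# One directory with 1000 new files: path length 1, 1000 dependencies.
star = {"dir": ["file%d" % i for i in range(1000)]}
print(graph_stats(star, "dir"))

# A chain of four nested objects: path length 3, only 3 dependencies.
chain = {"a": ["b"], "b": ["c"], "c": ["d"]}
print(graph_stats(chain, "a"))
```

The star and the chain show that path length and the number of tracked
dependencies vary independently, so a bound on one says nothing about
how much buffer cache the other consumes.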


