Date:      Sat, 22 Sep 2001 23:42:25 -0700 (PDT)
From:      Matt Dillon <dillon@earth.backplane.com>
To:        David Greenman <dg@root.com>
Cc:        Poul-Henning Kamp <phk@critter.freebsd.dk>, Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, bright@wintelcom.net, hackers@FreeBSD.ORG
Subject:   Conclusions on... was Re: More on the cache_purgeleafdirs() routine
Message-ID:  <200109230642.f8N6gPj84955@earth.backplane.com>
References:  <88901.1001191415@critter> <200109222121.f8MLLHe82202@earth.backplane.com> <20010922141934.C28469@nexus.root.com>

    Well, this has turned into a rather sticky little problem.  I've
    spent all day going through the vnode/name-cache reclaim code, looking
    both at Seigo's cache_purgeleafdirs() and my own patch.

    This is what is going on:  The old code (A) refused to reuse any vnode
    that had cached VM Pages associated with it *AND* (B) refused to reuse
    directory vnodes residing in the namei cache that contained any
    subdirectory or file.  (B) does not apply to file vnodes since they
    obviously cannot have subdirectories or files 'under' them in the namei
    cache.  The problem is that when you take the union of (A) and (B),
    just about every directory vnode in the system winds up being immune
    from reclamation.  Thus directory vnodes appear to grow forever... so
    it isn't just the fact that most directories are small that is the
    problem, it's the presence of (B).  This is why small files don't cause
    the same problem (or at least do not cause it to the same degree).

    Both Seigo's cache_purgeleafdirs() and my simpler patch simply remove
    the (B) requirement, making directory reclamation work approximately
    the same as file reclamation.  The only difference between Seigo's
    patch and mine is that Seigo's makes an effort to remove directories
    intelligently... it tries to avoid removing higher level directories.
    My patch doesn't make a distinction but assumes that (A) will tend to
    hold for higher level directories: that is, that higher level directories
    tend to be accessed more often and thus will tend to have pages in the 
    VM Page Cache, and thus not be candidates for reuse anyway.  So my patch
    has a very similar effect but without the overhead.
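
    To make the (A)/(B) distinction concrete, here is a toy model of the
    two reuse tests.  It is only a sketch: the Vnode class, its field
    names and both function names are made up for illustration and are
    not the kernel's data structures or either patch.

```python
from dataclasses import dataclass


@dataclass
class Vnode:
    """Toy stand-in for a vnode; all field names are hypothetical."""
    is_dir: bool
    cached_pages: int = 0        # (A) pages held in the VM Page cache
    namecache_children: int = 0  # (B) entries 'under' it in the namei cache


def old_can_reuse(vp):
    # Old policy: (A) blocks any vnode, (B) additionally blocks directories.
    if vp.cached_pages > 0:
        return False
    if vp.is_dir and vp.namecache_children > 0:
        return False
    return True


def new_can_reuse(vp):
    # With (B) removed, only cached pages protect a vnode from reuse.
    return vp.cached_pages == 0
```

    In this model a directory with no cached pages but a few namei cache
    children is pinned by the old test and reusable under the new one,
    while a directory that still has cached pages is protected either way.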

    In all the testing I've done I cannot perceive any performance difference
    between Seigo's patch and mine, but from an algorithmic point of view
    mine ought to scale much, much better.   Even if we adjust 
    cache_purgeleafdirs() to run even less often, we still run up against
    the fact that the scanning algorithm is O(N*M) and we know from history
    that this can create serious breakage.

    People may recall that we had similar problems with the VM Pageout 
    daemon, where under certain load conditions the pageout daemon wound
    up running continuously, eating enormous amounts of cpu.  We lived with
    the problem for years because the scaling issues didn't rear their
    heads until machines got hefty enough to have enough pages for the
    algorithms to break down.

    People may also recall that we had similar problems with the buffer
    cache code.... specifically, the scan 'restart' conditions could
    break down algorithmically and result in massive cpu use by bufdaemon.

    I think cache_purgeleafdirs() had the right idea.  From my experience
    with the VM system, however, I have to recommend that we remove it
    from the system and, at least initially, replace it with my simpler
    patch.  We could extend my patch to do the same check -- that is, only
    remove directory vnodes at lower levels in the namei cache, simply
    by scanning the namei cache list at the vnode in question.  So in fact
    it would be possible to adjust my patch to have the same effect that
    cache_purgeleafdirs() had, but without the scaling issue (or at least
    with less of an issue.. it would be O(N) rather than O(M*N)).
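
    The scaling difference can be sketched with a toy model.  The
    assumptions are mine: N is taken to be the number of directories
    examined and M the number of namei cache entries, the nested scan is
    only a stand-in for an O(N*M) algorithm (it is not the actual
    cache_purgeleafdirs() code), and all names below are hypothetical.

```python
def leaf_dirs_nested_scan(dirs, entries):
    """O(N*M) stand-in: for each of N directories, walk all M entries.

    `entries` is a list of (parent_dir, child_name) pairs.
    Returns (leaf_dirs, steps), where steps counts inner-loop iterations.
    """
    leaves, steps = [], 0
    for d in dirs:
        has_child = False
        for parent, _child in entries:
            steps += 1
            if parent == d:
                has_child = True
        if not has_child:
            leaves.append(d)
    return leaves, steps


def is_leaf_local(d, children_of):
    """Per-vnode check: look only at the list hanging off this directory."""
    return not children_of.get(d)
```

    The local check is done once for the vnode being considered for
    reuse, so its cost depends on that vnode's own list rather than on
    the size of the whole cache.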

    -

    The bigger problem is exactly as DG has stated... it isn't the namei
    cache that is our enemy, it's the VM Page cache preventing vnodes
    from being recycled.

    For the moment I believe that cache_purgeleafdirs() or my patch solves
    the problem well enough that we can run with it for a while.  The real
    solution, I believe, is to give us the ability to take cached VM Pages
    associated with a file and rename them to cached VM Pages associated
    with the filesystem device - we can do this only for page-aligned
    blocks of course, not fragments (which we would simply throw away)...
    but it would allow us to reclaim vnodes independent of the VM Page cache
    without losing the cached pages.  I think this is doable but it will
    require a considerable amount of work.  It isn't something I can do in a
    day.  I also believe that this can dovetail quite nicely into the I/O
    model that we have slowly been moving towards over the last year
    (Poul's work).  Inevitably we will have to manage device-based I/O
    on a page-by-page basis and being able to do it via a VM Object seems
    to fit the bill in my opinion.
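
    As a rough illustration of the renaming idea, here is a toy model.
    Everything in it is an assumption made for the sketch: the 4096-byte
    page size, the dictionary layouts, the block_map lookup and the
    function name.  It only shows the bookkeeping, not how the VM system
    would actually do it.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def rehome_pages(file_pages, block_map, device_pages):
    """Move a file's cached pages onto a device-level cache.

    file_pages:   {file_offset: data} cached under the file's vnode
    block_map:    {file_offset: (device_offset, length)} (hypothetical)
    device_pages: {device_offset: data} cache keyed by device offset
    Returns the number of pages that had to be dropped.
    """
    dropped = 0
    for off, data in list(file_pages.items()):
        dev_off, length = block_map[off]
        if length == PAGE_SIZE and dev_off % PAGE_SIZE == 0:
            device_pages[dev_off] = data  # page survives under the device
        else:
            dropped += 1                  # fragment: simply thrown away
        del file_pages[off]               # the vnode no longer holds pages
    return dropped
```

    Once the file's page dictionary is empty, nothing in this model ties
    the vnode to the page cache any more, which is the property that
    would let the vnode be reclaimed without losing the cached data.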

						-Matt






