Date: Sat, 22 Sep 2001 23:42:25 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
To: David Greenman <dg@root.com>
Cc: Poul-Henning Kamp <phk@critter.freebsd.dk>, Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, bright@wintelcom.net, hackers@FreeBSD.ORG
Subject: Conclusions on... was Re: More on the cache_purgeleafdirs() routine
Message-ID: <200109230642.f8N6gPj84955@earth.backplane.com>
References: <88901.1001191415@critter> <200109222121.f8MLLHe82202@earth.backplane.com> <20010922141934.C28469@nexus.root.com>
Well, this has turned into a rather sticky little problem. I've spent all day going through the vnode/name-cache reclaim code, looking both at Seigo's cache_purgeleafdirs() and my own patch. This is what is going on:

The old code refused to reuse any vnode that (A) had cached VM pages associated with it, *and* refused to reuse (B) any directory vnode residing in the namei cache that contained any subdirectory or file. (B) does not apply to file vnodes, since they obviously cannot have subdirectories or files 'under' them in the namei cache.

The problem is that when you take the union of (A) and (B), just about every directory vnode in the system winds up being immune from reclamation, so directory vnodes appear to grow forever. It isn't just the fact that most directories are small that is the problem, it's the presence of (B). This is why small files don't cause the same problem (or at least do not cause it to the same degree).

Both Seigo's cache_purgeleafdirs() and my simpler patch remove the (B) requirement, making directory reclamation work approximately the same as file reclamation. The only difference between Seigo's patch and mine is that Seigo's makes an effort to remove directories intelligently: it tries to avoid removing higher-level directories. My patch doesn't make that distinction, but assumes that (A) will tend to hold for higher-level directories: that is, higher-level directories tend to be accessed more often, and thus will tend to have pages in the VM page cache and not be candidates for reuse anyway. So my patch has a very similar effect, but without the overhead.

In all the testing I've done I cannot perceive any performance difference between Seigo's patch and mine, but from an algorithmic point of view mine ought to scale much, much better. Even if we adjust cache_purgeleafdirs() to run less often, we still run up against the fact that its scanning algorithm is O(N*M), and we know from history that this can create serious breakage. People may recall that we had similar problems with the VM pageout daemon, where under certain load conditions the daemon wound up running continuously, eating enormous amounts of cpu. We lived with the problem for years because the scaling issues didn't rear their heads until machines got hefty enough to have enough pages for the algorithms to break down. People may also recall that we had similar problems with the buffer cache code: the scan 'restart' conditions could break down algorithmically and result in massive cpu use by bufdaemon.

I think cache_purgeleafdirs() had the right idea. From my experience with the VM system, however, I have to recommend that we remove it from the system and, at least initially, replace it with my simpler patch. We could then extend my patch to do the same check -- that is, only remove directory vnodes at lower levels in the namei cache -- simply by scanning the namei cache list at the vnode in question (see the sketch below). So it would be possible to adjust my patch to have the same effect that cache_purgeleafdirs() had, but without the scaling issue, or at least with much less of one: O(N) rather than O(N*M).

The bigger problem is exactly as DG has stated: it isn't the namei cache that is our enemy, it's the VM page cache preventing vnodes from being recycled. For the moment I believe that cache_purgeleafdirs() or my patch solves the problem well enough that we can run with it for a while.
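Here is roughly what I have in mind for that per-vnode check. This is a sketch only -- the function names are mine, not committed code, and it assumes struct namecache is visible at this point (today it is private to vfs_cache.c). v_cache_src is the list of namecache entries in which the vnode is the parent directory, so the scan is bounded by the number of children rather than by the size of the whole cache:

    #include <sys/param.h>
    #include <sys/queue.h>
    #include <sys/vnode.h>
    #include <vm/vm.h>
    #include <vm/vm_object.h>

    /*
     * Return true if no directory hangs below vp in the namei cache.
     * Scanning v_cache_src makes this O(children) per vnode instead
     * of the O(N*M) sweep that cache_purgeleafdirs() does now.
     */
    static int
    vnode_is_leaf_dir(struct vnode *vp)
    {
            struct namecache *ncp;

            LIST_FOREACH(ncp, &vp->v_cache_src, nc_src) {
                    if (ncp->nc_vp != NULL && ncp->nc_vp != vp &&
                        ncp->nc_vp->v_type == VDIR)
                            return (0);  /* subdirectory lives below us */
            }
            return (1);
    }

    /*
     * (A) alone is my current patch; adding the leaf test gives the
     * cache_purgeleafdirs() behavior without the scaling problem.
     */
    static int
    vnode_immune_from_reuse(struct vnode *vp)
    {
            if (vp->v_object != NULL &&
                vp->v_object->resident_page_count > 0)
                    return (1);          /* (A): cached VM pages */
            if (vp->v_type == VDIR && !vnode_is_leaf_dir(vp))
                    return (1);          /* relaxed (B): spare non-leaf dirs */
            return (0);
    }

The reclaim scan would then simply skip any vnode for which vnode_immune_from_reuse() returns true.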
The real solution, I believe, is to give us the ability to take cached VM pages associated with a file and rename them to cached VM pages associated with the filesystem's underlying device. We can do this only for page-aligned blocks, of course, not fragments (which we would simply throw away), but it would allow us to reclaim vnodes independent of the VM page cache without losing the cached pages. I think this is doable, but it will require a considerable amount of work. It isn't something I can do in a day.

I also believe that this can dovetail quite nicely into the I/O model that we have slowly been moving towards over the last year (Poul's work). Inevitably we will have to manage device-based I/O on a page-by-page basis, and being able to do it via a VM object seems to fit the bill in my opinion.
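To give an idea of the shape of it -- a sketch, not working code: file_page_to_dev_pindex() is a hypothetical helper that would map a file page to a page-aligned device offset (via VOP_BMAP or similar), and a real version would have to deal with page locking, dirty pages, and the device object's lifetime. vm_page_rename() already exists to move a page between VM objects:

    #include <sys/param.h>
    #include <sys/queue.h>
    #include <sys/vnode.h>
    #include <vm/vm.h>
    #include <vm/vm_object.h>
    #include <vm/vm_page.h>

    /*
     * Hypothetical: map a file page index to a device page index,
     * failing for fragments and holes.
     */
    static int file_page_to_dev_pindex(struct vnode *vp,
        vm_pindex_t pindex, vm_pindex_t *dpindexp);

    /*
     * On reclaim, migrate the file vnode's page-aligned cached pages
     * into the VM object backing the underlying device, so the data
     * survives the vnode.  Fragments and holes are simply tossed.
     */
    static void
    vnode_pages_to_devobj(struct vnode *vp, vm_object_t devobj)
    {
            vm_page_t m, next;
            vm_pindex_t dpindex;

            for (m = TAILQ_FIRST(&vp->v_object->memq); m != NULL;
                m = next) {
                    next = TAILQ_NEXT(m, listq);
                    if (file_page_to_dev_pindex(vp, m->pindex,
                        &dpindex) != 0)
                            continue;    /* fragment/hole: throw away */
                    vm_page_rename(m, devobj, dpindex);
            }
    }

						-Matt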