From owner-freebsd-arch@FreeBSD.ORG Fri Jun 20 02:30:22 2003
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id AF38D37B401;
	Fri, 20 Jun 2003 02:30:22 -0700 (PDT)
Received: from HAL9000.homeunix.com (ip114.bella-vista.sfo.interquest.net [66.199.86.114])
	by mx1.FreeBSD.org (Postfix) with ESMTP id CED1243F3F;
	Fri, 20 Jun 2003 02:30:21 -0700 (PDT)
	(envelope-from das@freebsd.org)
Received: from HAL9000.homeunix.com (localhost [127.0.0.1])
	by HAL9000.homeunix.com (8.12.9/8.12.9) with ESMTP id h5K9UCJa087120;
	Fri, 20 Jun 2003 02:30:12 -0700 (PDT)
	(envelope-from das@freebsd.org)
Received: (from das@localhost)
	by HAL9000.homeunix.com (8.12.9/8.12.9/Submit) id h5K9U4Iw087119;
	Fri, 20 Jun 2003 02:30:04 -0700 (PDT)
	(envelope-from das@freebsd.org)
Date: Fri, 20 Jun 2003 02:30:04 -0700
From: David Schultz
To: Terry Lambert
Message-ID: <20030620093004.GA86924@HAL9000.homeunix.com>
Mail-Followup-To: Terry Lambert, Poul-Henning Kamp, Dmitry Sivachenko,
	"Tim J. Robbins", arch@freebsd.org
References: <20030618112226.GA42606@fling-wing.demos.su>
	<39081.1055937209@critter.freebsd.dk>
	<20030619113457.GA80739@HAL9000.homeunix.com>
	<3EF2969F.4EE7D6D4@mindspring.com>
	<20030620061010.GA85747@HAL9000.homeunix.com>
	<3EF2C67D.65F8A635@mindspring.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3EF2C67D.65F8A635@mindspring.com>
cc: Dmitry Sivachenko
cc: Poul-Henning Kamp
cc: arch@freebsd.org
Subject: Re: cvs commit: src/sys/fs/nullfs null.h null_subr.c null_vnops.c
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
X-List-Received-Date: Fri, 20 Jun 2003 09:30:23 -0000

On Fri, Jun 20, 2003, Terry Lambert wrote:
> David Schultz wrote:
> > Yes, and my point was that it's important to maintain the
> > separation, at least implicitly, in any new design. I think this
> > point was obvious to the people concerned before I even mentioned
> > it, so there's no need to rehash it, but the designers of certain
> > other operating systems seem to have missed it.
>
> Well, Solaris "reinvented" the separate VM and buffer cache
> in Solaris 2.8. 8-(. I wasn't sure what you were recommending
> from what you said.

Let me make it clear that I'm not advocating the Solaris 8 approach.
But it would seem that the FS metadata cache needs to be insulated
from the VM cache better than priority paging can provide. Perhaps it
would be possible to enforce a sort of self-tuning version of separate
VM and buffer caches, where the buffer cache has a carefully managed
RSS that can scale based on both FS activity and memory pressure.
That way, I/O-intensive workloads would not be allowed to suck too
many pages away from user processes, and the VM system would be able
to better estimate actual memory pressure.
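To make that a bit more concrete, here is a rough sketch of the kind
of self-tuning I have in mind. It is only an illustration, not
proposed kernel code; every name and constant in it is made up, and a
real implementation would hang off the page daemon and the existing
buffer accounting rather than being a standalone function:

        /*
         * Toy sketch (made-up names and numbers): compute a target RSS,
         * in pages, for the buffer cache.  The target grows with recent
         * FS activity, shrinks when the VM system is short on free
         * pages, and is clamped between a floor and a ceiling.
         */
        #include <stdio.h>

        #define BUF_MIN_PAGES    1024   /* never shrink the cache below this */
        #define BUF_MAX_PAGES   65536   /* never let it grow beyond this */

        static long
        buf_rss_target(long cur_target, long io_rate, long free_pages,
            long free_target)
        {
                long target = cur_target;

                if (free_pages < free_target)
                        /* Memory pressure: give pages back, proportionally. */
                        target -= (free_target - free_pages) / 2;
                else if (io_rate > 0)
                        /* FS is busy and memory is plentiful: grow slowly. */
                        target += io_rate / 16;

                if (target < BUF_MIN_PAGES)
                        target = BUF_MIN_PAGES;
                if (target > BUF_MAX_PAGES)
                        target = BUF_MAX_PAGES;
                return (target);
        }

        int
        main(void)
        {
                long target = 8192;

                /* Heavy I/O, plenty of free memory: the target creeps up. */
                target = buf_rss_target(target, 4096, 30000, 10000);
                printf("after an I/O burst: %ld pages\n", target);

                /* Free pages fall below the VM's target: it comes back down. */
                target = buf_rss_target(target, 0, 4000, 10000);
                printf("under VM pressure:  %ld pages\n", target);
                return (0);
        }

The point is only that clamping the target between a floor and a
ceiling gives you most of the insulation of a separate buffer cache
without a hard static split.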
> > The main problem isn't metastability or the lack of deadlock
> > detection, it's that some workloads reasonably require more
> > dependency tracking than the buffer cache can accommodate. At
> > present, we can't track more than about 50 directories in the
> > buffer cache.
>
> I don't know if I buy this directly. It's probably possible
> to commit an incomplete tree, as long as it's complete from
> the root, at any subtree point. Doing this, though, you would
> have to switch from isosynchronous to synchronous processing on
> the subtree for the remainder of its duration. This works,
> because you use the associative property of the tree above to
> replace it with a single edge segment; other orphan subtrees
> of the same tree all have to fall into the same mode.

I don't understand what you're getting at here. If you don't have
enough space to cache more than 50 dependencies, you lose performance
when your working set exceeds 50 directories, period. Trying to
address this issue by making the softupdates flushing code smarter is
only working around the limitations of the present buffer cache.

> What was Kirk's answer?

He didn't give me one, aside from advocating backing dependencies
with the VM system. This issue just came up in passing a while ago in
relation to a pathological case for softupdates that resulted in an
explosion of dependencies that filled up the buffer cache and caused
a deadlock. ;-) (The problem has since been hacked around, BTW.)

> But the quoted "50" is the ideal, when all dependent operations
> occur in the same tick, given the current wheel size; all this
> strategy does is up the number (the real number isn't 50, it's
> unfortunately 'size - max_n - 1') by making them occur virtually
> in the same tick, even if they are spread out temporally otherwise.

I think 50 is merely a number that makes softupdates not fill up the
buffer cache and deadlock. Keep in mind that the dependency graph
could have a large fanout, or it could be a multigraph. There's no
magical association between ~50 directories and the maximum path
length in the graph. Again, it's the buffer cache that's the primary
problem, not softupdates.
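For what it's worth, here is a toy model (mine, not anything from the
tree) of why this is a buffer-count problem rather than a graph-shape
problem: with a fixed pool of buffers, the number of dirty directories
you can track is bounded by the pool size, regardless of how many
dependency edges hang off each buffer or how long the longest path is.

        /*
         * Toy model (not FreeBSD code): a fixed pool of "buffers", each
         * of which may carry an arbitrarily long list of dependency
         * records.  The number of dirty directories that can be tracked
         * is bounded by the pool size alone.
         */
        #include <stdio.h>

        #define NBUF 50                 /* stand-in for the buffer cache budget */

        struct dep {                    /* one edge in the dependency graph */
                struct dep *next;
                int         target;    /* index of the buffer to flush first */
        };

        struct buf {
                int         in_use;
                struct dep *deps;       /* fanout can be as large as you like */
        };

        static struct buf pool[NBUF];

        /* Claim a buffer for a dirty directory; fails once the pool is full. */
        static struct buf *
        getbuf(void)
        {
                int i;

                for (i = 0; i < NBUF; i++) {
                        if (!pool[i].in_use) {
                                pool[i].in_use = 1;
                                return (&pool[i]);
                        }
                }
                return (NULL);  /* no free buffer; a real system stalls here */
        }

        int
        main(void)
        {
                int n = 0;

                while (getbuf() != NULL)
                        n++;
                /* Prints 50 no matter how many dependency edges exist. */
                printf("tracked %d directories before running out of buffers\n", n);
                return (0);
        }

Raising NBUF in the toy just moves the wall, which is the point: the
shape of the dependency graph never enters into it.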