From: David Schultz
To: Terry Lambert
Cc: Dmitry Sivachenko, Poul-Henning Kamp, arch@freebsd.org
Date: Thu, 19 Jun 2003 23:10:10 -0700
Subject: Re: cvs commit: src/sys/fs/nullfs null.h null_subr.c null_vnops.c
Message-ID: <20030620061010.GA85747@HAL9000.homeunix.com>
In-Reply-To: <3EF2969F.4EE7D6D4@mindspring.com>
References: <20030618112226.GA42606@fling-wing.demos.su> <39081.1055937209@critter.freebsd.dk> <20030619113457.GA80739@HAL9000.homeunix.com> <3EF2969F.4EE7D6D4@mindspring.com>

On Thu, Jun 19, 2003, Terry Lambert wrote:
> David Schultz wrote:
> > As a side note, I also think it's important that the new
> > implementation have a clean separation between user data and FS
> > metadata, so that they are not in direct competition with each
> > other for memory.
>
> This was the rationale behind the original VM and buffer cache
> separation. Instead of coming from a limited system resource
> shared between the two, they came from a limited system resource
> shared between the two, and scavenged pages from each other and
> caused thrashing. This was especially obvious in programs that
> mmap'ed a lot of file data into memory (e.g. "ld"), and then by
> seeking around, thrashed all the code pages out of core.

Yes, and my point was that it's important to maintain the
separation, at least implicitly, in any new design. I think this
point was obvious to the people concerned before I even mentioned
it, so there's no need to rehash it, but the designers of certain
other operating systems seem to have missed it.

> > The present buffer cache may be too limited for
> > the massive number of dependencies softupdates needs to track for
> > FS-intensive loads, but we also don't want lots of accumulated dirty
> > buffers from heavy FS activity to force application data out of memory.
>
> This basically says that you need to stall dependency memory
> allocation at a high watermark, and force the update clock to
> tick until the problem is eliminated. The acceleration of the
> update clock that takes place today is insufficient for this:
> you need to force the tick, wait for the completion, and force
> the next tick, etc., until you get back to your low water mark.
> If you just accelerate the clock, the hysteresis will keep you
> in a constant state of thrashing.

Last year I was saying something similar to what you just said,
before Kirk convinced me that I was wrong. ;-) The main problem
isn't metastability or the lack of deadlock detection; it's that
some workloads reasonably require more dependency tracking than
the buffer cache can accommodate. At present, we can't track more
than about 50 directories in the buffer cache. Still, the opposite
problem concerns me too: allowing many dependencies to accumulate
that have to be written anyway. I guess that's where a clever
flushing algorithm comes in.

[1] points out that Solaris 2.6 and 7 had a clever balancing
algorithm between the FS and VM caches, too, but that wound up
being tossed out in favor of a separate FS metadata cache in
Solaris 8. But Solaris doesn't do softupdates, so it doesn't have
a tradeoff between memory pressure and effective dependency
tracking. So I don't know what the right answer is for FreeBSD.

> > The original buffer cache design is untenable largely because
> > Dyson wanted to maintain compatibility with existing FS
> > interfaces.
>
> At the time, the problem was that the vmobject_t's were not
> reference counted, and allowed to be aliased. [...]

You're describing a separate problem from the one I'm thinking of,
but probably also a valid one. My knowledge of BSD doesn't extend
back that far.

[1] Mauro and McDougall. Solaris Internals: Core Kernel
    Architecture, Prentice Hall (2001).