Date: Wed, 7 Jul 1999 17:01:02 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: David Greenman
Cc: freebsd-hackers@FreeBSD.ORG, freebsd-current@FreeBSD.ORG
Subject: Re: Heh heh, humorous lockup
References: <199907072334.QAA23809@implode.root.com>

:    We've been here before, a couple of times. This started to become an
:issue when the limits were removed and has gotten worse as the vnode and
:fsnode structs have grown over time. We're running into some limits on
:how much space we can give to the kernel, since there are a number of
:folks who think that 3GB of process VA space is a minimum. I tend to
:think that the 2GB/2GB split that I use on wcarchive is probably more
:appropriate as a default, but like I say, others disagree.
:
:-DG
:
:David Greenman

    If we added the capability to the buffer cache to delete B_DELWRI
    B_VMIO buffers (leaving the dirty VM pages behind), we could reduce
    the size of the filesystem buffer cache considerably while at the
    same time improving our ability to cache dirty data - assuming all
    the other problems related to doing that sort of thing get fixed,
    that is.  I am heading this way already, as are others - the
    filesystem buffer cache really needs to be relegated to handling
    active I/O and filesystem mappings, not holding onto dirty data for
    dear life.

    This would require keeping track of most dirty pages, which isn't too
    hard to do - we split the vm_object page list into a clean list and a
    dirty list, and we keep the notion of clean and dirty vnodes so the
    update daemon doesn't have to change.  If we can reduce the size of
    the filesystem buffer cache to something reasonable, more KVA space
    will be available for other kernel things.

    The biggest stumbling block to doing this is the reconstitution
    overhead of the buffer cache, as demonstrated by the simple test
    below.  As it shows, the cost of reconstituting a filesystem buffer
    on a Pentium Pro 200 is roughly equivalent to 27 MBytes/sec worth of
    bandwidth - the difference between the two rates measured below
    (84630712 - 57244848 ~= 27.4 MBytes/sec).

    Create a big file:

	dd if=/dev/zero of=test bs=32k count=4096

    dd back in (several times) a block big enough to fit in the VM page
    cache but not big enough to fit into the filesystem buffer cache.
    No actual disk I/O occurs:

	dd if=test of=/dev/null bs=32k count=256
	8388608 bytes transferred in 0.146539 secs (57244848 bytes/sec)

    dd back in (several times) a block big enough to fit in the VM page
    cache *AND* the filesystem buffer cache.  No actual disk I/O occurs:

	apollo:/usr/obj# dd if=test of=/dev/null bs=32k count=64
	2097152 bytes transferred in 0.024780 secs (84630712 bytes/sec)

						-Matt
						Matthew Dillon
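
    A minimal userland sketch of the clean/dirty page-list split
    described above, using <sys/queue.h>.  The struct and helper names
    are invented for illustration - this is not the actual vm_object
    code, just the shape of the idea:

	/*
	 * Sketch: split a vm_object's page list into a clean list and a
	 * dirty list so dirty pages can be found without scanning every
	 * page in the object.  Types and helpers are invented.
	 */
	#include <stdio.h>
	#include <sys/queue.h>

	struct vm_page_sim {
		int dirty;                       /* nonzero: page holds dirty data */
		TAILQ_ENTRY(vm_page_sim) listq;  /* linkage on clean or dirty list */
	};

	struct vm_object_sim {
		TAILQ_HEAD(, vm_page_sim) clean_pages;
		TAILQ_HEAD(, vm_page_sim) dirty_pages;
	};

	static void
	obj_init(struct vm_object_sim *obj)
	{
		TAILQ_INIT(&obj->clean_pages);
		TAILQ_INIT(&obj->dirty_pages);
	}

	/* A write fault or buffer release dirties the page: move it over. */
	static void
	page_dirty(struct vm_object_sim *obj, struct vm_page_sim *m)
	{
		if (!m->dirty) {
			TAILQ_REMOVE(&obj->clean_pages, m, listq);
			m->dirty = 1;
			TAILQ_INSERT_TAIL(&obj->dirty_pages, m, listq);
		}
	}

	/* Pageout I/O completed: the page is clean again. */
	static void
	page_clean(struct vm_object_sim *obj, struct vm_page_sim *m)
	{
		if (m->dirty) {
			TAILQ_REMOVE(&obj->dirty_pages, m, listq);
			m->dirty = 0;
			TAILQ_INSERT_TAIL(&obj->clean_pages, m, listq);
		}
	}

	int
	main(void)
	{
		struct vm_object_sim obj;
		struct vm_page_sim pg = { 0 };

		obj_init(&obj);
		TAILQ_INSERT_TAIL(&obj.clean_pages, &pg, listq);

		page_dirty(&obj, &pg);      /* e.g. page written to */
		printf("dirty list empty: %d\n", TAILQ_EMPTY(&obj.dirty_pages));
		page_clean(&obj, &pg);      /* e.g. update daemon flushed it */
		printf("dirty list empty: %d\n", TAILQ_EMPTY(&obj.dirty_pages));
		return 0;
	}

    The point of the split is that the update daemon and the pageout
    path can find an object's dirty pages by walking a short list
    instead of scanning every page it holds.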
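
    And a similarly simplified sketch of the other half - releasing a
    B_DELWRI B_VMIO buffer without writing it, leaving the dirty data
    behind in the VM pages.  The flag values and struct layout here are
    invented stand-ins; the real struct buf interfaces are considerably
    more involved:

	/*
	 * Sketch: release a delayed-write VMIO buffer without writing it,
	 * pushing its dirty state down into the backing VM pages.  All
	 * names and flag values are invented for illustration.
	 */
	#include <stdio.h>

	#define B_DELWRI 0x0001     /* dirty data, write deferred */
	#define B_VMIO   0x0002     /* buffer is a window onto VM pages */
	#define NPG      8          /* pages backing one 32K buffer (4K pages) */

	struct page_sim {
		int dirty;              /* stands in for the page's dirty bits */
	};

	struct buf_sim {
		int flags;
		int npages;
		struct page_sim *pages[NPG];    /* VM pages backing this buffer */
	};

	/*
	 * Instead of writing a B_DELWRI|B_VMIO buffer before freeing it,
	 * mark the backing pages dirty and drop the buffer.  The dirty
	 * data survives in the pages for the pageout/update path to flush.
	 */
	static void
	buf_release_dirty(struct buf_sim *bp)
	{
		int i;

		if ((bp->flags & (B_DELWRI | B_VMIO)) == (B_DELWRI | B_VMIO)) {
			for (i = 0; i < bp->npages; i++)
				bp->pages[i]->dirty = 1;
			bp->flags &= ~B_DELWRI;     /* buffer itself is clean now */
		}
		/* ...real code would unmap the buffer's KVA and free it here... */
	}

	int
	main(void)
	{
		struct page_sim pgs[NPG] = {{0}};
		struct buf_sim bp;
		int i;

		bp.flags = B_DELWRI | B_VMIO;
		bp.npages = NPG;
		for (i = 0; i < NPG; i++)
			bp.pages[i] = &pgs[i];

		buf_release_dirty(&bp);
		printf("page 0 dirty after release: %d\n", pgs[0].dirty);
		return 0;
	}

    Once the dirty bits live in the pages, the buffer and its KVA
    mapping can be reclaimed immediately, which is what lets the
    filesystem buffer cache shrink to handling only active I/O and
    mappings.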