Date: Wed, 7 Jul 1999 17:01:02 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: David Greenman
Cc: freebsd-hackers@FreeBSD.ORG, freebsd-current@FreeBSD.ORG
Subject: Re: Heh heh, humorous lockup
References: <199907072334.QAA23809@implode.root.com>

:    We've been here before, a couple of times. This started to become an
:issue when the limits were removed and has gotten worse as the vnode and
:fsnode structs have grown over time. We're running into some limits on
:how much space we can give to the kernel, since there are a number of
:folks who think that 3GB of process VA space is a minimum. I tend to
:think that the 2GB/2GB split that I use on wcarchive is probably more
:appropriate as a default, but like I say, others disagree.
:
:-DG
:
:David Greenman

    If we added the capability to the buffer cache to delete B_DELWRI
    B_VMIO buffers (leaving the dirty VM pages behind), we could reduce
    the size of the filesystem buffer cache considerably while at the
    same time improving our ability to cache dirty data - assuming all
    the other problems related to doing that sort of thing get fixed,
    that is.  I am heading this way already, as are others - the
    filesystem buffer cache really needs to be relegated to handling
    active I/O and filesystem mappings, not holding onto dirty data for
    dear life.

    This would require keeping track of most dirty pages, which isn't too
    hard to do - we split the vm_object page list into a clean list and a
    dirty list, and we keep the notion of clean and dirty vnodes so the
    update daemon doesn't have to change.  If we can reduce the size of
    the filesystem buffer cache to something reasonable, more KVA space
    will be available for other kernel things.

    The biggest stumbling block to doing this is the reconstitution
    overhead of the buffer cache, as demonstrated by the simple test
    below.  As it shows, the cost of reconstituting a filesystem buffer
    on a Pentium Pro 200 is roughly equivalent to 27 MBytes/sec worth of
    bandwidth - the difference between the two rates measured below
    (84630712 - 57244848 ~= 27.4 MBytes/sec).

    Create a big file:

	dd if=/dev/zero of=test bs=32k count=4096

    dd back in (several times) a block big enough to fit in the VM page
    cache but not big enough to fit into the filesystem buffer cache.
    No actual disk I/O occurs:

	dd if=test of=/dev/null bs=32k count=256
	8388608 bytes transferred in 0.146539 secs (57244848 bytes/sec)

    dd back in (several times) a block big enough to fit in the VM page
    cache *AND* the filesystem buffer cache.  No actual disk I/O occurs:

	apollo:/usr/obj# dd if=test of=/dev/null bs=32k count=64
	2097152 bytes transferred in 0.024780 secs (84630712 bytes/sec)

						-Matt
						Matthew Dillon
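
    A minimal userland sketch of the clean/dirty page-list split
    described above, using <sys/queue.h>.  The struct and helper names
    are invented for illustration - this is not the actual vm_object
    code, just the shape of the idea:

	/*
	 * Sketch: split a vm_object's page list into a clean list and a
	 * dirty list so dirty pages can be found without scanning every
	 * page in the object.  Types and helpers are invented.
	 */
	#include <stdio.h>
	#include <sys/queue.h>

	struct vm_page_sim {
		int dirty;                       /* nonzero: page holds dirty data */
		TAILQ_ENTRY(vm_page_sim) listq;  /* linkage on clean or dirty list */
	};

	struct vm_object_sim {
		TAILQ_HEAD(, vm_page_sim) clean_pages;
		TAILQ_HEAD(, vm_page_sim) dirty_pages;
	};

	static void
	obj_init(struct vm_object_sim *obj)
	{
		TAILQ_INIT(&obj->clean_pages);
		TAILQ_INIT(&obj->dirty_pages);
	}

	/* A write fault or buffer release dirties the page: move it over. */
	static void
	page_dirty(struct vm_object_sim *obj, struct vm_page_sim *m)
	{
		if (!m->dirty) {
			TAILQ_REMOVE(&obj->clean_pages, m, listq);
			m->dirty = 1;
			TAILQ_INSERT_TAIL(&obj->dirty_pages, m, listq);
		}
	}

	/* Pageout I/O completed: the page is clean again. */
	static void
	page_clean(struct vm_object_sim *obj, struct vm_page_sim *m)
	{
		if (m->dirty) {
			TAILQ_REMOVE(&obj->dirty_pages, m, listq);
			m->dirty = 0;
			TAILQ_INSERT_TAIL(&obj->clean_pages, m, listq);
		}
	}

	int
	main(void)
	{
		struct vm_object_sim obj;
		struct vm_page_sim pg = { 0 };

		obj_init(&obj);
		TAILQ_INSERT_TAIL(&obj.clean_pages, &pg, listq);

		page_dirty(&obj, &pg);      /* e.g. page written to */
		printf("dirty list empty: %d\n", TAILQ_EMPTY(&obj.dirty_pages));
		page_clean(&obj, &pg);      /* e.g. update daemon flushed it */
		printf("dirty list empty: %d\n", TAILQ_EMPTY(&obj.dirty_pages));
		return 0;
	}

    The point of the split is that the update daemon and the pageout
    path can find an object's dirty pages by walking a short list
    instead of scanning every page it holds.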
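
    And a similarly simplified sketch of the other half - releasing a
    B_DELWRI B_VMIO buffer without writing it, leaving the dirty data
    behind in the VM pages.  The flag values and struct layout here are
    invented stand-ins; the real struct buf interfaces are considerably
    more involved:

	/*
	 * Sketch: release a delayed-write VMIO buffer without writing it,
	 * pushing its dirty state down into the backing VM pages.  All
	 * names and flag values are invented for illustration.
	 */
	#include <stdio.h>

	#define B_DELWRI 0x0001     /* dirty data, write deferred */
	#define B_VMIO   0x0002     /* buffer is a window onto VM pages */
	#define NPG      8          /* pages backing one 32K buffer (4K pages) */

	struct page_sim {
		int dirty;              /* stands in for the page's dirty bits */
	};

	struct buf_sim {
		int flags;
		int npages;
		struct page_sim *pages[NPG];    /* VM pages backing this buffer */
	};

	/*
	 * Instead of writing a B_DELWRI|B_VMIO buffer before freeing it,
	 * mark the backing pages dirty and drop the buffer.  The dirty
	 * data survives in the pages for the pageout/update path to flush.
	 */
	static void
	buf_release_dirty(struct buf_sim *bp)
	{
		int i;

		if ((bp->flags & (B_DELWRI | B_VMIO)) == (B_DELWRI | B_VMIO)) {
			for (i = 0; i < bp->npages; i++)
				bp->pages[i]->dirty = 1;
			bp->flags &= ~B_DELWRI;     /* buffer itself is clean now */
		}
		/* ...real code would unmap the buffer's KVA and free it here... */
	}

	int
	main(void)
	{
		struct page_sim pgs[NPG] = {{0}};
		struct buf_sim bp;
		int i;

		bp.flags = B_DELWRI | B_VMIO;
		bp.npages = NPG;
		for (i = 0; i < NPG; i++)
			bp.pages[i] = &pgs[i];

		buf_release_dirty(&bp);
		printf("page 0 dirty after release: %d\n", pgs[0].dirty);
		return 0;
	}

    Once the dirty bits live in the pages, the buffer and its KVA
    mapping can be reclaimed immediately, which is what lets the
    filesystem buffer cache shrink to handling only active I/O and
    mappings.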