From owner-freebsd-current Mon Apr 16 15:53: 4 2001
Delivered-To: freebsd-current@freebsd.org
Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67])
	by hub.freebsd.org (Postfix) with ESMTP id 2424B37B43F
	for ; Mon, 16 Apr 2001 15:53:02 -0700 (PDT)
	(envelope-from dillon@earth.backplane.com)
Received: (from dillon@localhost)
	by earth.backplane.com (8.11.2/8.11.2) id f3GMqwG83808;
	Mon, 16 Apr 2001 15:52:58 -0700 (PDT)
	(envelope-from dillon)
Date: Mon, 16 Apr 2001 15:52:58 -0700 (PDT)
From: Matt Dillon
Message-Id: <200104162252.f3GMqwG83808@earth.backplane.com>
To: Bruce Evans
Cc: "Justin T. Gibbs" , Doug Barton , "'current@freebsd.org'"
Subject: Re: FW: Filesystem gets a huge performance boost
References:
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:> It just seems inelegant to have a system that, on paper, is
:> so inefficient.  Can't we do better?
:
:Sure.  Don't discard buffer contents when recycling a B_MALLOC'ed buffer,
:but manage it using a secondary buffer cache that doesn't have as much
:overhead as the primary one (in particular, don't reserve BKVASIZE bytes
:of kernel virtual address space for each secondary buffer).  This would
:be even more inelegant, and more complicated, but not so inefficient.
:
:Bruce

Well, I think the last few years have proven that B_MALLOC buffers
are essentially unmanageable.  Even if you were to come up with the
perfect algorithm, KVM just doesn't scale to physical memory the way
it should.  Only physical memory scales to physical memory, and that
means the VM Page cache.

We could conceivably use the VM object representing the filesystem
block device, which normally only holds cylinder group bitmaps and
inodes, and use it to back piecemeal buffer cache mappings for
directories (at least as long as we do not allow mmap()ing of
directories, which would make this impossible).
The backing pages would still be 4K, and we would have to be extremely
careful with regard to the valid and dirty bits in the vm_page_t so as
not to infringe on adjacent file fragments (which could be mmap'd),
but now the 4K of backing store would be able to cache up to 8 small
directories that happen to reside in the same filesystem block.

The above would be an extremely complex solution and I wouldn't want
to implement it for that reason.  A separately managed buffer cache is
also a complex solution because in order to be effective it needs to
be scalable (as the current B_MALLOC is not).

Even though the potential wastage with the current solution seems
high, the actual impact on the system is low.  I have yet to see any
detrimental results in my own testing.  Anyone can test -- simply turn
on the vmiodirenable sysctl and have at it!

					-Matt
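
For anyone who wants to see the arithmetic behind the "up to 8 small
directories" figure and the "potential wastage" mentioned above, here is
a rough sketch.  It assumes a 4K page and a 512-byte small directory;
the 512-byte size is an assumption inferred from 4096 / 8, and the macro
and function names are hypothetical, not anything in the kernel.

```c
#include <assert.h>

/* Assumed sizes for illustration only (not kernel constants):
 * a 4K VM page and a 512-byte small directory. */
#define PAGE_BYTES      4096u
#define SMALL_DIR_BYTES  512u

/* Number of directories of dir_bytes that fit in one backing page
 * when they are packed together in the same page. */
unsigned dirs_per_page(unsigned page_bytes, unsigned dir_bytes)
{
    assert(dir_bytes > 0 && dir_bytes <= page_bytes);
    return page_bytes / dir_bytes;
}

/* Bytes of the page left unused when one directory of dir_bytes
 * occupies a whole page by itself. */
unsigned wasted_bytes(unsigned page_bytes, unsigned dir_bytes)
{
    assert(dir_bytes <= page_bytes);
    return page_bytes - dir_bytes;
}
```

With these assumed sizes, dirs_per_page(PAGE_BYTES, SMALL_DIR_BYTES)
gives 8, and wasted_bytes(PAGE_BYTES, SMALL_DIR_BYTES) gives 3584, i.e.
seven eighths of the page goes unused when a single small directory has
a page to itself.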