From owner-freebsd-hackers Tue Apr 10 16:21:45 2001
Delivered-To: freebsd-hackers@freebsd.org
Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id CAF7137B424 for ; Tue, 10 Apr 2001 16:21:41 -0700 (PDT) (envelope-from dillon@earth.backplane.com)
Received: (from dillon@localhost) by earth.backplane.com (8.11.2/8.11.2) id f3ANLP994366; Tue, 10 Apr 2001 16:21:25 -0700 (PDT) (envelope-from dillon)
Date: Tue, 10 Apr 2001 16:21:25 -0700 (PDT)
From: Matt Dillon
Message-Id: <200104102321.f3ANLP994366@earth.backplane.com>
To: Rik van Riel
Cc: David Xu , freebsd-hackers@FreeBSD.ORG
Subject: Re: vm balance
References: 
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:In the balancing part, definitely. FreeBSD seems to be the only
:system that has the balancing right. I'm planning on integrating
:some of the balancing tactics into Linux for the 2.5 kernel, but
:I'm not sure how to integrate the inode and dentry cache into the
:balancing scheme ...
:
:I'm curious about the other things though ... FreeBSD still seems
:to have the early 90's abstraction layer from Mach and the vnode
:cache doesn't seem to grow and shrink dynamically (which can be a
:big win for systems with lots of metadata activity).
:
:So while it's true that FreeBSD's VM balancing seems to be the
:best one out there, I'm not quite sure about the rest of the VM...
:
:regards,
:
:Rik

Well, the approach we take is that of a two-layered cache. The vnode, dentry (namei for FreeBSD), and inode caches in FreeBSD are essentially throw-away caches of data represented in an internal form. The VM PAGE cache 'backs' these caches loosely by caching the physical on-disk representation of inodes and directory entries (see note 1 at bottom).
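To make the two-layer idea concrete, here is a toy sketch - purely illustrative, with made-up names, not kernel code. A small "throw-away" metadata cache sits in front of a page cache holding the raw blocks; evicting a metadata entry is cheap because rebuilding it usually finds the raw data still in the page cache, with no disk I/O:

```python
# Toy two-layer cache: a small LRU "metadata" cache (namei/inode-style)
# backed by a page cache of raw blocks. All names here are hypothetical.
from collections import OrderedDict

class TwoLayerCache:
    def __init__(self, meta_limit):
        self.meta = OrderedDict()   # throw-away internal-form cache (LRU)
        self.pages = {}             # "VM PAGE cache": raw on-disk blocks
        self.disk_reads = 0
        self.meta_limit = meta_limit

    def _read_block(self, key):
        # Raw data comes from the page cache if present, else from "disk".
        if key not in self.pages:
            self.disk_reads += 1
            self.pages[key] = f"raw:{key}"   # simulated disk read
        return self.pages[key]

    def lookup(self, key):
        if key in self.meta:                 # fast path: cached structure
            self.meta.move_to_end(key)
            return self.meta[key]
        entry = ("parsed", self._read_block(key))  # reconstitute from raw data
        self.meta[key] = entry
        if len(self.meta) > self.meta_limit:       # throw away on a whim
            self.meta.popitem(last=False)
        return entry

cache = TwoLayerCache(meta_limit=2)
for k in ["a", "b", "c", "a"]:   # "a" falls out of the metadata cache...
    cache.lookup(k)
print(cache.disk_reads)          # 3 - rebuilding "a" hit the page cache, no I/O
```

The second lookup of "a" misses the small metadata cache but reconstitutes the entry from the page cache, which is exactly the cheap-rebuild property described above.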
This means that even though we limit the number of namei and inode structures we keep around in the kernel, the data required to reconstitute those structures is 'likely' to still be in the VM PAGE cache, allowing us to throw those structures away pretty much on a whim. The only cost is that we have to go through a filesystem op (possibly not requiring any I/O) to reconstitute the internal structure.

For example, take the namei cache. The namei cache allows the kernel to bypass big pieces of the filesystem when doing path name lookups. If a path is not in the namei cache the filesystem has to do a directory lookup, but that lookup could very well access pages in the VM PAGE cache and thus still not actually result in a disk I/O.

The inode cache works the same way ... inodes can be thrown away at any time, and most of the time they can be reconstituted from the VM PAGE cache without an I/O.

The vnode cache works slightly differently. VNodes that are not in active use can be thrown away and reconstituted at a later time from either the inode cache or the VM PAGE cache (or, failing both, a disk I/O to get at the stat information). There is a caveat for the vnode cache, however. VNodes are tightly integrated with VM objects, which in turn anchor VM pages in the VM PAGE cache. Thus when you throw away an inactive vnode you also have to throw away any cached VM pages holding the file or directory data represented by that vnode. Nearly all installations of FreeBSD run out of physical memory long before they run out of vnodes, so this side effect is almost never an issue. On some extremely rare occasions it is possible for the system to have plenty of free memory but hit its vnode cache limit and start recycling vnodes, causing it to recycle cache pages even when there is plenty of free memory available. But this is very rare.

The key point to all of this is that we put most of our marbles in the VM PAGE cache.
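That rare vnode-limit side effect can be sketched in a few lines - again a toy model with invented names, not kernel code. Because each vnode owns the VM object holding its cached pages, recycling a vnode at the vnode-cache limit drops those pages too, even when memory is otherwise free:

```python
# Toy model of vnode recycling: each vnode is coupled to a VM object
# (here, a list of cached pages), so recycling the vnode discards the
# pages with it. All names are hypothetical.
from collections import OrderedDict

class VnodeCache:
    def __init__(self, vnode_limit):
        self.vnodes = OrderedDict()   # vnode name -> its cached pages
        self.vnode_limit = vnode_limit
        self.pages_discarded = 0

    def open_file(self, name, npages):
        if name in self.vnodes:
            self.vnodes.move_to_end(name)
            return
        if len(self.vnodes) >= self.vnode_limit:
            # Recycle the least-recently-used inactive vnode; its VM object
            # (and hence its cached pages) must be thrown away with it.
            _, pages = self.vnodes.popitem(last=False)
            self.pages_discarded += len(pages)
        self.vnodes[name] = [f"page{i}" for i in range(npages)]

vc = VnodeCache(vnode_limit=2)
vc.open_file("f1", 4)
vc.open_file("f2", 4)
vc.open_file("f3", 4)      # vnode limit hit: f1 recycled, its 4 pages dropped
print(vc.pages_discarded)  # 4 cached pages lost, regardless of free memory
```

In the real system the vnode limit is large enough that memory pressure almost always wins the race, which is why this hardly ever matters in practice.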
The namei and inode caches are there simply for convenience, so we don't have to 'lock' big portions of the underlying VM PAGE cache. The VM PAGE cache is pretty much an independent entity. It does not know or care *what* is being cached; it only cares how often the data is being accessed and whether it is clean or dirty. It treats all the data nearly the same.

note (1): Physical directory blocks have historically been cached in the buffer cache, using kernel MALLOC space, not in the VM PAGE cache. Buffer-cache-based MALLOC space is severely limited (only a few megabytes) compared to what the VM PAGE cache can offer. In FreeBSD, a 'sysctl -w vfs.vmiodirenable=1' will cause physical directory blocks to be cached in the VM PAGE cache, just like file data is cached. This is not the default, but it will be soon, and many people already turn this sysctl on.

- I should also say that there is a *fourth* cache not yet mentioned which actually has a huge effect on the VM PAGE cache. This fourth cache relates to pages *actively* mapped into user space. A page mapped into user space is wired (it cannot be ripped out of the VM PAGE cache) and also has various other pmap-related tracking structures (which you are familiar with, Rik, so I won't expound on that too much). If the VM PAGE cache wants to get rid of an idle page that is still mapped to a user process, it has to unwire the page first, which means it has to get rid of the user mappings - a pmap*() call from vm/vm_pageout.c and vm/vm_page.c accomplishes this. This fourth cache (the active user mappings of pages) is also a throw-away cache, though one with the side effect of making VM PAGE cache pages available for loading into user processes' memory maps.

-Matt

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message