Date: Tue, 10 Apr 2001 16:21:25 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
To: Rik van Riel <riel@conectiva.com.br>
Cc: David Xu <bsddiy@21cn.com>, freebsd-hackers@FreeBSD.ORG
Subject: Re: vm balance
Message-ID: <200104102321.f3ANLP994366@earth.backplane.com>
References: <Pine.LNX.4.21.0104101833210.25737-100000@imladris.rielhome.conectiva>

:In the balancing part, definitely. FreeBSD seems to be the only
:system that has the balancing right. I'm planning on integrating
:some of the balancing tactics into Linux for the 2.5 kernel, but
:I'm not sure how to integrate the inode and dentry cache into the
:balancing scheme ...
:I'm curious about the other things though ... FreeBSD still seems
:to have the early 90's abstraction layer from Mach and the vnode
:cache doesn't seem to grow and shrink dynamically (which can be a
:big win for systems with lots of metadata activity).
:
:So while it's true that FreeBSD's VM balancing seems to be the
:best one out there, I'm not quite sure about the rest of the VM...
:
:regards,
:
:Rik

    Well, the approach we take is that of a two-layered cache.
    The vnode, dentry (namei for FreeBSD), and inode caches
    in FreeBSD are essentially throw-away caches of data
    represented in an internal form.  The VM PAGE cache 'backs'
    these caches loosely by caching the physical on-disk representation
    of inodes and directory entries (see note 1 at bottom).
    This means that even though we limit the number of the namei
    and inode structures we keep around in the kernel, the data
    required to reconstitute those structures is 'likely' to
    still be in the VM PAGE cache, allowing us to pretty much
    throw away those structures on a whim.  The only cost is that
    we have to go through a filesystem op (possibly not requiring I/O)
    to reconstitute the internal structure.

    For example, take the namei cache.  The namei cache allows
    the kernel to bypass big pieces of the filesystem when doing
    path name lookups.  If a path is not in the namei cache the
    filesystem has to do a directory lookup.  But a directory
    lookup could very well access pages in the VM PAGE cache
    and thus still not actually result in a disk I/O.
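
    A rough sketch of that lookup path, as a toy model in Python rather
    than kernel code.  The class names (Disk, PageCache, NameCache), the
    LRU eviction, and the tiny limit are illustrative assumptions, not
    FreeBSD's actual structures or policy.  It only shows that a miss in
    the small throw-away cache costs a filesystem lookup, not necessarily
    a disk read.

```python
from collections import OrderedDict


class Disk:
    """Stand-in for the physical disk; counts how often it is read."""

    def __init__(self, directories):
        self.directories = directories  # dir name -> {entry name: inode number}
        self.reads = 0

    def read_directory(self, dirname):
        self.reads += 1
        return dict(self.directories[dirname])


class PageCache:
    """Caches raw directory contents; knows nothing about names or inodes."""

    def __init__(self, disk):
        self.disk = disk
        self.blocks = {}

    def get_directory(self, dirname):
        if dirname not in self.blocks:
            self.blocks[dirname] = self.disk.read_directory(dirname)
        return self.blocks[dirname]


class NameCache:
    """Small, bounded, throw-away cache of (dir, name) -> inode number."""

    def __init__(self, page_cache, limit):
        self.page_cache = page_cache
        self.limit = limit
        self.entries = OrderedDict()
        self.fs_lookups = 0

    def lookup(self, dirname, name):
        key = (dirname, name)
        if key in self.entries:
            self.entries.move_to_end(key)
            return self.entries[key]
        # Miss: fall back to a directory lookup, which may be satisfied
        # from the page cache without touching the disk.
        self.fs_lookups += 1
        inode = self.page_cache.get_directory(dirname)[name]
        self.entries[key] = inode
        if len(self.entries) > self.limit:
            self.entries.popitem(last=False)  # throw away the oldest entry
        return inode


disk = Disk({"/etc": {"passwd": 11, "hosts": 12, "fstab": 13}})
cache = NameCache(PageCache(disk), limit=2)
for name in ["passwd", "hosts", "fstab", "passwd"]:
    cache.lookup("/etc", name)
print(cache.fs_lookups, disk.reads)
```

    In this run every lookup misses the two-entry name cache, yet the
    directory is read from the toy disk only once.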

    The inode cache works the same way ... inodes can be thrown
    away at any time and most of the time they can be reconstituted
    from the VM PAGE cache without an I/O.

    The vnode cache works slightly differently.  VNodes that are
    not in active use can be thrown away and reconstituted at a later
    time from either the inode cache or the VM PAGE cache
    (or, failing that, with a disk I/O to get at the stat information).

    There is a caveat for the vnode cache, however.  VNodes are tightly
    integrated with VM Objects, which in turn help place-hold VM pages
    in the VM PAGE cache.  Thus when you throw away an inactive vnode
    you also have to throw away any cached VM PAGES representing the
    cached file or directory data represented by that vnode.

    Nearly all installations of FreeBSD run out of physical memory long
    before they run out of vnodes, so this side effect is almost never
    an issue.  On some extremely rare occasions it is possible that
    the system will have plenty of free memory but hit its vnode cache
    limit and start recycling vnodes, causing it to recycle cache pages
    even when there is plenty of free memory available.  But this is
    very rare.
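
    To make the rare case concrete, here is a toy model in Python.  The
    VnodeCache class, its oldest-first recycling, and the numbers are all
    hypothetical; memory pressure is deliberately left out so that only
    the vnode limit triggers recycling.

```python
from collections import OrderedDict


class VnodeCache:
    """Toy model: each vnode owns the cached pages of its file."""

    def __init__(self, max_vnodes, total_pages):
        self.max_vnodes = max_vnodes
        self.total_pages = total_pages
        self.vnodes = OrderedDict()  # file name -> number of cached pages

    def free_pages(self):
        return self.total_pages - sum(self.vnodes.values())

    def open_and_read(self, name, npages):
        """Cache npages under a vnode; return the names of recycled vnodes."""
        recycled = []
        if name not in self.vnodes and len(self.vnodes) >= self.max_vnodes:
            # Recycling the vnode discards its cached pages with it.
            victim, _ = self.vnodes.popitem(last=False)
            recycled.append(victim)
        self.vnodes[name] = npages
        self.vnodes.move_to_end(name)
        return recycled


cache = VnodeCache(max_vnodes=3, total_pages=1000)
for name in ["a", "b", "c"]:
    cache.open_and_read(name, 10)
free_before = cache.free_pages()
recycled = cache.open_and_read("d", 10)
print(recycled, free_before)
```

    The pages cached for "a" are dropped even though most of the toy
    system's memory was free at the time.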

    The key point to all of this is that we put most of our marbles in
    the VM PAGE cache.  The namei and inode caches are there simply for
    convenience so we don't have to 'lock' big portions of the underlying
    VM PAGE cache.

    The VM PAGE cache is pretty much an independent entity.  It does not know
    or care *what* is being cached; it only cares how often the data is
    being accessed and whether it is clean or dirty.  It treats all the
    data nearly the same.
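
    As a minimal illustration of a content-agnostic policy, the Python
    sketch below picks a page to reclaim using only an access count and a
    dirty flag.  The Page fields and the tie-break (prefer a clean page,
    since it needs no write-back) are assumptions made for the example,
    not a description of the real pageout algorithm.

```python
from dataclasses import dataclass


@dataclass
class Page:
    owner: str       # what the page holds; never consulted by the policy
    act_count: int   # how often the page was accessed recently
    dirty: bool      # True if it must be written back before reuse


def pick_victim(pages):
    """Least-accessed page first; among equals, prefer a clean one."""
    return min(pages, key=lambda p: (p.act_count, p.dirty))


pages = [
    Page("file data", 5, False),
    Page("directory block", 1, True),
    Page("inode block", 1, False),
    Page("anonymous memory", 3, True),
]
victim = pick_victim(pages)
print(victim.owner)
```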

    note (1):  Physical directory blocks have historically been cached in
    the buffer cache, using kernel MALLOC space, not in the VM PAGE cache.
    Buffer-cache-based MALLOC space is severely limited (only a few megabytes)
    compared to what the VM PAGE cache can offer.  In FreeBSD a
    'sysctl -w vfs.vmiodirenable=1' will cause physical directory blocks to
    be cached in the VM PAGE cache, just like files are cached.  This is
    not the default but it will be soon, and many people already turn this
    sysctl on.
    -
    I should also say that there is a *fourth* cache not yet mentioned which
    actually has a huge effect on the VM PAGE cache.  This fourth cache
    relates to pages *actively* mapped into user space.  A page mapped into
    user space is wired (cannot be ripped out of the VM PAGE cache) and also
    has various other pmap-related tracking structures (which you are familiar
    with, Rik, so I won't expound on that too much).  If the VM PAGE cache
    wants to get rid of an idle page that is still mapped to a user process,
    it has to unwire it first, which means it has to get rid of the user
    mappings - a pmap*() call from vm/vm_pageout.c and vm/vm_page.c
    accomplishes this.  This fourth cache (the active user mappings of pages)
    is also a throw-away cache, though one with the side effect of making
    VM PAGE cache pages available for loading into user processes' memory maps.
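
    The ordering constraint can be sketched as a toy model in Python.
    MappedPage, PageCache, and remove_all_mappings() are hypothetical
    stand-ins (the last one for whatever pmap call does the real work);
    the only point is that the user mappings are torn down before the
    page itself can leave the cache.

```python
class MappedPage:
    """A cached page plus the set of processes that currently map it."""

    def __init__(self, name):
        self.name = name
        self.mappings = set()


class PageCache:
    def __init__(self):
        self.pages = {}

    def add(self, name):
        self.pages[name] = MappedPage(name)

    def map_into(self, name, pid):
        self.pages[name].mappings.add(pid)

    def remove_all_mappings(self, name):
        # Hypothetical stand-in for the pmap-level unmapping step.
        page = self.pages[name]
        dropped = len(page.mappings)
        page.mappings.clear()
        return dropped

    def reclaim(self, name):
        # The mappings are thrown away first, then the page itself.
        dropped = self.remove_all_mappings(name)
        del self.pages[name]
        return dropped


pc = PageCache()
pc.add("libc text page")
pc.map_into("libc text page", pid=100)
pc.map_into("libc text page", pid=200)
dropped = pc.reclaim("libc text page")
print(dropped)
```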
						-Matt
