From owner-freebsd-hackers Sat Sep 14 18:24:36 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id SAA22490 for hackers-outgoing; Sat, 14 Sep 1996 18:24:36 -0700 (PDT)
Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.2.228.19]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id SAA22485 for ; Sat, 14 Sep 1996 18:24:30 -0700 (PDT)
Received: (from bde@localhost) by godzilla.zeta.org.au (8.6.12/8.6.9) id LAA30846; Sun, 15 Sep 1996 11:19:02 +1000
Date: Sun, 15 Sep 1996 11:19:02 +1000
From: Bruce Evans
Message-Id: <199609150119.LAA30846@godzilla.zeta.org.au>
To: proff@suburbia.net, terry@lambert.org
Subject: Re: attribute/inode caching
Cc: freebsd-hackers@freebsd.org
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>> What is the present status of attribute/inode/directory caching under
>> freebsd? When performing a 'du' of even a relatively small hierarchy,

Little improved since 1992. The vnode cache is a bit larger, at least
on machines with plenty of memory (the default is up from about 200 on
an 8MB system to about 2000 on a 32MB system), and you can tweak its
size using sysctl(8) or the unnecessary EXTRAVNODES option, but the
caching still breaks down when the vnode cache starts to thrash.

The problem used to be that buffers for directories were attached to
vnodes, so thrashing of the vnode cache also thrashed the buffer cache.
Also (I think) inode buffers weren't kept in the buffer cache, and the
buffer cache was too small to hold many inodes.

Under Linux, at least in 1992, the buffer cache wasn't so tightly
coupled to the vnode cache, so ordinary LRU caching resulted in the
buffer cache filling up with inode data. It could easily cache 6000
128-byte inodes (or 20000 32-byte inodes) and the associated directory
entries in only 1MB of buffer cache. This might not be the best use for
the buffer cache, but it is good for traversing large hierarchies.
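
As a rough check of the 1MB figures above, here is a small sketch of the
arithmetic. The function names are made up for illustration, and it
assumes that "1MB" means 1024 * 1024 bytes and that inodes are packed
with no per-buffer overhead, neither of which is stated in the message.

```python
# Back-of-envelope check: how much of a 1MB buffer cache do the quoted
# inode counts use, and how much is left for directory entries?
# Assumptions (not from the original message): 1MB = 1024 * 1024 bytes,
# inodes are packed with no per-buffer overhead.

BUFFER_CACHE_BYTES = 1024 * 1024


def inodes_that_fit(inode_bytes, cache_bytes=BUFFER_CACHE_BYTES):
    """Upper bound on inodes if the whole cache held nothing else."""
    return cache_bytes // inode_bytes


def bytes_left_for_directories(n_inodes, inode_bytes,
                               cache_bytes=BUFFER_CACHE_BYTES):
    """Space remaining for directory data after caching n_inodes."""
    return cache_bytes - n_inodes * inode_bytes


if __name__ == "__main__":
    for count, size in ((6000, 128), (20000, 32)):
        left = bytes_left_for_directories(count, size)
        print(f"{count} x {size}-byte inodes: "
              f"{count * size} bytes used, {left} bytes left")
```

Under these assumptions both quoted cases use well under the full 1MB,
which leaves room for the associated directory entries.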

I don't know exactly how the unified vm and buffer cache has affected
the caching. Apparently, not much.

>> the second 'du' appears no faster than the first and the drive can be
>> heard to thrash around in exactly the same manner.

I notice this mainly when I run `find' on relatively large hierarchies.
The problem is not so much that the second traversal reads everything
again, but that the first traversal thrashes the buffer and/or vm cache.

>POSIX mandates that the access time will be marked for update when you
>read the directory; thus it's written out, and the thrashing is expected.

Wrong. Neither marking for update nor updating the access times
requires writing anything. In FreeBSD, writing is a side effect of
thrashing the caches, and updating is often a side effect of writing.
First, when the vnode cache thrashes, the vnodes have to be updated and
written to the buffer cache. Second, when the buffer cache thrashes,
the dirty buffers containing the vnodes have to be written out. They
are usually written with delayed writes, so in principle the writes
need no more than double the overhead of the thrashing (in practice it
is probably much worse because of seeks).

>One issue which is a big one in my book is that only data hung off a vnode
>is cached in the buffer cache. The caching is by inode/extent rather than
>by device/extent.

Yes, this is the main problem.

>The net result of this will be that the inode data itself will not be
>cached.

It could be hung off the vnode for the mounted device. I'm not sure if
it isn't already.

This problem is secondary. Repeated tree traversals aren't all that
common, and you don't really want them to eat the buffer cache (you
probably want to buffer precisely the inodes and directories that will
be hit again a long time later in the same search, e.g., intermediate
directories for a depth-first search).
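
The reason a second traversal gains nothing once the cache thrashes can
be seen in a toy model. This is only a sketch: the `LRUCache` class and
`second_pass_hits` helper are hypothetical, the cache is plain LRU, and
the traversal is modelled as visiting each inode once in a fixed order,
which is simpler than what du or find really do and says nothing about
how the FreeBSD vnode cache is actually implemented.

```python
# Toy model: plain LRU cache, tree walked twice in the same order.
# Everything here is illustrative; it is not the kernel's algorithm.
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest entry first
        self.hits = 0
        self.misses = 0

    def access(self, key):
        """Touch key; return True on a hit, False on a miss."""
        if key in self.entries:
            self.entries.move_to_end(key)  # now most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return False


def second_pass_hits(cache_size, tree_size):
    """Walk tree_size inodes twice; count hits during the second walk."""
    cache = LRUCache(cache_size)
    for inode in range(tree_size):
        cache.access(inode)
    hits_before = cache.hits
    for inode in range(tree_size):
        cache.access(inode)
    return cache.hits - hits_before


if __name__ == "__main__":
    for tree_size in (1500, 2000, 2001, 6000):
        print(tree_size, second_pass_hits(2000, tree_size))
```

In this model a tree that fits in the cache is served entirely from the
cache on the second pass, while a tree only one entry larger than the
cache gets no hits at all, because each entry is evicted just before it
is needed again.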

>There is a "second chance" ihash cache in FFS; other FS's are not so
>lucky; thus your performance will depend on number of elements before
>the hash overflows and whether or not you are testing FFS or some other
>FS. For instance, expect EXT2FS to have significantly worse performance
>under BSD.

Actually, ext2fs uses the ufs ihash.

Hmm, EXTRAVNODES is necessary after all, since the ihash table isn't
affected by the sysctl to change `desiredvnodes'. It's fishy that the
ufs table size is the same as the vfs table size.

Bruce