Date:      Fri, 31 May 1996 18:11:18 +1700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        jmb@freefall.freebsd.org (Jonathan M. Bresler)
Cc:        terry@lambert.org, davidg@Root.COM, rashid@rk.ios.com, jgreco@solaria.sol.net, hackers@FreeBSD.org
Subject:   Re: Breaking ffs - speed enhancement?
Message-ID:  <199606010111.SAA19301@phaeton.artisoft.com>
In-Reply-To: <199605312249.PAA15899@freefall.freebsd.org> from "Jonathan M. Bresler" at May 31, 96 03:49:43 pm

> > That they are written out during sync instead of being LRU'ed out is
> > *the* problem.  The sync does not have a sufficiently long cycle
> > time for decent write-gathering.
> 
> 	[caveat: i am getting deeper into filesystems but have a long
> 	way to go yet.  please add a grain of salt to these comments.]
> 
> 	how is our current file system different from that used in
> 	4.2BSD?  i assume that we are now using extent-based clustering
> 	similar to that described by McVoy and Kleiman in "Extent-like
> 	Performance from a UNIX File System".   if so then increasing the
> 	update interval should have significant benefits in reducing
> 	the I/O soaked up by inode (metadata) updates.  would this give
> 	us a quasi-async filesystem?  one where we could delay updates
> 	to fit our tolerance for letting the metadata "hang out in
> 	the wind"?

The difference is that the VM cache has been unified.  One of the
consequences of this is that the cache locality model has changed.

Instead of a buffer cache that caches blocks from a given device,
the cached pages are hung off the vnode -- if the pages are in
fact data pages from an on-disk file.

The difference is pretty subtle.  Pages holding FS data which isn't
actually file data end up in one of three states: dirty, pending
write by sync; clean, pending discard by sync; or locked for use
(wired/pinned/whatever) and not subject to sync.  Only pages
actually representing file data have any real cache persistence...
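
Something like this toy sketch is what I have in mind -- invented
names, not the real vm_page/vnode structures, just the states as
I read them:

    /* Toy model only; illustrates pages hanging off the vnode,
     * and the three states a non-file-data page can be in. */
    #include <sys/types.h>

    enum toy_page_state {
        TOY_DIRTY,      /* dirty, pending write by the next sync    */
        TOY_CLEAN,      /* clean, pending discard by the next sync  */
        TOY_LOCKED      /* wired/pinned for use, not subject to sync */
    };

    struct toy_page {
        enum toy_page_state  state;
        off_t                offset;  /* offset of the page in the file */
        struct toy_page     *next;    /* next cached page on this vnode */
    };

    struct toy_vnode {
        struct toy_page *pages;   /* cache locality is per-file (vnode), */
                                  /* not per-device as in the old buffer */
                                  /* cache                               */
    };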

At least that's the way I'm reading the code.  We *really* need to
have design documentation as to what is affected by whom when.

Really, for a depth-first traversal, I'm going to be hitting the
access time on directory inodes with geometrically decreasing
frequency the higher they sit in the actual FS tree relative to
where the traversal starts.  The likelihood that I won't be able
to combine several "mark for update + update" or "update" operations
on a given on-disk inode into a single write increases inversely
with depth in the tree.
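
As a sketch of the combining I mean (hypothetical in-core inode,
invented names -- not how the kernel actually spells this):

    /* Repeated atime hits on a directory inode collapse into one
     * pending write -- as long as nothing flushes it in between. */
    #include <time.h>

    struct toy_inode {
        time_t atime;
        int    write_pending;   /* nonzero: already queued for writeback */
    };

    void
    touch_atime(struct toy_inode *ip)
    {
        ip->atime = time(NULL);
        if (ip->write_pending)
            return;             /* this hit rides the pending write */
        ip->write_pending = 1;
        /* queue_inode_write(ip);  -- hypothetical writeback hook.
         * If a sync runs between touches, write_pending clears and
         * the next touch costs a full write again. */
    }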

Probably, I want to move the now-"dirty" inode data to the front
of the LRU each time it is touched, to delay the actual write until
the multiple-update operation I'm doing has finished.
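
A minimal sketch of that move-to-front policy, using an invented
circular doubly-linked LRU (none of this is kernel code):

    #include <time.h>

    struct lru_entry {
        struct lru_entry *prev, *next;
        time_t            dirtied;    /* when this last went dirty */
        int               dirty;
    };

    struct lru {
        struct lru_entry head;        /* sentinel: head.next = MRU end, */
                                      /* head.prev = LRU (oldest) end   */
    };

    void
    lru_init(struct lru *q)
    {
        q->head.prev = q->head.next = &q->head;
    }

    /* Move a touched, dirty entry to the MRU end so it drifts away
     * from whatever sweeps the old end of the list.  Assumes the
     * entry is already on the list. */
    void
    lru_touch(struct lru *q, struct lru_entry *e, time_t now)
    {
        e->dirty = 1;
        e->dirtied = now;
        e->prev->next = e->next;      /* unlink from current position */
        e->next->prev = e->prev;
        e->next = q->head.next;       /* relink at the MRU end */
        e->prev = &q->head;
        q->head.next->prev = e;
        q->head.next = e;
    }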

Unfortunately, the sync is going to write all dirty pages, not
simply trim the LRU down to a floating age marker ("all items in
the LRU older than this amount must be written").

There's also the problem of write clustering.  It's likely that
several inodes in the same cylinder group will want to be written
as one write to save on actual I/O, so clustering requirements
may want to override LRU requirements in some cases.  I don't know
if this should take the form of lame insertion of ready-to-be-written
LRU items in cluster order (delaying them), or of time demotion in
the LRU for an adjacent item.  I guess it's whether you take LRU
position as meaning "these blocks *want* to be written", or as
"these blocks *don't* want to be written".  Clearly, data integrity
*wants* them written and cache utility *doesn't*.
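
For illustration, the "these blocks *want* to be written" reading
might look like this: once one entry is forced out, sweep its whole
cylinder group along with it.  (cg_of() is a hypothetical mapping
from an entry to its cylinder group; it doesn't exist anywhere.)

    void
    write_cluster(struct lru *q, struct lru_entry *victim,
                  int (*cg_of)(struct lru_entry *))
    {
        struct lru_entry *e;
        int cg = cg_of(victim);

        for (e = q->head.next; e != &q->head; e = e->next) {
            if (e->dirty && cg_of(e) == cg) {
                /* in a real system, gather these into one
                 * contiguous I/O; here we just mark them clean */
                e->dirty = 0;
            }
        }
    }

The time-demotion alternative would instead artificially age the
same-cylinder-group neighbors so a later sweep picks them up.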

I haven't really experimented with which policy is best; I suppose
it would depend on the ratio of cache size to disk size, and on the
frequency of access compared to the proposed LRU latency.  John
Dyson is surely more qualified to comment on this than I am.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


