Date:      Thu, 8 Aug 1996 10:33:44 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        michaelh@cet.co.jp
Cc:        terry@lambert.org, dfr@render.com, jkh@time.cdrom.com, tony@fit.qut.edu.au, freebsd-fs@freebsd.org
Subject:   Re: Per fs vnode pools (was Re: NFS Diskless Dispare...)
Message-ID:  <199608081733.KAA17340@phaeton.artisoft.com>
In-Reply-To: <Pine.SV4.3.93.960808140146.11801B-100000@parkplace.cet.co.jp> from "Michael Hancock" at Aug 8, 96 02:19:30 pm


> On Tue, 6 Aug 1996, Michael wrote:
> 
> > > In effect, the ihash would become a vnhash and LRU for use in
> > > reclaiming vnode/inode pairs.  This would be much more efficient
> > > than the current dual allocation sequence.
> 
> Would you want this to be LRU vnodes with no buffer pages first?

Yes.  Minimally, you'd want dual insertion points in the vnode LRU:

head | vnodes without buffer pages | vnodes with buffer pages | tail
insertion points               ---^                       ---^
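
Roughly, with a dummy marker element between the two segments (the
names here are made up for illustration, not the real kernel
structures):

#include <sys/queue.h>

struct xvnode {
        TAILQ_ENTRY(xvnode) v_lru;      /* LRU list linkage */
        int                 v_npages;   /* buffer pages hanging off this vnode */
};

TAILQ_HEAD(vlru, xvnode);

static struct vlru    vnode_lru = TAILQ_HEAD_INITIALIZER(vnode_lru);
static struct xvnode  lru_mark;         /* dummy marker between the segments */

/* Put the marker on the list once at startup. */
void
vlru_init(void)
{
        TAILQ_INSERT_HEAD(&vnode_lru, &lru_mark, v_lru);
}

/*
 * Page-less vnodes go just before the marker (front segment); vnodes
 * with buffer pages go to the tail, so reclaim from the head always
 * eats the page-less ones first.
 */
void
vlru_insert(struct xvnode *vp)
{
        if (vp->v_npages == 0)
                TAILQ_INSERT_BEFORE(&lru_mark, vp, v_lru);
        else
                TAILQ_INSERT_TAIL(&vnode_lru, vp, v_lru);
}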

> The
> buffer cache is being reclaimed, with some kind of algorithm, independent
> of the vnodes.  You want to keep the vnodes with data still hanging off of
> them in the fs pool longer.

Actually, you want to be able to impose a working set quota on a
per-vnode basis using the cache reclaim algorithm.  This keeps large
mmap()s from thrashing the cache.  You could have supervisor, or even
user, overrides for the behaviour.

head | buffer reclamation list | tail
      ^                        ^--- insert here if vnode buffer count
      |                             is below working set quota
      insert here if vnode buffer count equals
      working set quota
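
The insertion policy is then just a comparison against the quota at
insert time (again, made-up names, only to pin the idea down):

#include <sys/queue.h>

struct xvnode {
        int     v_bufcnt;       /* buffers currently held by this vnode */
        int     v_wsquota;      /* per-vnode working set quota */
};

struct xbuf {
        TAILQ_ENTRY(xbuf) b_lru;
        struct xvnode    *b_vp;
};

static TAILQ_HEAD(, xbuf) buf_lru = TAILQ_HEAD_INITIALIZER(buf_lru);

/*
 * A buffer whose vnode is at (or over) its working set quota goes to
 * the head of the reclaim list and gets taken first; a buffer whose
 * vnode is still under quota goes to the tail and is kept longest.
 * This is what keeps one huge mmap() from pushing everyone else out.
 */
void
buf_lru_insert(struct xbuf *bp)
{
        struct xvnode *vp = bp->b_vp;

        if (vp->v_bufcnt >= vp->v_wsquota)
                TAILQ_INSERT_HEAD(&buf_lru, bp, b_lru);
        else
                TAILQ_INSERT_TAIL(&buf_lru, bp, b_lru);
        vp->v_bufcnt++;
}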

So, really, buffer reclaim does not want to be independent of the vnodes.

A vnode quota is better than a process quota, since a process can use
vnodes in common with other processes; you don't want a process with a
low working set quota to be able to interfere with locality for
another, otherwise unrelated, process.


> BTW, is the incore inode table fixed or dynamic?

Currently dynamic in FFS, and FS-implementation dependent in principle.
Potentially, you will want to be able to install soft usage limits via
mount options, independent of the FS, assuming a common subsystem is
being used to implement the allocation and LRU maintenance for each FS.
This would imply a need to be able to force a reclaim, or at a minimum
allocation balancing, in low memory situations.
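
The interface to such a common subsystem might look something like
this (the names and the mount option are entirely hypothetical, just
to make the idea concrete):

/* Per-mount pool state kept by the common subsystem. */
struct vnpool {
        int     np_softlimit;   /* soft cap from the mount option, 0 = none */
        int     np_count;       /* vnode/inode pairs currently allocated */
};

/* Install a soft usage limit at mount time, e.g. from "-o maxvnodes=N". */
void
vnpool_setlimit(struct vnpool *np, int limit)
{
        np->np_softlimit = limit;
}

/*
 * Called in low memory situations: trim the pool back toward its soft
 * limit, oldest (LRU) vnode/inode pairs first.  Returns how many were
 * actually released.
 */
int
vnpool_reclaim(struct vnpool *np, int want)
{
        int freed = 0;

        while (freed < want && np->np_count > np->np_softlimit) {
                /* ... release the LRU vnode/inode pair here ... */
                np->np_count--;
                freed++;
        }
        return (freed);
}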

This is actually a consequence of the buffer cache information not
being indexed by device/offset for data which is not referenced by a
vnode: inode information, etc.  If I had my preferences, the cache
would be indexable by dev/offset as well (I would *not* eliminate the
vnode/offset indexing currently present, since it avoids a bmap on
every call that deals with file data).

One major win here is that getting one on-disk inode vs. another
on-disk inode in the same directory has a high probability of locality
(the FFS paper makes this clear when looking at the directory/inode/
cylinder group allocation policy).  Instead of copying to an in-core
inode buffer, the in-core inode could hold a page ref to the page
containing the on-disk inode data, plus a pointer into it.  This would
save all of the internal copies required for stat and other
operations.  Since multiple inodes could be in a device-mapped page
(as opposed to a strict vnode mapping), this could save a significant
amount of I/O (16 disk inodes @ 128 bytes each per page).
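
Very loosely, the in-core object would look something like this (the
names are invented; the real offset math would of course go through
the cylinder group macros, this just shows the "multiple inodes per
page" point):

#include <stddef.h>
#include <stdint.h>

#define DINODE_SIZE     128     /* size of an FFS on-disk inode */

struct dinode;                  /* the on-disk inode, left opaque here */
struct vm_page_ref;             /* opaque handle holding a page reference */

/*
 * In-core inode: FS-local state plus a pointer straight into the
 * device-mapped page holding the on-disk inode.  stat() and friends
 * read through i_din; nothing is copied into a private buffer.
 */
struct xinode {
        struct vm_page_ref *i_pageref;  /* the implied reference on the page */
        struct dinode      *i_din;      /* points into that page */
        /* ... locks, flags, and other FS-local state ... */
};

/* Byte offset of inode "ino" within its device page (simplified). */
static inline size_t
dinode_pageoff(uint32_t ino, size_t pagesize)
{
        return ((ino % (pagesize / DINODE_SIZE)) * DINODE_SIZE);
}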

I'd like to keep the table dynamic on a modified slab basis, using
power-of-two allocation-ahead; this is open to discussion.  John Dyson,
in particular, has some interesting VM plans that would bear directly
on how you'd want to do this.
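
By "power-of-two allocation-ahead" I mean growth along these lines
(userland sketch, not kernel code; error handling omitted):

#include <stdlib.h>

/* Grow a dynamic table to the next power of two >= want, ahead of demand. */
static void *
table_grow(void *tbl, size_t *cap, size_t want, size_t elemsz)
{
        size_t newcap = (*cap == 0) ? 1 : *cap;

        while (newcap < want)
                newcap <<= 1;           /* power-of-two allocation-ahead */
        if (newcap != *cap) {
                tbl = realloc(tbl, newcap * elemsz);
                *cap = newcap;
        }
        return (tbl);
}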

Clearly, if you had page mapping of the device for the on-disk inode
data, the allocated in-core object would be the vnode, the in-core
inode data (local FS state for an inode that is referenced), and a
pointer to the page containing the disk inode data (with an implied
page ref, and an implied limit of one page on the in-core data -- you
could overcome this by adding more page references in a table to the
in-core inode data and handling the inode dereference in the FS; you
have to do that anyway, since the reference is implicit, not
explicit).

A direct implication of this is that buffer reclaim for unreferenced
pages not held via a vnode would need to be handled separately; this
is only a minor complication... you could do it by tracking the number
of items on an FS-independent, per-device managed global LRU list vs.
the number of items in the FS LRUs, and establishing a high water mark
for free pages, so that a reclaim occurs on any deallocation that
pushes the LRU above the high water mark.  Then you reclaim pages down
to the low water mark (the page just freed, being below the low water
mark, is left on the list to ensure locality).
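
In other words, something like this (made-up structure, just to pin
down the water mark behaviour):

struct devlru {
        int     dl_count;       /* pages on this device's global LRU */
        int     dl_hiwat;       /* start reclaiming above this */
        int     dl_lowat;       /* reclaim back down to this */
};

/*
 * Called after a page is freed onto the device-managed global LRU.
 * If the free pushed the list above the high water mark, reclaim
 * oldest-first down to the low water mark; the page just freed is the
 * newest, so it stays on the list and its locality is preserved.
 */
void
devlru_trim(struct devlru *dl)
{
        if (dl->dl_count <= dl->dl_hiwat)
                return;
        while (dl->dl_count > dl->dl_lowat) {
                /* ... pull the oldest page off the LRU and free it ... */
                dl->dl_count--;
        }
}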


FS mechanics are one of the funnest things you can discuss.  8-).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


