Date:      Mon, 14 Mar 2005 23:29:39 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Jeff Roberson <jroberson@chesapeake.net>
Cc:        arch@freebsd.org
Subject:   Re: Freeing vnodes.
Message-ID:  <200503150729.j2F7Tdxl052412@apollo.backplane.com>
References:  <20050314213038.V20708@mail.chesapeake.net>

:I have a patch at http://www.chesapeake.net/~jroberson/freevnodes.diff
:that allows us to start reclaiming vnodes from the free list and release
:their memory.  It also changes the semantics of wantfreevnodes, and makes
:getnewvnode() much prettier.

    Hmm.   I'm not sure your logic is correct in this bit:

    +	/*
    +	 * We try to get half way to wantfreevnodes each time we run.  This 
    +	 * slows down the process slightly, giving vnodes a greater chance
    +	 * of being lru'd to the back of the list.
    +	 */
    +	count = (freevnodes - wantfreevnodes) / 2;
    +	for (; count > 0; count--) {
	    ...

    This seems to indicate that you are going to try to destroy
    *MOST* of the vnodes on the free list.  freevnodes is typically
    a fairly high number while wantfreevnodes is typically a very low
    number, so this calculation is typically going to be, well, a big
    number.
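
    (For illustration only, with made-up numbers: if freevnodes were
    around 20000 and wantfreevnodes were a small value like 25, then
    count = (20000 - 25) / 2 comes out to roughly 10000, i.e. about
    half the free list destroyed on every pass.)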

    I think you did not intend this.  Didn't you just want to destroy
    enough vnodes to have 'wantfreevnodes' worth of slop so getnewvnode()
    could allocate new vnodes?  In that case the calculation would be:

	count = numvnodes - desiredvnodes + wantfreevnodes;

	while (count > 0) {
		...
		physically free a vnode and reduce the numvnodes count
		...
		--count;
	}
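
    As a rough sketch only -- the counters are the existing globals, but
    the helper name vnlru_free_one() is a hypothetical stand-in for
    whatever actually unlinks a vnode from the free list, vgonel()s it
    if needed, and uma_zfree()s it -- that would look something like:

	extern int numvnodes, desiredvnodes, wantfreevnodes, freevnodes;
	void vnlru_free_one(void);	/* hypothetical helper, see above */

	static void
	vnlru_free_excess(void)
	{
		int count;

		/* Bring numvnodes down to (desiredvnodes - wantfreevnodes). */
		count = numvnodes - desiredvnodes + wantfreevnodes;
		while (count > 0 && freevnodes > 0) {
			/* Expected to decrement numvnodes and freevnodes. */
			vnlru_free_one();
			count--;
		}
	}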

:This also removes the recycling from the getnewvnode() path.  Instead, it
:is done by a new helper function that is called from vnlru_proc().  This
:function just frees vnodes from the head of the list until we reach our
:wantfreevnodes target.
:
:I haven't perf tested this yet, but I have a box that is doing a
:buildworld with a fairly constant freevnodes count which shows that vnodes
:are actually being uma_zfree'd.
:
:Comments?  Anyone willing to do some perf tests for me?
:
:Thanks,
:Jeff

    There is a second issue when you remove the recycling from the 
    getnewvnode() path.  Generally speaking, when the system runs out of
    vnodes you wind up with a LOT of processes sleeping in the getnewvnode()
    procedure all at once.   You need to make very sure that you are actually
    freeing up a sufficient number of vnodes to allow *all* the processes
    waiting for a new vnode to proceed all at once.  This would argue
    against only going 'half way' to the goal, and would also argue for
    having a more dynamic goal that is based on the load on getnewvnode().
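
    Purely as an illustration of what 'dynamic' could mean -- the
    getnewvnode_waiters counter below is hypothetical, something that
    would be bumped before sleeping in getnewvnode() and decremented on
    wakeup:

	extern int numvnodes, desiredvnodes, wantfreevnodes;
	int getnewvnode_waiters;	/* hypothetical count of sleepers */

	static int
	vnlru_free_target(void)
	{
		int target;

		/* Static component: same calculation as above. */
		target = numvnodes - desiredvnodes + wantfreevnodes;
		/* Dynamic component: make room for every sleeping process. */
		target += getnewvnode_waiters;
		return (target > 0 ? target : 0);
	}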

    Even worse, since processes open and close files all the time, there
    can be a HUGE load on getnewvnode() (think of cvsupd and find, or
    a cvs update, etc...).  This load can easily outstrip vnlru_proc()'s
    new ability to free vnodes and potentially cause a lot of unnecessary
    blockages.

    I love the idea of being able to free vnodes in vnlru_proc() rather
    than free-and-reuse them in allocvnode(), but I cannot figure out how
    vnlru_proc() could possibly adapt to the huge load range that
    getnewvnode() has to deal with.  Plus keep in mind that the vnodes
    being reused at that point are basically already dead except for 
    the vgonel().

    This brings up the true crux of the problem: the real overhead of
    reusing a vnode inline with the getnewvnode() call, which is that
    vgonel() potentially has to update the related inode and could
    cause an unrelated process to block inside getnewvnode().  But even
    this case is a hard sell, because the buffer cache for the filesystem
    almost certainly has the inode's disk block cached, and a stat() alone
    does not actually dirty an inode; you usually have to read() from the
    file too.  I would argue that if anything needs fixing here it would
    be the 'late' inode sync.  [note: inode syncing can occur in INACTIVE
    as well as RECLAIM, and I'm not sure which routine it occurs in
    'most of the time'].

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>


