From: "Marc G. Fournier"
To: freebsd-current@freebsd.org
Date: Wed, 1 Sep 2004 15:19:27 -0300 (ADT)
Subject: vnode leak in FFS code ... ?

I don't know if this is applicable to -current as well, but so far, anything like this I've uncovered in 4.x has needed an equivalent fix in 5.x, so I figured it can't hurt to ask, especially with everyone working towards a STABLE 5.x branch ... I do not have a 5.x machine running this sort of load at the moment, so I can't test or provide feedback there ... all my 5.x machines are more or less desktops ...

On Saturday, I'm going to try an unmount of the bigger file system, to see if it frees everything up without a reboot ... but if someone can suggest something to check to see whether it is a) a leak and b) fixable between now and then, please let me know ... again, this is a 4.10 system, but most of the work that Tor and David have done (re: vnodes) in the past relating to my servers has been applied to 5.x first and MFC'd afterwards, so I suspect that this too may be something that applies to both branches ...

-----------------

I have two servers, both running 4.10 from within a few days of each other (Aug 5 for venus, Aug 7 for neptune) ... both running jail environments ... one with ~60 jails running, the other with ~80 ... the one with 60 has been up for ~25 days now and is on the verge of running out of vnodes:

Aug 31 20:58:00 venus root: debug.numvnodes: 519920 - debug.freevnodes: 11058 - debug.vnlru_nowhere: 256463 - vlrup
Aug 31 20:59:01 venus root: debug.numvnodes: 519920 - debug.freevnodes: 13155 - debug.vnlru_nowhere: 256482 - vlrup
Aug 31 21:00:03 venus root: debug.numvnodes: 519920 - debug.freevnodes: 13092 - debug.vnlru_nowhere: 256482 - vlruwt

while the other one has been up for only ~1 day, but is using a lot fewer vnodes for more processes:

Aug 31 20:58:00 neptune root: debug.numvnodes: 344062 - debug.freevnodes: 208655 - debug.vnlru_nowhere: 0 - vlruwt
Aug 31 20:59:00 neptune root: debug.numvnodes: 344062 - debug.freevnodes: 208602 - debug.vnlru_nowhere: 0 - vlruwt
Aug 31 21:00:03 neptune root: debug.numvnodes: 344062 - debug.freevnodes: 208319 - debug.vnlru_nowhere: 0 - vlruwt

I've tried shutting down all of the VMs on venus, and umount'd all of the unionfs mounts, as well as the one nfs mount we have ... the above #s are from after the VMs (and mounts) were recreated ...
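For anyone who wants to watch the same counters on their own box, a trivial periodic job is all it takes ... a rough sketch follows, using only stock sysctl(8), ps(1) and logger(1); the ps/awk step, which pulls the vlruwt/vlrup state (the wait channel of the vnlru kernel thread), is a best guess at reproducing lines like the above, so adjust to taste:

    #!/bin/sh
    # sample the vnode counters and hand them to syslog; the last field is
    # the wait channel of the vnlru kernel thread (vlruwt when it is idle,
    # vlrup when it has been trying, and failing, to reclaim vnodes)
    num=`sysctl -n debug.numvnodes`
    free=`sysctl -n debug.freevnodes`
    nowhere=`sysctl -n debug.vnlru_nowhere`
    state=`ps -axo wchan,comm | awk '$2 == "vnlru" { print $1 }'`
    logger "debug.numvnodes: $num - debug.freevnodes: $free - debug.vnlru_nowhere: $nowhere - $state"

Run from cron once a minute as root, that should produce 'root:'-tagged syslog lines much like the ones quoted above.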
Now, my understanding of vnodes is that for every file opened, a vnode is created ... in my case, since I'm using unionfs, there are two vnodes per file ... is it possible that there are 'stale' vnodes that aren't being freed up? Is there some way of 'viewing' the vnode structure? For instance, fstat shows:

venus# fstat | wc -l
   19531

So, obviously it isn't just open files that I'm dealing with here, for even if I double that, it is nowhere near 519920 ...

So, where else are the vnodes going? Is there a 'leak'? What can I look at to try and narrow this down / provide more information? Even some way of determining a specific process that is sucking back a lot of them, so I could move it to a different machine ... ?

Looking at vmstat -m ... specifically the work that David did on separating the union vs regular vnodes:

  UNION mount     60      2K      3K  204800K        162    0     0  32
       undcac      0      0K      1K  204800K  343638713    0     0  16
       unpath  13146    227K   1025K  204800K   43541149    0     0  16,32,64,128
  Export Host      1      1K      1K  204800K        164    0     0  256
       vnodes    141      7K      8K  204800K        613    0     0  16,32,64,128,256

Why does 'vnodes' show only 141 InUse? Or, in this case, should I be looking at:

     FFS node 496600 124150K 127870K  204800K  401059293    0     0  256

496k FFS nodes, if I'm reading that right? (496600 * 256 bytes is 124150K, which matches the MemUse column exactly, so at least the numbers agree with each other ...) vs neptune, which is showing only:

     FFS node 300433  75109K  80257K  204800K    3875307    0     0  256

Hrmmm, maybe I'm mis-reading all of this, and going down the wrong path here, so hopefully someone will correct me if I am ... but, for now ...

Looking at vmstat -m a bit further, the top of the report has:

Memory statistics by bucket size
    Size   In Use    Free     Requests  HighWater  Couldfree
      16    13116   28356   2063580697       1280       7822
      32    77734    7002    168084205        640     316065
      64   465006   48402   2804541088        320     637084
     128   100182   60010    591859866        160    1850304
     256   500029   12163   1178322001         80     123078

Now, the only things that are using a lot of the '256 Size' memory are:

     FFS node 494513 123629K 127870K  204800K  401104542    0     0  256
     vfscache 449709  29178K  32434K  204800K  737673766    0     0  64,128,256,512K

Since only 500029 are 'InUse', and since FFS node is exclusively 256 ... I'm going to guess that most of vfscache is using something else ... so, my question becomes: if ~123000 'Could be Freed', why aren't they? Assuming, of course, I'm not on the wrong trail here :(
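In case it helps to see which file system (and therefore, roughly, which jail) holds the most files open, here is a quick-and-dirty per-mount breakdown of the fstat output ... the field number is an assumption based on the default fstat column layout (MOUNT is normally the fifth column), so it may need adjusting:

    #!/bin/sh
    # count open files per mount point and show the busiest twenty;
    # socket and pipe lines have a different layout and just show up as noise
    fstat | awk 'NR > 1 { n[$5]++ } END { for (m in n) printf "%8d %s\n", n[m], m }' \
        | sort -rn | head -20

Of course, that only counts files that are open right now, so it won't explain the gap between ~19531 open files and 519920 vnodes ... but it might at least show which jail is churning through the most of them.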